Reflections on a Year of COVID-19 Data Sharing

NIAID Now | October 28, 2021


NIAID-supported research can help fuel further discovery when data are shared quickly in discoverable repositories, following community standards for metadata. Data sharing enables more rapid and open scrutiny of research results and outcomes and allows data across studies to be easily combined and analyzed.

In 2020, NIAID encouraged its grantees to rapidly share COVID-19 research results. Across disciplines, the Research Data Alliance (RDA) has published recommendations and guidelines for COVID-19 data sharing. Broad adoption of these guidelines led to an unprecedented volume of data being shared, and we have seen additional practices that augment the immense potential of the data:  

Following community standards when depositing data (e.g., a community-defined vocabulary for the metadata) is pivotal to addressing scientific and public health questions and to maximizing the impact of SARS-CoV-2 data. Collaborative efforts across scientific domains are ongoing to define minimal as well as optimal metadata and to automate their capture. For example, the Public Health Alliance for Genomic Epidemiology (PHA4GE) is working toward a standard for pathogen genomic sequences, the National COVID Cohort Collaborative (N3C) standardizes clinical and electronic medical records data into a reusable format using the OMOP common data model, and the NCATS OpenData portal shares data and standardized approaches for SARS-CoV-2 assay and animal model data.
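To make the idea of minimal metadata and a community-defined vocabulary concrete, here is a small Python sketch of the kind of check a repository might run at deposit time. The field names, the allowed host terms, and the `validate_record` function are hypothetical, chosen only for illustration; they are not taken from PHA4GE, N3C, or any other published standard.

```python
from datetime import date

# Hypothetical minimal field set and controlled vocabulary (assumptions for
# illustration only; not drawn from any published metadata standard).
REQUIRED_FIELDS = ("sample_id", "collection_date", "host", "geo_location")
ALLOWED_HOSTS = {"Homo sapiens", "Mus musculus", "Mesocricetus auratus"}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one metadata record.

    An empty list means the record passes these illustrative checks.
    """
    problems = []

    # 1. Every required field must be present and non-blank.
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            problems.append(f"missing required field: {name}")

    # 2. The host term must come from the controlled vocabulary.
    host = record.get("host")
    if host and host not in ALLOWED_HOSTS:
        problems.append(f"host not in controlled vocabulary: {host}")

    # 3. The collection date must parse as an ISO 8601 calendar date.
    collected = record.get("collection_date")
    if collected:
        try:
            date.fromisoformat(collected)
        except (TypeError, ValueError):
            problems.append(f"collection_date is not an ISO 8601 date: {collected}")

    return problems
```

A record that supplies all four fields, a host term from the allowed set, and a date such as `2021-03-15` yields an empty list. A record with a free-text host such as `human` or a date written as `03/15/2021` is flagged, which is the kind of inconsistency that otherwise has to be resolved by manual curation before data can be combined across studies.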

Now, over a year into the COVID-19 pandemic, researchers from around the world have contributed over 2.5 million SARS-CoV-2 genomic sequences, 1,371 SARS-CoV-2 protein structures, 315 reagents to the NIAID BEI Resources catalog, 7.3 billion rows of clinical data in the N3C database, and over 125,000 papers about the novel SARS-CoV-2 virus and the pandemic. Despite these great advances, more work remains to take full advantage of the troves of research data available. For example, data sharing is still slow for many data types; genomic sequences from U.S. infections are shared on average 28 days after sample collection. Similarly, too many data are released as figures in publications or preprints, in non-standardized formats or without metadata, and therefore require significant manual curation and harmonization before they can be reused. Some sites have made great strides in automating data extraction and harmonization, but these advances are not yet applicable to all data types.

Continued implementation of best practices in data management and sharing will enable even faster public health decisions and accelerate the development of diagnostics, therapeutics, and vaccines in response to emerging health threats.
