Rapid and unrestricted sharing of data and resources is essential for advancing research on human health and infectious diseases. The utility of data and resources to the scientific community depends largely on how quickly they are deposited into public databases and on whether they are easy to find, accessible, and reusable by others. NIAID expects rapid release of large-scale genomic data sets and anticipates that the data generated will be made freely available through deposition into publicly accessible and searchable international databases, such as the National Center for Biotechnology Information (NCBI)’s GenBank and database of Genotypes and Phenotypes (dbGaP), NIAID-funded databases such as the Bioinformatics Resource Centers (BRCs), or other databases designated and approved by NIAID.
In addition, NIAID encourages sharing of other data types generated with NIAID funding, such as other omics or immunological data.
Beginning in January 2023, applicants will provide genomic data sharing information as part of the new Data Management and Sharing Plan, rather than submitting a separate Genomic Data Sharing Plan.
NIAID also encourages using standard formats and vocabularies or ontologies to describe data elements, such as sequence data, variants, and phenotypic characterization. Find examples of data standards in NIAID’s Human Pathogen and Vector Sequencing Metadata Standards resource. These standards include the following:
- Data should be cleaned (e.g., the analytical dataset is finalized) before they are sent to data sharing repositories.
- Data pertinent to interpreting genomic data should also be shared. This includes associated phenotype data (e.g., clinical information), exposure data, and descriptive information (e.g., protocols or methodologies used). Metadata about the experiment or study, and any annotations necessary to reproduce a published table or analysis, must be included with genomic data submissions.
- Descriptions of specimen acquisition, experimental procedures, and data processing and analysis methods (e.g., alignment algorithms, software versions) are required with data submission.
- When applicable, common identifiers should be used, such as Unified Medical Language System (UMLS) identifiers or terms from an existing ontology.
- Wherever possible, use existing common data elements. For clinical specimens, the data elements that would be included in reporting to ClinicalTrials.gov are recommended.
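As an illustration only, the descriptive items listed above could be checked before submission with a small script. The sketch below is a minimal example under stated assumptions: the field names and the `missing_metadata` helper are hypothetical and do not correspond to any NIAID, NCBI, or repository schema.

```python
from typing import Dict, List, Optional

# Hypothetical field names, chosen to mirror the descriptive items in the
# list above. They are NOT an official NIAID or repository schema.
REQUIRED_FIELDS = (
    "specimen_acquisition",
    "experimental_procedures",
    "data_processing_methods",
    "software_versions",
)


def missing_metadata(record: Dict[str, Optional[str]]) -> List[str]:
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        # Treat a missing key, None, or whitespace-only text as missing.
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

For a record that fills in only `specimen_acquisition` and `software_versions`, the helper would report `experimental_procedures` and `data_processing_methods` as missing, which a submitter could then complete before sending the data to a repository.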