NIAID has made a significant informatics investment for basic and applied research resources in allergy, immune-mediated, and infectious diseases. This wealth of digital assets provides a valuable and critical resource to enable data-driven research in the scientific community.
The following general guidelines are intended to help investigators prepare and establish consistent data management and sharing plans across NIAID-funded research. They are consistent with current NIH data-sharing guidelines and genomic data sharing policy and underline the expectation that both human and nonhuman data be released on a timeline that is consistent with NIAID’s mandate to support basic and clinical research as well as to respond to public health emergencies. These guidelines are also consistent with contemporary principles, such as the findable, accessible, interoperable, and reusable (FAIR) standards for data release.
Rapid and unrestricted sharing of data and research resources is essential for advancing research on human health and infectious diseases. The utility of data and resources to the scientific community is largely dependent on how quickly these data are deposited into public repositories and made discoverable for reuse by others. NIAID is committed to rapid release of experimental data including genomic and other large-scale data types and, in addition, recognizes that clinical data and other metadata associated with the genomic, omics, and other data are valuable research resources. For these reasons, NIAID endorses rapid release of all these data sets and anticipates that generated data will be made freely available through NIH-approved repositories.
In turn, users of any shared data are expected to act responsibly to recognize the scientific contribution of the data generators/producers by following fair use of unpublished data and normal standards of scientific etiquette. See Sharing Data From Large-scale Biological Research Projects: A System of Tripartite Responsibility.
Data Management and Sharing Plans
Projects designated by NIAID for rapid data sharing for public access should develop their data management and sharing plans based on these guidelines. Investigators are encouraged to discuss their plans for data and resource sharing with NIAID program officers. Plans will be reviewed and approved by NIAID. For projects generating large-scale genomic data, the data management and sharing plan should also address compliance with the NIH Genomic Data Sharing (GDS) Policy. Importantly, this NIAID guidance sets explicit, rapid data release timelines, especially for nonhuman genomic data, that supersede those stipulated in the NIH GDS Policy.
Specific Guidelines for Data Types
Sequence Data Including Genome, Transcriptome, Microbiome, Epigenome, Metagenomics
All raw genome or metagenome data generated using sequencing approaches should be submitted as rapidly as possible and no later than 45 calendar days after quality control to the Sequence Read Archive or, as appropriate, to dbGaP at the National Center for Biotechnology Information (NCBI)/National Library of Medicine/NIH. These data should also include information on sequencing platforms, libraries, primers, templates, vectors, quality values for each sequence, and other relevant metadata as appropriate. This expectation covers the broad range of next-generation sequencing applications, for example RNAseq, ChIPseq, TnSeq, and SNP profiling, among many others.
Full or partial genome and metagenome assemblies and their annotations should be submitted to GenBank either as individual samples or for defined cohorts of samples as rapidly as possible and no later than 45 calendar days after being generated and validated. These may then be released to other discoverable repositories, as approved by NIAID.
GenBank records for genome assemblies and annotation should contain language to acknowledge the funding source in addition to the data generators.
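The 45-calendar-day limits described above can be tracked with a few lines of code. The following is a minimal sketch, not an official NIAID tool. The function names are hypothetical, and it assumes that the clock starts on the date quality control (or validation) is completed, which the guidance does not define precisely.

```python
from datetime import date, timedelta

# Assumption: day zero is the date quality control or validation finished.
SUBMISSION_WINDOW_DAYS = 45


def submission_deadline(qc_completed: date) -> date:
    """Return the latest submission date, counting calendar days."""
    return qc_completed + timedelta(days=SUBMISSION_WINDOW_DAYS)


def is_overdue(qc_completed: date, today: date) -> bool:
    """True when today falls after the 45-calendar-day deadline."""
    return today > submission_deadline(qc_completed)


# Hypothetical example date, chosen only for illustration.
example_deadline = submission_deadline(date(2024, 1, 15))
```

Because the limit is expressed in calendar days, weekends and holidays are counted like any other day.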
Clinical Data and Other Metadata
NIAID expects that relevant clinical and associated metadata, or any other type of data (such as antibiotic resistance) that are essential for the biological interpretation of genome sequence data and other omics and experimental data sets, will be submitted and made publicly available through the appropriate NIAID-approved repository/knowledgebase at the same time as the experimental data. Plans to do so must be included in the data management and sharing plan and approved prior to the initiation of data generation. The plan will include 1) a list of metadata to be released, 2) the repository/knowledgebase(s) where they will be shared, and 3) timelines to do so.
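The three required plan elements listed above lend themselves to a simple completeness check before a plan is sent for review. The sketch below is illustrative only. The class name, field names, and example values are hypothetical and do not correspond to any NIAID form or schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetadataReleasePlan:
    """Hypothetical container for the three plan elements."""
    metadata_fields: List[str] = field(default_factory=list)  # 1) metadata to be released
    repositories: List[str] = field(default_factory=list)     # 2) where they will be shared
    timeline: str = ""                                         # 3) when they will be shared


def missing_elements(plan: MetadataReleasePlan) -> List[str]:
    """Return the names of required elements that are still empty."""
    missing = []
    if not plan.metadata_fields:
        missing.append("list of metadata to be released")
    if not plan.repositories:
        missing.append("repository/knowledgebase(s)")
    if not plan.timeline.strip():
        missing.append("timelines")
    return missing


# Hypothetical draft plan that still lacks a timeline.
draft = MetadataReleasePlan(
    metadata_fields=["antibiotic resistance profile"],
    repositories=["example-approved-repository"],
)
```

A check like this only confirms that each element is present. Whether the content is acceptable remains a matter for NIAID review.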
The rights and privacy of human subjects who participate in clinical research studies shall be protected at all times. Any data that may potentially identify human subjects should be carefully reviewed and excluded prior to sharing through open access repositories and knowledgebases. The eighteen data elements defined by the Health Insurance Portability and Accountability Act of 1996 (HIPAA) safe harbor standard must be considered in this review. In some cases, potentially identifying data may be deposited in a controlled access database as designated by NIAID, such as dbGaP.
All NIAID-funded studies involving human subjects should explicitly seek consent for future research use of samples and broad sharing of participant data. Participants who do not consent to future use or broad data sharing may still participate in the primary study, if consistent with study design. Whenever possible, studies should seek broad consent for general research use of the samples and consent should not limit the types of users who may access the data.
Single nucleotide polymorphisms (SNPs) for human genomic data should be submitted as rapidly as possible to NCBI dbSNP and no later than 45 calendar days from completion of standard quality control practices. Non-identifying clinical and other metadata should follow the release guidelines above.
Genome-Wide Association Studies (GWAS) Data
Data generated from human genomic or human genome-wide association studies should be submitted as rapidly as possible to NIH dbGaP following the NIH Genomic Data Sharing Policy. The data should be deposited into dbGaP within six months of data generation or at the time of publication, whichever comes first. Per NIH policy, the data will be available in this controlled access database for up to one year to investigators who submit a request with a 12-month publication embargo. After one year, investigators must resubmit their request.
Software Source Code
All software developed as part of NIAID-funded projects must be addressed in the data management and sharing plans. All software developed using NIAID funds must be released under an Open Source Initiative-approved, non-viral, open source license. The terms of the software availability should permit others to freely use and modify the software, to include it as a component of other software, and to commercialize enhancements of it. Awardees are encouraged to manage and disseminate their source code through an open revision control and source code management system such as GitHub.
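As a rough illustration of the licensing requirement above, a project could screen its declared license identifier against lists it maintains. This sketch assumes that "non-viral" means non-copyleft, an interpretation the guidance does not spell out. The two sets are small illustrative subsets of SPDX identifiers, not a complete or official list, and the function name is hypothetical.

```python
# Illustrative subsets only; both groups are OSI-approved licenses.
PERMISSIVE_LICENSES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}
COPYLEFT_LICENSES = {"GPL-3.0-only", "AGPL-3.0-only", "LGPL-3.0-only"}


def screen_license(spdx_id: str) -> str:
    """Classify a license identifier as 'ok', 'copyleft', or 'unknown'."""
    if spdx_id in PERMISSIVE_LICENSES:
        return "ok"
    if spdx_id in COPYLEFT_LICENSES:
        # Assumption: copyleft terms are what the guidance calls "viral".
        return "copyleft"
    # Anything not listed needs a manual check with the program officer.
    return "unknown"
```

An "unknown" result is not a rejection. It only signals that the license is not in the illustrative lists and should be reviewed by hand.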
Other data types not specifically addressed above, including expression data, immunological data, proteomic data, other omics data, unpublished primary and secondary data, and models and other digital work products, are expected to be rapidly deposited into publicly accessible repositories, as included in the data management and sharing plan and approved by NIAID. These data are expected to be released within nine months of generation and validation or upon publication, whichever comes first.
Data resulting from processing and analysis (e.g., metagenomic relative abundances) should be made available to the public within nine months of generation and validation or upon acceptance of a manuscript for publication, whichever comes first. This includes data analysis performed by the center or research program with limited or no data generation of its own. The data management and sharing plan should describe how the analyzed data will be made publicly accessible and name the repository to be used (e.g., bioRxiv.org).
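The "nine months or publication, whichever comes first" rule above amounts to taking the earlier of two dates. The sketch below shows one way to compute it with the Python standard library. It assumes that "months" means calendar months and that a day missing from the target month is clamped to that month's last day. Both are assumptions, and the function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def release_deadline(validated: date,
                     published: Optional[date] = None,
                     months: int = 9) -> date:
    """Return the earlier of the month limit and the publication date."""
    limit = add_months(validated, months)
    if published is None:
        return limit
    return min(limit, published)
```

The `months` parameter defaults to nine but can be set to another value where a different window applies, such as the six-month window for GWAS data.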
Requests for embargo of data are discouraged. Any such request must be justified in the data management and sharing plan submitted to NIAID and requires prior NIAID approval.
Requests for embargo of omics data are not encouraged and require prior approval.
Clinical data and metadata may be embargoed, upon request and approval, at NIAID-approved repositories such as the NCBI dbGaP for up to nine months or until publication, whichever comes first, as agreed upon by NIAID.
Specific Guidelines for Reagents
Investigators are encouraged to consult with NIAID program officers to determine which unique reagents, such as microbial strains or clones, should be deposited at the BEI Resources Repository or other approved public repositories. Resources and reagents to be shared should be released rapidly and no later than the time of publication to promote the principles expressed above. Details on sharing should be documented in the resource sharing plan.
For cohorts of strains that are sequenced by NIAID-funded projects, only key representative strains must be deposited at BEI. The collaborator providing the strains to the NIAID-funded projects will contact BEI and submit deposition forms prior to sequencing of the strains. The following points should be carefully considered prior to depositing a strain into BEI:
- Is the strain or a representative strain available and accessible in other public repositories?
- Are there strains that represent key lineages that can be selected and deposited?
The strategy and criteria for selecting strains for deposit into BEI must be outlined in the project plan. However, the investigators should also provide other ways to share additional strains with the community, if needed.