The NIAID Office of Data Science and Emerging Technologies (ODSET) highlights publications that feature innovative uses of data science and bioinformatics in infectious, immune-mediated and allergic disease research.
Explore NIAID data science publications on PubMed:
- NIAID-funded publications that use data science or computational biology (since 2023).
- Publications funded or co-funded by ODSET.
- Publications from the Harnessing Big Data to Halt HIV initiative.
If you would like to feature a publication on this page, please contact Data Science. Publications should feature research related to infectious, immunologic, and allergic diseases; include data science or a related discipline; and cite NIAID funding in the manuscript. Please include in your email:
- The title of your published article.
- A link to the article.
- A 50-60 word description of the article.
156 Results
Multi-Omics Analysis of Human Blood Cells Reveals Unique Features of Age-Associated Type 2 CD8 Memory T Cells
February 1, 2026 Aging Cell
Using a multi-omics approach, the authors identify age-related transcriptional and epigenetic changes in CD8 T cells and show age-related epigenetic changes are associated with health conditions such as asthma and type 2 diabetes.
Multicohort assessment of plasma metabolic signatures of tuberculosis disease in children: a retrospective cross-sectional study
January 23, 2026 Scientific Reports
To supplement existing diagnostic tests, which have suboptimal accuracy or rely on difficult-to-collect samples, the authors analyzed blood plasma metabolomics in children with and without tuberculosis (TB). They present a nine-metabolite biomarker signature that shows moderate accuracy in identifying TB in children.
Genomic risk prediction of type 2 diabetes in people living with and without HIV
January 22, 2026 Scientific Reports
Using data from NIH- and NIAID-funded studies, the authors measured the accuracy of polygenic risk score models in predicting type 2 diabetes among groups with different ancestry and HIV status. They find that models incorporating multiple traits outperformed those considering a single trait, and that model performance was similar among those with and without HIV.
PURE-seq integrates FACS and PIP-seq for single-cell genomics of ultra-rare cells
January 21, 2026 Nature Communications
Pan and colleagues describe a novel sequencing method, PURE-seq, which enables enrichment and transcriptomic characterization of very rare cells at the single-cell level. They use this workflow to produce single-cell gene expression profiles of circulating tumor cells collected from patient blood.
The ratio of circulatory levels of sphingolipids to steroids predicts asthma exacerbations
January 19, 2026 Nature Communications
Using metabolomics and electronic medical records from three large asthma cohorts, the authors develop a predictive model of future asthma exacerbations. They show that ratios of sphingolipids to steroids in the blood predict future asthma exacerbations more accurately than current clinical measures.
Multi-omics analysis of a pig-to-human decedent kidney xenotransplant
January 16, 2026 Nature
Despite efforts to improve molecular compatibility, organ transplants from other species, such as pigs, can still trigger immune reactions resulting in transplant failure. Here the authors use multi-omics profiling to characterize the immune response to xenotransplantation and identify potential targets for improving transplant success.
CAMP: a modular metagenomics analysis system for integrated multistep data exploration
January 16, 2026 NAR Genomics and Bioinformatics
The authors present CAMP, Core Analysis Modular Pipeline, a modular workflow for performing metagenomic analyses. The pipeline consists of modular components enabling flexibility and analysis of intermediate files, along with semi-automated visualization of results.
TCR2HLA: Calibrated inference of HLA genotypes from TCR repertoires enables identification of immunologically relevant metaclonotypes
January 16, 2026 PLoS Computational Biology
The authors present an open-source tool, TCR2HLA, that infers human leukocyte antigen (HLA) genotype from T cell receptor (TCR) sequences. The authors use TCR2HLA to identify TCRs associated with inferred HLA genotype and SARS-CoV-2 exposure status.
SEA CDM: Study-Experiment-Assay Common Data Model and Databases for Cross-Domain Data Integration and Analysis
January 14, 2026 Scientific Data
To foster sharing and integration of various biomedical data types across experiments, the authors developed an ontology-supported Study-Experiment-Assay (SEA) common data model (CDM). They further present the Ontology-based SEA Network (OSEAN) relational database and knowledge graph and show how a large number of studies from various sources can be represented and utilized.
Metabolomic profiling reveals the potential of fatty acids as regulators of exhausted CD8 T cells during chronic viral infection
January 6, 2026 PNAS
This study aimed to characterize the metabolic environment involved in the CD8 T cell exhaustion that occurs during chronic infections. The authors find that levels of fatty acids increase early in chronic infections and that administration of fatty acids late in chronic infections favored stem-like CD8 T cells.
Omics in Nonsteroidal Anti-Inflammatory Drugs-Exacerbated Respiratory Disease: Current Evidence From the Upper and Lower Airways
January 3, 2026 Allergy
This paper reviews studies applying omics technologies to nonsteroidal anti-inflammatory drug (NSAID)-exacerbated respiratory disease (N-ERD) in either the upper or lower respiratory tract. The authors propose that future work use multi-omics techniques, experimental standardization, and characterization of both respiratory tracts in the same patients.
EXPLANA: a user-friendly workflow for EXPLoratory ANAlysis and feature selection in cross-sectional and longitudinal microbiome studies
January 2, 2026 Bioinformatics
Fouquier and co-authors have developed a feature selection workflow using machine learning methods to identify meaningful variables associated with specified outcomes from longitudinal microbiome data. The tool, available on GitHub, supports both categorical and numerical data and generates an interactive report of the results.
The NIAID Discovery Portal: a unified search engine for infectious and immune-mediated disease datasets
December 31, 2025 mSystems
Datasets from infectious and immune-mediated disease (IID) studies are often stored across various repositories, each with different metadata schemas and search capabilities. The NIAID Data Ecosystem Discovery Portal aims to provide users with a centralized location to easily find and access IID datasets using intuitive searches and filters.
MTHFR allele and one-carbon metabolic profile predict severity of COVID-19
December 23, 2025 PNAS
Using samples from the IMmunoPhenotyping Assessment in a COVID-19 Cohort (IMPACC) study, the authors found that changes in one-carbon metabolism were predictive of disease severity. Further, the authors show that the genetic status of a key gene involved in methionine synthesis and early alterations in one-carbon metabolism together were predictive of both disease severity and risk of developing long COVID.
Quantifying viral pandemic potential from experimental transmission studies
December 17, 2025 PLoS Computational Biology
Current methods of estimating pandemic risk from viruses identified in animals are limited, due in part to the high cost and low resolution of animal transmission experiments. Somsen et al. developed a model to assess transmission and epidemiological components of pandemic risk based on viral titer data from infected animals.
A resource to empirically establish drug exposure records directly from untargeted metabolomics data
December 9, 2025 Nature Communications
To aid untargeted metabolomic studies, which can enable direct assessment of drug exposure from samples, the authors developed the Global Natural Product Social Molecular Networking (GNPS) Drug Library. This resource contains tandem mass spectrometry reference spectra for drugs and corresponding metabolites along with standardized metadata about the drugs, including therapeutic use and mechanism of action.
Assessing AI’s cognitive abilities for scientific discovery in the field of systems vaccinology
December 5, 2025 Science Immunology
Using immunological case studies, the authors assessed the ability of five large language models (LLMs) to accurately synthesize biological literature, formulate hypotheses, propose experiments to test the hypotheses, and provide broader significance to results. While the LLMs could accurately collect and synthesize existing information, they struggled to develop novel hypotheses and experiments.
Timely vaccine strain selection and genomic surveillance improve evolutionary forecast accuracy of seasonal influenza A/H3N2
December 4, 2025 eLife
Improvements in vaccine development time and in the lag time from sample collection to sequencing results observed surrounding the SARS-CoV-2 pandemic may bring similar improvements to influenza vaccine development timelines. Here, the authors show that realistic decreases in forecasting time produce more accurate predictions of future viral sequences and that shorter sequencing turnarounds produce more accurate estimates of current clade frequencies.
HLAtools, Searching Shared HLA Amino Acid Residue Prevalence, and the Global Frequency Browsers: New Computational Resources for Working With HLA Data and Visualizing Global Patterns of HLA Variation
December 3, 2025 International Journal of Immunogenetics
Genomic diversity at the HLA region is known to play an important role in human disease, with over 41,000 known alleles found across the world. Here, the authors present novel open-source tools for querying, analyzing, and visualizing HLA variant distributions across populations.
Peanut allergy oral immunotherapy drives single-cell multi-omic changes in peanut-reactive T cells associated with sustained unresponsiveness
December 3, 2025 Nature Immunology
To better understand how oral immunotherapy can establish continued unresponsiveness to peanut allergens, researchers analyzed single-cell multi-omics data from the POISED clinical trial cohort. They identified numerous changes among T cells that correlated with a continued lack of sensitivity to peanut allergens after stopping oral immunotherapy.
DeepRNA-Reg: a deep-learning based approach for comparative analysis of CLIP experiments
December 3, 2025 RNA Biology
To support analysis of crosslinking immunoprecipitation (CLIP) data, the authors designed an algorithm that uses deep learning to predict differentially enriched binding sites between datasets. The authors show that their algorithm produces more accurate predictions across a variety of settings, including in microRNA regulation of T helper 2 cells.
HIV Pharmacology Data Repository: Setting the New Information-Sharing Standard for Clinical and Preclinical Pharmacokinetic Studies
December 3, 2025 Clinical Pharmacology and Therapeutics
The authors propose minimal information standards for pharmacokinetic data to better enable data sharing and reuse. Using these standards, the authors integrated data from existing studies into a new public database, the HIV Pharmacology Data Repository.
STREAMS guidelines: standards for technical reporting in environmental and host-associated microbiome studies
December 1, 2025 Nature Microbiology
Synthesizing input from over 200 researchers, the authors provide detailed guidelines for researchers reporting on environmental and non-human host-associated microbiome studies. The guidelines aim to promote FAIR data principles and will be maintained and updated.
Virus taxonomy: the database of the International Committee on Taxonomy of Viruses
November 26, 2025 Nucleic Acids Research
The International Committee on Taxonomy of Viruses (ICTV) develops and maintains viral taxonomy as well as a public database for data access and analysis. This report describes the recent improvements made to the ICTV resources, including new tools for taxonomic analysis and visualization.
Inferring asymptomatic carriers of antimicrobial-resistant organisms in hospitals using genomic, microbiological and patient mobility data
November 19, 2025 Nature Communications
Asymptomatic carriers of antimicrobial-resistant organisms can spread these pathogens within the healthcare system, but because they lack symptoms, they have been hard to identify or predict. Here, researchers integrated multiple data types, including genomics and patient behavior, into a model that is better able to predict asymptomatic carriers.