A Bioinformatics Summit, organized by the Division of Allergy, Immunology and Transplantation (DAIT), was held on February 16-17, 2006 at the Bethesda North Marriott Hotel & Conference Center. Ninety-four participants attended the meeting on Feb. 16, and 41 people participated in group discussions on Feb. 17. The participants came from many NIH institutes and centers, including the National Institute of Allergy and Infectious Diseases (NIAID), the National Cancer Institute (NCI), the National Institute on Aging (NIA), the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), the National Institute of General Medical Sciences (NIGMS), the National Institute on Deafness and Other Communication Disorders (NIDCD), the National Library of Medicine (NLM), the National Institute on Alcohol Abuse and Alcoholism (NIAAA), and the Center for Information Technology (CIT), as well as from other Federal agencies such as the National Institute of Standards and Technology (NIST) and the National Science Foundation (NSF). Additionally, representatives of non-governmental organizations such as Oracle, Northrop Grumman, SAIC, SRA, and Georgetown University, along with DAIT-funded grantees and contractors, attended.
On the first day of the meeting, Dr. Daniel Rotrosen, Division Director of DAIT, opened by discussing how bioinformatics fits into the broader mission of the Division and NIAID. The Associate Director of the Office of Biomedical Informatics, Cheryl Kraft, gave an overview of bioinformatics challenges in supporting clinical and basic immunology research. She discussed the NIH Roadmap project for unifying clinical research, gave examples of different kinds of DAIT-funded research, and highlighted the Bioinformatics Integration Support Contract (BISC) funded by DAIT to warehouse both clinical and basic research data generated by the DAIT-funded research community. The meeting was organized into two sessions: one to highlight DAIT-funded bioinformatics activities and one to educate the participants about other NIH-funded bioinformatics projects.
In the first session of the summit, 10 speakers involved with projects and programs funded by DAIT described their projects and the bioinformatics challenges they face.
Dr. Mark Musen from the Immune Tolerance Network (ITN) introduced his team’s approach to integrating clinical data from clinical trials with laboratory data from mechanistic studies. He proposed an ontology-based application that treats concepts used in both clinical and basic research as the building blocks of a software application or a workflow pipeline. Using this kind of application, one can build case report forms for clinical trials by pooling existing concepts and/or defining new concepts in the application. Additionally, project-specific forms can be used to guide data elements in downstream data collection, and these data elements become the basis for data analysis. Dr. Musen described an application that may contain many modules, such as an ontology browser, a case report form builder, a trial protocol builder, and a data format builder. Investigators can open the case report form builder and select data elements from the ontology browser to build forms for a clinical trial. Similar exercises can be done to define the process and data elements of data collection and data processing. Since the same ontology can be used for different clinical trials and different data collection processes, the data generated from those clinical trials and mechanistic studies can be easily integrated based on common concepts or ontology. If the existing concepts need to be enhanced, only the ontology needs to be changed, and all other changes in the data processing pipeline are implemented automatically. In evaluating existing ontologies, Dr. Musen indicated that those available for health care and life sciences, such as SNOMED, MedDRA, and CDISC, are used either by the FDA to enforce government regulations or to manage patient care. They lack concepts to support basic life science research. Therefore, there is a definite need to expand existing ontologies to facilitate research-oriented clinical and basic studies.
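The principle that forms draw their data elements from a shared ontology, so that a change to a concept reaches every form built from it, can be illustrated with a minimal sketch. This is not the ITN application: the names Concept, Ontology, build_form and shared_concepts, and the concept identifiers in the note below, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A single ontology concept that forms reference by identifier."""
    concept_id: str
    label: str
    data_type: str
    unit: str = ""


@dataclass
class Ontology:
    """A minimal in-memory concept store (hypothetical, for illustration)."""
    concepts: dict = field(default_factory=dict)

    def add(self, concept):
        # Adding a concept with an existing identifier replaces the old definition.
        self.concepts[concept.concept_id] = concept

    def get(self, concept_id):
        return self.concepts[concept_id]


def build_form(ontology, title, concept_ids):
    """Build a form definition by looking up each concept at build time."""
    fields = []
    for cid in concept_ids:
        c = ontology.get(cid)
        fields.append({"id": c.concept_id, "label": c.label,
                       "type": c.data_type, "unit": c.unit})
    return {"title": title, "fields": fields}


def shared_concepts(form_a, form_b):
    """Return the sorted concept identifiers that two forms have in common."""
    ids_a = {f["id"] for f in form_a["fields"]}
    ids_b = {f["id"] for f in form_b["fields"]}
    return sorted(ids_a & ids_b)
```

Under these assumptions, two trials whose forms both reference a concept such as "C001" can be joined on that identifier, and redefining "C001" in the ontology changes every form that is rebuilt afterwards without editing the forms themselves.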
Dr. Tiepu Liu and Dr. David Iklé introduced activities carried out by PPD in supporting DAIT-funded clinical studies, including the ITN data coordinating center and the statistical and clinical coordinating center for the Cooperative Clinical Trials in Pediatric Transplantation (CCTPT) and the Clinical Trials in Organ Transplantation (CTOT) consortiums. At PPD, Oracle Clinical and Oracle RDC have been used as the database platforms to support bioinformatics activities such as clinical data collection, specimen tracking and data storage. Their system is governed by controlled vocabularies such as CDISC and MedDRA. Their data models can generate object-oriented electronic Case Report Forms (eCRFs). Their system can also support data analyses using programs such as Oracle Clinical, SAS, S-PLUS, and JMP. Dr. David Iklé emphasized that they made a special effort to share common data elements in protocol design with ITN in clinical trial protocols of both the CCTPT and the CTOT projects, to leverage what has been developed and avoid redundant effort.
Dr. Bjoern Peters from the Immune Epitope Database (IEDB) project gave an overview of the IEDB. He introduced the ontology developed by the IEDB team and ontology experts and informed by guided manual literature curation. He also demonstrated a query interface driven by ontology and controlled vocabularies. When discussing analysis tools, Dr. Peters introduced a web-based analysis platform that is used in the IEDB. The results of user queries can be passed directly into an analysis tool without data format changes or the need for copying and pasting. He emphasized the importance of developing a domain-specific ontology in the early phase of the project. He also emphasized the importance of performing inter-field data validation and developing effective data exchange mechanisms, such as XML, for data submission to the IEDB, both of which help ensure data quality in the IEDB.
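Inter-field validation means checking that the values in different fields of one record are consistent with each other, not only that each value is well formed on its own. The following is a minimal sketch of that idea. The field names (epitope_start, epitope_end, epitope_sequence, assay_result, assay_type) and the three rules are assumptions made for illustration and are not the IEDB submission schema.

```python
def validate_record(record):
    """Return a list of inter-field consistency errors for one record.

    All field names are hypothetical; an empty list means no rule was violated.
    """
    errors = []
    start = record.get("epitope_start")
    end = record.get("epitope_end")
    sequence = record.get("epitope_sequence", "")

    if start is not None and end is not None:
        if start > end:
            # Rule 1: the start position must not come after the end position.
            errors.append("epitope_start is greater than epitope_end")
        elif sequence and end - start + 1 != len(sequence):
            # Rule 2: the position range must agree with the sequence length.
            errors.append("position range does not match sequence length")

    # Rule 3: a positive assay result must name the assay that produced it.
    if record.get("assay_result") == "positive" and not record.get("assay_type"):
        errors.append("assay_type is required when assay_result is positive")

    return errors
```

Each rule here involves at least two fields, which is what distinguishes this kind of check from per-field type or range checks. In a real submission pipeline such rules would presumably run after the exchange file has been parsed and before the record is accepted.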
Dr. Herman Mitchell from Rho expressed his concern about the effort to define a unified process for all clinical studies. He believes that the variations among clinical trials, with different clinical phenotypes for different study objectives, are the driving force that advances science and medicine. He suggested paying special attention to preserving clinical variation when unifying common data elements, since a common structure might not allow for the variability that is inherent in clinical research. Dr. Dennis Wallace from Rho introduced their data management system to support the Autoimmunity Centers of Excellence (ACE). He emphasized the importance of keeping the raw data collection separate from the processed data for analysis. He also emphasized multi-level data validation from the time of data collection to the point of data analysis.
Dr. Daniel R. Salomon, Principal Investigator (PI) of the Genomics for Kidney Transplantation Program funded by DAIT, presented the bioinformatics challenges within his program. Dr. Salomon emphasized the uncertainties in research at the cutting edge of science. Large-scale, high-throughput research platforms are used in his program, such as gene chips to detect gene expression profiles, SNP chips for genome typing, and tandem mass spectrometry to study the proteomics of transplantation, as well as integrated analysis to leverage the results generated from different research platforms. There are technological challenges in research, ranging from how to compare results across experimental platforms, to what quality assurance is necessary to evaluate the quality of research results, to how to organize the data for analysis. There are also challenges in clinical practice, such as the lack of standards in sample collection procedures, sample preparation protocols at each clinic, sample storage conditions, RNA, DNA and protein isolation at each lab, sample amplification, and probe labeling. Dr. Salomon emphasized the need to build a bioinformatics infrastructure to support new types of research in the genomic era.
Dr. Richard H. Scheuermann and Dr. David Karp, PI and co-PI of the BISC project, introduced the ImmPort system. Dr. Scheuermann focused his presentation on methods for capturing experimental data generated from various research platforms. He presented challenges in supporting current immunological research, such as genetics studies, genomic analyses, large-scale gene expression analyses, cellular studies of immune cells, and proteomics studies. He emphasized the need for data models that capture the metadata of experiments in as much detail and with as much flexibility as possible. He also emphasized the need for a controlled vocabulary in data capture and for ontology-driven knowledge inference to make the archived data more useful to the research community over the long term, which is the ultimate goal of NIH data sharing policy. Dr. David Karp emphasized that new types of bioinformatics activities, such as the centralized data archive created in the BISC project, have required changes to the IRB process and patient consent. He raised as a problem the lack of NIH-wide guidance on protecting patient privacy when sharing the data generated by genome-wide genotyping studies.
Dr. Donald Stablein, the president of EMMES Corporation, presented their newly developed web-based system to manage clinical trials. He said that EMMES decided to build this new system in 1999 since the old system was not able to handle the rapidly growing requests to support new clinical trials. The new system has a web form builder developed in 2000, an offline data entry and forms caching capability built in 2003, protocol monitoring and automatic MedDRA processing modules deployed in 2004, and query resolution and file attachment modules launched in 2005. Currently, their new system manages over 40 projects, more than 250 protocols, and coordinates clinical studies carried out at over 1000 sites on all continents. Their experience clearly shows the effectiveness of a well-built bioinformatics support system in managing clinical studies.
In the second session of the summit, invited speakers from non-DAIT-funded projects gave the audience a broader view of what bioinformatics support can do to help biomedical research. Dr. Kevin Becker from NIA introduced their low-budget but on-demand database, which stores genetic association study results abstracted from published literature. Mr. Eric Miller, coordinator of semantic web technology in health care and life sciences at the W3C, emphasized that in order to build a flexible system, we need to focus on building modules with common interfaces that can glue the modules together. Dr. Jeffrey Grethe, the scientific coordinator of the Biomedical Informatics Research Network (BIRN) project funded by the National Center for Research Resources (NCRR), described their experience in managing the storage of large brain imaging files at different sites and the integrated analysis of image files generated by different investigators. These files also contain metadata and use a unique imaging calibration approach.
Mr. Joe Mychaleckyj from the Type 1 Diabetes Genetics Consortium (T1DGC) described what they learned in supporting a large worldwide genetics study of type 1 diabetes. Many synergies exist between the T1DGC and the BISC projects. Several data types used in the T1DGC project, such as HLA typing data and SNP genotyping data, are very similar to the data types managed by the BISC project. The analysis tools used by the T1DGC are also very similar to the tools needed by the Population Genetics and HLA Region Genetics projects supported by the BISC program. The T1DGC was willing to share with the BISC team its data model for managing disease phenotypes, such as the demographic and physiological traits of patients, as well as its tools for genomic analysis, its genetic database ontology, and its experience in balancing patient privacy and research data dissemination.
Dr. Peter Covitz from NCI introduced interoperability in the cancer Biomedical Informatics Grid (caBIG) project. He indicated that the key challenge in interoperability encountered by the cancer research community is keeping data interpretation consistent when data are accessed and analyzed both at a central repository and at individual research sites. To deal with this challenge, NCI compiled a series of caBIG compatibility guidelines and funded various efforts to define programming and messaging interfaces in an attempt to enforce syntactic interoperability. At the same time, NCI put together ontologies, common data elements and information models to facilitate semantic interoperability. NCI hopes that these efforts will lay a foundation for federated data integration in the cancer research community.
On the second day, two groups, each made up of audience members and invited speakers, were formed by random assignment. These groups were given the task of discussing a set of questions developed by the NIAID Program Officers, aimed at resolving the issues raised in the previous day’s presentations. One group discussion was facilitated by Dr. Mark Musen from Stanford and the other by Dr. Richard Scheuermann from the University of Texas Southwestern. After one and a half hours of discussion, the two groups came together, exchanged their comments, and offered the following comments and recommendations.
Participants agreed that the foundation for data integration and data sharing is data standards. To establish a widely accepted set of data standards, we need to identify common needs within the research community and take an approach that will allow the community to define and agree upon the best set of standards for immunology research rather than NIAID dictating the standards that should be used. In order to formulate a user-friendly set of data standards in a rapidly changing biomedical research field, we need to build a dynamic, flexible and sustainable infrastructure to manage data standards and integrate data standards with working databases and applications at each individual site. It is essential that NIAID take the lead in organizing groups to work toward data standards and provide adequate funding to individual research sites to support this effort.
Lack of interoperability of data is the main problem. It implies the need for standards.
Is a unified common data element resource such as the resource in caBIG necessary for the immunology community?
There is a need for flexibility but there is also a need for core elements. Standards should be developed locally that can be mapped back to broader standards. We should start simple and focus on a shared vocabulary, not necessarily complex ontologies.
The NIAID/NIH may accomplish this through its funding resources, leadership, better coordination among institutes, and incorporation of the general immunology community.
Last Updated July 17, 2006