Bioinformatics Summit 2006

Meeting Report

March, 2006

A Bioinformatics Summit, organized by the Division of Allergy, Immunology and Transplantation (DAIT), was held on February 16-17, 2006 at the Bethesda North Marriott Hotel & Conference Center. Ninety-four participants attended the meeting on Feb. 16, and 41 took part in the group discussions on Feb. 17. The participants came from many NIH institutes and centers, including the National Institute of Allergy and Infectious Diseases (NIAID), the National Cancer Institute (NCI), the National Institute on Aging (NIA), the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), the National Institute of General Medical Sciences (NIGMS), the National Institute on Deafness and Other Communication Disorders (NIDCD), the National Library of Medicine (NLM), the National Institute on Alcohol Abuse and Alcoholism (NIAAA), and the Center for Information Technology (CIT), and from other Federal agencies such as the National Institute of Standards and Technology (NIST) and the National Science Foundation (NSF). In addition, representatives of nongovernmental organizations such as Oracle, Northrop Grumman, SAIC, SRA, and Georgetown University attended, as did DAIT-funded grantees and contractors.

On the first day of the meeting, Dr. Daniel Rotrosen, Division Director of DAIT, started the meeting by discussing how bioinformatics fits into the broader mission of the Division and NIAID. The Associate Director of the Office of Biomedical Informatics, Cheryl Kraft, gave an overview of bioinformatics challenges in supporting clinical and basic immunology research. She discussed the NIH roadmap project for unifying clinical research, gave an example of different kinds of DAIT funded research, and highlighted the Bioinformatics Integration Support Contract (BISC) funded by DAIT to warehouse both clinical and basic research data generated by the DAIT funded research community. The meeting was organized into two sessions, one to highlight DAIT-funded bioinformatics activities and one to educate the participants about other NIH funded bioinformatics projects.

In the first session of the summit, 10 speakers involved with projects and programs funded by DAIT described their projects and the bioinformatics challenges they face.

Dr. Mark Musen from the Immune Tolerance Network (ITN) introduced his team’s approach to integrating clinical data from clinical trials with laboratory data from mechanistic studies. He proposed an ontology-based application that will treat concepts used in both clinical and basic research as the building blocks of a software application or a workflow pipeline. Using this kind of application, one can build case report forms used in clinical trials by pooling existing concepts and/or defining new concepts in the application. Additionally, project specific forms can be used to guide data elements in downstream data collection and these data elements will become the basis for data analysis. Dr. Musen described an application that may contain many modules such as an ontology browser, a case report form builder, a trial protocol builder, and a data format builder. Investigators can open the case report form builder and select data elements from the ontology browser to build forms for a clinical trial. Similar exercises can be done to define the process and data elements of data collection and data processing. Since the same ontology can be used for different clinical trials and different data collection processes, the data generated from those clinical trials and mechanistic studies can be easily integrated based on common concepts or ontology. If enhancement of the existing concepts is necessary, only the ontology needs to be changed, and all other changes in the data processing pipeline will be automatically implemented. When evaluating existing ontologies, Dr. Musen indicated that the existing ontologies for health care and life sciences such as SNOMED, MedDRA, and CDISC are used either by FDA to enforce government regulations or to manage patient care. They are lacking concepts to support basic life science research. Therefore, there is a definite need to expand existing ontologies to facilitate research oriented clinical and basic studies.
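The ontology-driven design described above can be illustrated with a minimal sketch. It is not the ITN application; the concept names, the Concept class, and the functions build_form, render, and shared_concepts are all hypothetical and chosen only for illustration. The sketch assumes that forms store concept identifiers rather than copies of the concepts, so that a change to the ontology reaches every form that uses it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One entry in a (hypothetical) shared ontology."""
    concept_id: str
    label: str
    datatype: str


# Hypothetical ontology shared by several trials.
ONTOLOGY = {
    "donor_age": Concept("donor_age", "Donor age (years)", "int"),
    "serum_creatinine": Concept("serum_creatinine", "Serum creatinine (mg/dL)", "float"),
}


def build_form(title, concept_ids, ontology):
    """Build a case report form that references ontology concepts by id."""
    missing = [c for c in concept_ids if c not in ontology]
    if missing:
        raise KeyError(f"unknown concepts: {missing}")
    return {"title": title, "concept_ids": list(concept_ids)}


def render(form, ontology):
    """Resolve concept ids against the ontology at render time."""
    lines = [form["title"]]
    for concept_id in form["concept_ids"]:
        concept = ontology[concept_id]
        lines.append(f"  {concept.label} [{concept.datatype}]")
    return "\n".join(lines)


def shared_concepts(form_a, form_b):
    """Concepts two forms have in common: the basis for integrating their data."""
    return sorted(set(form_a["concept_ids"]) & set(form_b["concept_ids"]))


kidney = build_form("Kidney trial", ["donor_age", "serum_creatinine"], ONTOLOGY)
islet = build_form("Islet trial", ["donor_age"], ONTOLOGY)
print(render(kidney, ONTOLOGY))
print(shared_concepts(kidney, islet))
```

Because each form holds only identifiers, replacing an entry in ONTOLOGY changes what every form renders without the forms themselves being edited.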

Dr. Tiepu Liu and Dr. David Iklé introduced activities carried out by PPD in supporting DAIT-funded clinical studies, including the ITN data coordinating center and the statistical and clinical coordinating center for the Cooperative Clinical Trials in Pediatric Transplantation (CCTPT) and the Clinical Trials in Organ Transplantation (CTOT) consortiums. At PPD, Oracle Clinical and Oracle RDC have been used as the database platforms to support bioinformatics activities such as clinical data collection, specimen tracking and data storage. Their system enforces controlled vocabularies such as CDISC and MedDRA. Their data models can generate object-oriented electronic Case Report Forms (eCRFs). Their system can also support data analyses using programs such as Oracle Clinical, SAS, SPlus, and JMP. Dr. David Iklé emphasized that they made a special effort to share common data elements in protocol design with ITN in the clinical trial protocols of both the CCTPT and the CTOT projects, to leverage what has been developed and avoid redundant effort.

Dr. Bjoern Peters from the Immune Epitope Database (IEDB) project gave an overview of the IEDB. He introduced the ontology that the IEDB team developed together with ontology experts and through guided manual literature curation. He also demonstrated a query interface driven by the ontology and controlled vocabularies. When discussing analysis tools, Dr. Peters introduced a web-based analysis platform that is used in the IEDB. The results of user queries can be passed directly into an analysis tool without data format changes or the need for copying and pasting. He emphasized the importance of developing a domain-specific ontology in the early phase of the project. He also emphasized the importance of performing inter-field data validation and of developing effective data exchange mechanisms, such as XML for data submission to the IEDB, to ensure data quality in the IEDB.
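Inter-field validation means checking that the fields of one record are consistent with each other, not only that each field is valid on its own. The following is a minimal sketch under stated assumptions: the field names (sequence, start, end, assay_type, mhc_allele) and the rules are hypothetical and are not taken from the IEDB submission format.

```python
def validate_record(record):
    """Return a list of inter-field consistency errors for one submitted record.

    The field names and rules are illustrative assumptions only.
    """
    errors = []
    start = record.get("start")
    end = record.get("end")
    sequence = record.get("sequence", "")

    # Rule 1: positions must be ordered, and must agree with the sequence length.
    if start is not None and end is not None:
        if start > end:
            errors.append("start must not exceed end")
        elif sequence and len(sequence) != end - start + 1:
            errors.append("sequence length does not match start/end")

    # Rule 2: one field becomes mandatory depending on the value of another.
    if record.get("assay_type") == "mhc_binding" and not record.get("mhc_allele"):
        errors.append("mhc_allele is required for mhc_binding assays")

    return errors


good = {"sequence": "SIINFEKL", "start": 257, "end": 264,
        "assay_type": "mhc_binding", "mhc_allele": "H-2Kb"}
bad = {"sequence": "SIINFEKL", "start": 257, "end": 260,
       "assay_type": "mhc_binding"}
print(validate_record(good))
print(validate_record(bad))
```

Each value in the second record could pass a single-field check; the errors only appear when fields are compared with one another.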

Dr. Herman Mitchell from Rho expressed his concern with the effort to define a unified process for all clinical studies. He believes that the variations among different clinical trials with different clinical phenotypes for different study objectives are the driving force to advance science and medicine. He suggested that we should pay special attention to preserving the ability to have clinical variations when unifying common data elements. He was concerned that a common structure would not allow for the variability that is inherent in clinical research. Dr. Dennis Wallace from Rho introduced their data management system to support the Autoimmunity Centers of Excellence (ACE). He emphasized the importance of keeping the raw data collection separate from the processed data for analysis. He also emphasized the efforts of multi-level data validation from the time of data collection to the point of data analysis.

Dr. Daniel R. Salomon, Principal Investigator (PI) of the Genomics for Kidney Transplantation Program funded by DAIT, presented the bioinformatics challenges within his program. Dr. Salomon emphasized the uncertainties in research at the cutting edge of science. Large-scale and high-throughput research platforms are used in his program, such as gene chips to detect gene expression profiles, SNP chips for genome typing, and tandem mass spectrometry to study the proteomics of transplantation, as well as integrated analysis to leverage the results generated from different research platforms. There are technological challenges in research, ranging from how to compare results across experimental platforms, to what quality assurance is necessary to evaluate the quality of research results, to how to organize the data for analysis. There are also challenges in clinical practices, such as the lack of standards in sample collection procedures, sample preparation protocols at each clinic, sample storage conditions, RNA, DNA and protein isolation at each lab, sample amplification, and probe labeling. Dr. Salomon emphasized the need to build a bioinformatics infrastructure to support new types of research in the genomic era.

Dr. Richard H. Scheuermann and Dr. David Karp, PI and co-PI of the BISC project, introduced the ImmPort system. Dr. Scheuermann focused his presentation on methods for capturing experimental data generated from various research platforms. He presented challenges in supporting current immunological research, such as genetics studies, genomic analyses, large-scale gene expression analyses, cellular studies of immune cells, and proteomics studies. He emphasized the need for data models that capture the metadata of experiments in as much detail and with as much flexibility as possible. He also emphasized the need for a controlled vocabulary in data capture and for ontology-driven knowledge inference to make the archived data more useful to the research community over the long term, which is the ultimate goal of the NIH data sharing policy. Dr. David Karp emphasized that new types of bioinformatics activities, such as the centralized data archive created in the BISC project, have required changes to the IRB process and patient consent. He raised the problem that there is no NIH-wide guidance on protecting patient privacy when sharing the data generated by genome-wide genotyping studies.

Dr. Donald Stablein, the president of EMMES Corporation, presented their newly developed web-based system to manage clinical trials. He said that EMMES decided to build this new system in 1999 since the old system was not able to handle the rapidly growing requests to support new clinical trials. The new system has a web form builder developed in 2000, an offline data entry and forms caching capability built in 2003, protocol monitoring and automatic MedDRA processing modules deployed in 2004, and query resolution and file attachment modules launched in 2005. Currently, their new system manages over 40 projects, more than 250 protocols, and coordinates clinical studies carried out at over 1000 sites on all continents. Their experience clearly shows the effectiveness of a well-built bioinformatics support system in managing clinical studies.

In the second session of the summit, invited speakers from non-DAIT funded projects gave the audience a broader view of what bioinformatics support can do to help biomedical research. Dr. Kevin Becker from NIA introduced their low budget, but on demand database, which stores genetic association study results abstracted from published literature. Mr. Eric Miller, coordinator of semantic web technology in health care and life sciences from, emphasized that in order to build a flexible system, we need to focus on building modules with common interfaces that can glue the modules together. Dr. Jeffrey Grethe, the scientific coordinator of the Biomedical Informatics Research Network (BIRN) project funded by National Center for Research Resources (NCRR), introduced their experience in managing the storage of large brain imaging files at different sites and the integrated analyses of image files generated by different investigators. These files also contain meta-data and use a unique imaging calibration approach.

Mr. Joe Mychaleckyj from the Type 1 Diabetes Genetics Consortium (T1DGC) described what they learned in supporting a large worldwide genetics study of Type 1 diabetes. Many synergies exist between the T1DGC and the BISC projects. Several data types used in the T1DGC project, such as the HLA typing data and SNP genotyping data, are very similar to the data types managed by the BISC project. The analysis tools used by the T1DGC are also very similar to the tools needed by the Population Genetics and HLA Region Genetics projects supported by the BISC program. The T1DGC was willing to share with the BISC team its data model for managing disease phenotypes (such as the demographic and physiological traits of patients), its tools for genomic analysis, its genetic database ontology, and its experience in balancing patient privacy and research data dissemination.

Dr. Peter Covitz from NCI introduced interoperability in the cancer Biomedical Informatics Grid (caBIG) project. He indicated that the key interoperability challenge encountered by the cancer research community is keeping data interpretation consistent when data are accessed and analyzed both at a central repository and at individual research sites. To address this challenge, NCI compiled a series of caBIG compatibility guidelines and funded various efforts to define programming and messaging interfaces in an attempt to enforce syntactic interoperability. At the same time, NCI put together ontologies, common data elements and information models to facilitate semantic interoperability. NCI hopes that these efforts will lay a foundation for federated data integration in the cancer research community.

On the second day, people from the audience and the invited speakers were randomly assigned to two separate groups. These groups were given the task of discussing a set of questions developed by the NIAID Program Officers and aimed at resolving the issues raised in the previous day’s presentations. One group discussion was facilitated by Dr. Mark Musen from Stanford and the other by Dr. Richard Scheuermann from the University of Texas Southwestern. After one and a half hours of discussion, the two groups came together, exchanged their comments, and gave the following comments and recommendations.

Summary of the Discussion

Participants agreed that the foundation for data integration and data sharing is data standards. To establish a widely accepted set of data standards, we need to identify common needs within the research community and take an approach that will allow the community to define and agree upon the best set of standards for immunology research rather than NIAID dictating the standards that should be used. In order to formulate a user-friendly set of data standards in a rapidly changing biomedical research field, we need to build a dynamic, flexible and sustainable infrastructure to manage data standards and integrate data standards with working databases and applications at each individual site. It is essential that NIAID take the lead in organizing groups to work toward data standards and provide adequate funding to individual research sites to support this effort.

Discussion Details

What are the main problems/issues that were described yesterday and how would you begin to develop solutions to these problems?

Lack of interoperability of data is the main problem. It implies the need for standards.

Is a unified common data element resource such as the resource in caBIG necessary for the immunology community?

There is a need for flexibility but there is also a need for core elements. Standards should be developed locally that can be mapped back to broader standards. We should start simple and focus on a shared vocabulary, not necessarily complex ontologies.

  • The inventory of what common areas exist in the institutes should be the starting point of forming the standards and a bottom-up approach should be the driving force of such activity. The top-down approach may lead to impractical standards and user community resistance.
  • The tools to access standards and models need to be developed simultaneously including the tools to map local applications to common data elements, the tools to manage the development or versioning of the common data elements, and the tools to visualize the common data elements.
  • The content generation of these common data elements should be a trans-NIH effort.
  • Flexibility is needed in the common data elements to meet the requirements of specific domains.
  • There should be a feedback communication mechanism to keep the user community in the loop for developing and enforcing the common data elements.
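The idea of mapping local applications to common data elements while keeping local flexibility can be sketched briefly. The site names, field names, and the CDE_MAP table below are hypothetical; the sketch only assumes that each site keeps its own field names and publishes a mapping to a shared core.

```python
# Hypothetical per-site mappings from local field names to common data elements.
CDE_MAP = {
    "site_a": {"pt_sex": "sex", "dob": "birth_date"},
    "site_b": {"gender": "sex", "birthdate": "birth_date"},
}


def to_common(site, record, mappings=CDE_MAP):
    """Split a local record into common data elements and site-specific extras."""
    mapping = mappings[site]
    common, local_only = {}, {}
    for field, value in record.items():
        if field in mapping:
            common[mapping[field]] = value
        else:
            # Unmapped fields are kept rather than dropped, preserving local flexibility.
            local_only[field] = value
    return common, local_only


a_common, a_local = to_common("site_a", {"pt_sex": "F", "dob": "1990-04-02", "hla_panel": "v2"})
b_common, b_local = to_common("site_b", {"gender": "F", "birthdate": "1990-04-02"})
print(a_common == b_common)
print(a_local)
```

Records from both sites become comparable on the shared core, while fields that have no common data element yet stay available locally.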

Are there existing models that can facilitate the integration of basic and clinical data?

  • Integration of data is NOT the problem from a technical perspective.
  • Interoperability is really the issue.
  • Common vocabularies and analysis tools are needed.
  • A working group is needed to build better tools to use vocabularies.
  • Flexibility in standards needs to be maintained.

Are there specific challenges within the clinical or basic science communities to the adoption of specific standards?

  • Education focused on NIAID-selected standards and guidelines is needed to encourage best practices.
  • In case of legacy systems, the challenge is how to retrofit the new systems to existing systems.
  • There are specific challenges to adopt the Gene Ontology (GO), such as the difficulties in mapping phenotypes to the GO and lack of evidence codes or ranking information in the GO.
  • There is a lack of ontology-enabled services that integrate the many good ontologies that already exist.

Are there working groups that could be formed to work on specific issues that are common to bioinformatics efforts? What would be the structure and goals (short-term and long-term) for such groups? Is this the most effective approach for developing data standards or obtaining community buy-in?

  • Yes, working groups are necessary.
  • Bring ontologists, biologists/immunologists, and clinicians together to identify common data elements in each domain of immunology. Have the scientific community define the problems.
  • Common vocabularies for interoperability need to be defined.
  • Tools such as semantic web technology need to be considered.
  • Proper use cases from the community need to be investigated and well defined.
  • DAIT/NIAID and the BISC team need to take the lead in organizing the activities of the working groups.
  • The working groups should have disease experts, scientists from the field and bioinformatics experts.
  • It was suggested to form a high level, long term committee and short term, task driven ad hoc subgroups.

What can NIH/NIAID/DAIT and/or contractors and grantees do to encourage and maintain long term collaborative and interactive relationships among different programs?

The NIAID/NIH may accomplish this through its funding resources, leadership, better coordination among institutes, and incorporation of the general immunology community.

  • Educational workshops.
  • Support the local investigator by supplementing existing grants that have bioinformatics resources to work with large existing systems.
  • Ensure better access to meta-data, such as exposing the minimally processed data with descriptions of the data and origin of the data.
  • Design and define how to share and interoperate in newly released program announcements and RFAs.
  • Host some wrappers and tools to facilitate data exchange and integration at NIAID funded sites such as the BISC web site.
  • Creation of a structure with BISC in the lead to support the local sites complying with the interoperability guideline.
  • NIAID needs to provide the leadership in this area.
  • Put together different disease experts and try to find areas of commonality and difference.
  • Gather common forms used in DAIT funded clinical research and post them where researchers can access them.
  • Design an effective framework to capture metadata for data archive.
  • NIH-wide patient consent language is necessary for IRBs to anticipate future deposition of data in large repositories where it may be used in ways not originally intended.
  • Steering committees should be formed for each of the networks and coordinating centers across NIAID to coordinate data standards and consistent collection of data. Stakeholders from each of the grant portfolios should also be included in this process.
  • Interoperability needs to be enforced by encouraging the use of common data elements and analysis tools.

Action Items

  • DAIT program officers will work on organizing an interoperability committee comprised of DAIT grantees and contractors.
  • DAIT program officers will collect requirements from the user community on the common research areas and emerging data standards.
  • DAIT program officers and the interoperability committee will decide what working groups are necessary.

Last Updated July 17, 2006