Research Technologies Branch
Genomic Technologies Section
Bioinformatics
Informatics and Statistics Needs for Microarray Experiments
The Data Analysis and Bioinformatics Unit offers full life-cycle expertise in the statistics and informatics essential to data-intensive microarray experiments.
The Microarray Experiment Life Cycle
The Data Analysis and Bioinformatics Unit is working to simplify the process of extracting meaningful information and knowledge from array experiments. We do this by providing tools and advice for many of the steps in the life cycle of a data-intensive experiment.
Experiment Design
Investigators are strongly encouraged to consult the Data Analysis and Bioinformatics Unit before starting a new project. Addressing experimental design issues early maximizes the statistical power and interpretability of the microarray measurements.
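As a rough illustration of why design matters for statistical power, the sketch below estimates how many replicate arrays per group a two-group comparison of a single gene might need. It uses the standard normal-approximation sample-size formula, not a method prescribed by the Unit. The function name, the stringent default significance level (chosen here because many genes are tested at once), and the example effect size and standard deviation are all assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def arrays_per_group(effect_log2, sd_log2, alpha=0.001, power=0.9):
    """Approximate replicates per group for a two-group comparison of one gene.

    effect_log2: smallest log2 difference worth detecting (assumed value).
    sd_log2:     per-gene standard deviation of log2 measurements (assumed value).
    Uses the normal approximation n = 2 * ((z_alpha + z_power) * sd / effect)^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sd_log2 / effect_log2) ** 2
    return ceil(n)


# Hypothetical scenario: detect a two-fold change (1.0 on the log2 scale)
# when the per-gene standard deviation is 0.5 log2 units.
print(arrays_per_group(effect_log2=1.0, sd_log2=0.5))
```

Halving the detectable effect roughly quadruples the required replicates, which is one reason to settle on the design before hybridization begins.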
Management of Sample Information
It is extremely important to keep an "electronic notebook" with standardized "machine readable" tables describing the characteristics and processing history of samples used in hybridization. Reagent and batch grouping details may need to be screened for confounding influences on the ultimate hypothesis testing or pattern discovery efforts. Keeping these "metadata" in some electronic format, for example, in MS Excel or a database, will ensure that subsequent analysis steps will not be delayed.
To make managing sample descriptions easier, efforts are underway to integrate MIAME-compliant sample information into the mAdb system.
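A machine-readable sample table makes it possible to screen for confounding automatically. The following is a minimal sketch, assuming a CSV sample sheet; the column names (`treatment`, `rna_batch`, `hyb_date`), the sample values, and the helper functions are hypothetical and are not part of mAdb or of any MIAME tooling.

```python
import csv
import io
from collections import Counter

# Hypothetical sample sheet; columns and values are illustrative only.
SAMPLE_SHEET = """sample_id,treatment,rna_batch,hyb_date
S1,control,B1,2004-03-01
S2,control,B1,2004-03-01
S3,treated,B2,2004-03-02
S4,treated,B2,2004-03-02
"""


def cross_tabulate(rows, group_col, batch_col):
    """Count samples for each (group, batch) combination."""
    return Counter((row[group_col], row[batch_col]) for row in rows)


def is_confounded(rows, group_col, batch_col):
    """True when every batch holds a single group and more than one group exists.

    In that case batch and group effects cannot be separated.
    """
    groups_per_batch = {}
    for row in rows:
        groups_per_batch.setdefault(row[batch_col], set()).add(row[group_col])
    all_groups = set().union(*groups_per_batch.values())
    return len(all_groups) > 1 and all(
        len(groups) == 1 for groups in groups_per_batch.values()
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE_SHEET)))
for (group, batch), count in sorted(cross_tabulate(rows, "treatment", "rna_batch").items()):
    print(group, batch, count)
print("confounded:", is_confounded(rows, "treatment", "rna_batch"))
```

In this made-up sheet all control samples share one RNA batch and all treated samples share another, so the check flags the design; spreading each treatment across both batches would clear it.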
Data Collection
An effective and reliable data processing pipeline has been established in the Microarray Research Facility for collecting and processing spotted (two-dye) microarrays. The current solution utilizes the mAdb system for management of microarray layouts and up-to-date gene annotations. Slide scanners and software are available to users of the facility for producing industry-standard hybridized array images and reduced, image-analyzed tabular data. Tabular output files and compressed images can be loaded directly into the mAdb system for permanent storage.
Data Quality Evaluation
Summary statistics and diagnostic charts are available in the mAdb system to assist users in qualifying candidate hybridization trials. Augmenting data quality assessment for microarray data is an ongoing area of research for the Data Analysis and Bioinformatics Unit.
Hypothesis Testing
Optimal experimental design will support statistical testing. Even if the intent of the experiment was to perform exploratory data analysis, the results of statistical testing will provide guideposts for the qualification of interesting observations. A combined evaluation of statistical and biological significance of interesting observations offers the best opportunities for results which can be validated.
The mAdb system offers several tests for group differences. More elaborate testing can be done by exporting microarray projects from the mAdb system into statistical software packages.
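For readers who export data for testing outside mAdb, the sketch below shows one simple test for a group difference in a single gene: an exact two-sided permutation test on the difference of group means. It is an illustration only and is not claimed to be one of the tests mAdb provides. The function name and the log-ratio values are made up, and full enumeration is practical only for small groups.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point round-off.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical log2 ratios for one gene in two sample groups.
control = [1.0, 1.2, 0.9, 1.1]
treated = [2.0, 2.2, 1.9, 2.1]
print(permutation_p_value(control, treated))
```

With four arrays per group there are only 70 possible relabellings, so the smallest attainable two-sided p-value is 2/70. When thousands of genes are tested, such p-values also need a multiple-testing adjustment, which this sketch leaves out.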
Data Mining
Multiple methods for unsupervised class discovery and pattern discovery, such as cluster analysis, are available within mAdb. The mAdb system also facilitates export of datasets (at various levels of filtering and reduction) into other analysis packages such as GeneSpring (Silicon Genetics), BRB ArrayTools (NCI), and Partek Pro.
The NIAID Microarray Facility in Frederick has recently released EASE software for Windows. Users can obtain the EASE software from the mAdb System Web site. Use EASE to determine which Gene Ontology (GO) descriptors are statistically over-represented in a gene list of interest.
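The idea behind asking whether a GO descriptor is statistically relevant to a gene list can be sketched with a plain hypergeometric test. This is not the score that EASE itself reports, and the function name and all of the counts below are assumptions for illustration.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, list_size, hits):
    """P(X >= hits) for a gene list drawn at random from the array.

    population: genes on the array with any annotation
    annotated:  genes on the array carrying the GO term
    list_size:  genes in the list of interest
    hits:       genes in the list carrying the GO term
    """
    max_hits = min(annotated, list_size)
    total = comb(population, list_size)
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return favorable / total


# Hypothetical counts: 40 of 5000 arrayed genes carry the term,
# and 6 of the 100 genes in the list of interest carry it.
print(hypergeom_upper_tail(population=5000, annotated=40, list_size=100, hits=6))
```

A small probability suggests the term appears in the list more often than chance would explain. Because many GO terms are usually tested against the same list, the resulting values should be read with multiple testing in mind.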
Statistical Model Building
Larger projects with adequate samples can facilitate an explicit modeling of multiple sources of noise and provide greater statistical power. When possible, users are encouraged to take advantage of software packages that offer more statistically sophisticated treatments of microarray data.
Prediction
Ultimately, the goal of any microarray experiment is to provide new information that can be generalized beyond the context of the immediate experiment. Analysis methods that provide both statistical significance and biological interpretation are the most effective.
Validation
Successful experimental design and analysis will incorporate biological validation of the results. Whenever possible, both the accuracy of the measurements and the exploratory analysis method should be validated.
Exploiting NCBI Databases
Investigators can consult the Functional Genomics section for assistance in accessing and using NCBI databases, for example to:
- Find homologous genes in another species
- Develop specific search programs linked to specific databases
- Find genes in genomic context
- Identify exons and introns of genes
- Model the structure of your protein to make functional inferences
- Visualize the structure of your molecule
- Locate point mutations in genes
Microarray Analysis Software
Commercial Software Available Through NIH and NIAID
- SAS system
- SAS JMP
- S-Plus
- GeneSpring
- Partek
- GenePix
Noncommercial Software
- mAdb
- MAexplorer
- BRB array tools
- Bioconductor