Array QC Project Advances Technology


Getting Control Of Microarrays
Alan Dove

Microarrays have been suffering from a crisis of confidence over concerns that their results are not reproducible between labs.
Now, by exploiting the results from a huge quality control test and employing a few new tricks, researchers may finally be able to trust their gene expression data.

In a widely cited 2003 study, scientists at the NIH's National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) performed the same experiment on three separate microarray platforms. The three platforms yielded wildly divergent results.(1)

"It was simply a way of evaluating different microarray platforms to see which one would be most suitable for our institute," says Maggie Cam, director of the Genomics Core Laboratory at NIDDK and senior author on the paper. Nonetheless, the paper hit a nerve, and microarray companies, basic researchers, and the FDA soon scrambled to figure out whether microarrays could be trusted.

Probing for answers
As more microarray comparisons trickled into the literature, researchers saw reassuring signs of reliability in the technology. In early 2006, for example, scientists at Harvard Medical School found that arrays gave reproducible results, as long as researchers focused on relative expression levels rather than absolute values, and abundant transcripts rather than rare ones.(2) "The bottom half of the transcriptome, let's say under 10 copies per cell, is just not seen by microarrays," says Zoltan Szallasi, Senior Research Scientist at Children's Hospital, Harvard Medical School, and senior author on the paper.

That was good news for the FDA, which has been trying to encourage pharmaceutical companies to submit more microarray data in new drug applications. While the agency has issued a series of "Guidance" documents spelling out its expectations for microarray technology, drug developers have remained wary, concerned that this new type of data might be misinterpreted.(3)

Besides using microarray data to support new drug applications, companies are also eyeing the diagnostics market, where reproducibility will be a day-to-day challenge. Though there are few regulations on microarray-based diagnostics now, many clinical labs are already honing their skills with the technology (see Sidebar).

But until very recently, even microarray experts often found it difficult to know what they were really measuring, as many array manufacturers held their probe sequences as closely guarded trade secrets. "If you're not using the probes that are directed against the message, then you're going to measure the wrong thing," says Szallasi.

Figure 2. Total RNA isolated from 10 individual human cell lines was reverse-transcribed to cDNA, labeled with Cy5, and co-hybridized with Cy3-labeled UHRR (Universal Human Reference RNA) onto 43,000-spot cDNA microarrays (Stanford University). The data were analyzed using GeneTraffic® software. Approximately 6,000-8,000 spots out of 43,000 (14-18%) were flagged on each microarray and excluded from further analysis. Spots with hybridization signals in the Cy5 channel higher than 1000 and with a Cy5/Cy3 ratio greater than 2 were collected, and the number of spots with these characteristics on only one microarray was determined. (Photo courtesy of Stratagene Corp.)
When researchers managed to extract some probe sequence information, what they found was eye-opening. "For the various microarrays, a very large percentage of the probes were not directed against the transcripts that the manufacturer claimed they were directed against," says Szallasi. To address the problems, manufacturers, researchers, and the FDA joined forces in an ambitious plan: the Microarray Quality Control (MAQC) project.

High-stakes testing
Led by Leming Shi, a computational chemist at the FDA's National Center for Toxicological Research in Jefferson, AR, the MAQC accomplished an astonishingly complex project in a remarkably short time. The effort mass-produced a set of reference RNA samples from cells, then screened them on several microarray and non-microarray gene expression platforms in multiple labs. With more than 100 scientists involved, the multi-institution effort produced and analyzed over 27 million data points, took just over a year, and cost the public nothing.


A key factor in the MAQC's success was the participation of nearly all of the major microarray manufacturers. Besides finally agreeing to release all of their proprietary probe sequence data, the companies also absorbed the project's substantial equipment and processing costs, and exposed themselves to a risky direct comparison with their competitors.

Indeed, the MAQC was a gamble for the entire microarray field: it could provide either a definitive endorsement or a definitive debunking of the technology. The final results, which appeared in a series of papers in the September 2006 issue of Nature Biotechnology, now provide essential background reading and a detailed shopping guide for all scientists working with arrays. "I think overall it is reassuring to see that they found a large degree of overlap between platforms," says Sorin Draghici, associate professor of computer science at Wayne State University.

Draghici has worked extensively on microarray reproducibility, but was not involved in the MAQC. While he is pleased that the effort prompted companies to release full probe sequences into a public database, he and other outside experts remain concerned that novice microarray users may over-interpret some of the reports' conclusions. Specifically, he cites a pair of graphs that compare two analytical techniques, fold change and p-values, and worries that biologists may read them as an endorsement of fold change, which could yield artificially optimistic results.
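To see why the choice between fold change and p-values matters, consider the minimal Python sketch below. It is an illustration only, not the MAQC's analysis: the gene names and intensity values are invented, and a pooled two-sample t statistic stands in for the p-value (with equal replicate counts for every gene, ranking by the absolute t statistic gives the same order as ranking by p-value).

```python
from math import sqrt
from statistics import mean, variance


def log2_fold_change(a, b):
    """Difference of mean log2 intensities (sample B minus sample A)."""
    return mean(b) - mean(a)


def pooled_t(a, b):
    """Student's two-sample t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical log2 intensities, three replicate arrays per sample (made-up values).
genes = {
    "gene_noisy": ([6.0, 9.0, 7.5], [8.0, 11.5, 9.0]),  # big change, big scatter
    "gene_tight": ([8.0, 8.1, 7.9], [8.9, 9.0, 9.1]),   # smaller change, tight replicates
}

# Rank the same genes two ways: by size of change and by statistical evidence.
by_fold_change = sorted(
    genes, key=lambda g: abs(log2_fold_change(*genes[g])), reverse=True
)
by_t_statistic = sorted(
    genes, key=lambda g: abs(pooled_t(*genes[g])), reverse=True
)

print("Ranked by fold change:", by_fold_change)
print("Ranked by t statistic:", by_t_statistic)
```

In this toy data set the two rankings disagree: the noisy gene tops the fold-change list while the tightly replicated gene tops the t-statistic list, which is the kind of discrepancy that makes the choice of method consequential.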

Figure 3. Pictured is the next generation Agilent Whole Human Genome 4x44K microarray hybridized with the same targets used in the MAQC study. This 4-Pack microarray was printed using Agilent's 60-mer SurePrint Technology with probes representing 41,000+ unique human genes/transcripts on each of the four arrays on the 1x3 slide. The signal intensity in each of the two color channels represents the differing amounts of gene expression between the two MAQC samples. The second and fourth arrays are the dye-swaps of the first and third arrays on the slide. (Photo courtesy of Agilent Technologies, Inc.)
Cam, whose work is often cited as a major inspiration for the MAQC, is also guardedly positive about the project's findings. She points to the consortium's standard RNA sample controls, which researchers will be able to buy from Ambion and Stratagene, as a useful development. "I think that interlab differences might be reduced if labs were able to reproduce the results of MAQC," says Cam.

Still, the project did not settle all of the reproducibility problems. "There is an inherent level of discordance that's unrelated to [sequence] annotation. I think that there's still a lot to be learned about probe design and cross-hybridization," says Cam.

To reduce variability, experienced microarray users offer several suggestions. Cam's lab has switched from manual array processing to robotic techniques, a change that required some tinkering but eventually yielded more predictable results. Testing equipment and technicians with a standard reference sample, like the new MAQC controls, can also provide a useful anchor. Some commercial services can perform detailed quality control tests (see Sidebar), which may help a lab decide whether to process arrays in-house or send them out to a higher-volume facility.

The MAQC results can also help researchers salvage earlier microarray experiments, using the newly disclosed probe sequences to eliminate artifacts. "People in my group went back to the old datasets, and we can clean it up, just throw away everything that's wrong," says Szallasi. The complete probe sequences, and all of the other MAQC data, are freely available online (http://edkb.fda.gov/MAQC/MainStudy/upload/Summary_MAQC_DataSets.pdf).
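The clean-up Szallasi describes can be pictured with a short Python sketch. It is a deliberate simplification, not his group's pipeline: all identifiers and sequences are hypothetical, and an exact substring test stands in for the sequence alignment (and strand handling) a real re-annotation would need.

```python
def probe_matches_target(probe_seq, transcript_seq):
    """True if the probe sequence occurs exactly within the claimed transcript."""
    return probe_seq.upper() in transcript_seq.upper()


def clean_dataset(measurements, probe_seqs, claimed_target, transcripts):
    """Keep only measurements whose probe matches the transcript it claims to target."""
    kept = {}
    for probe_id, value in measurements.items():
        target_seq = transcripts.get(claimed_target[probe_id], "")
        if probe_matches_target(probe_seqs[probe_id], target_seq):
            kept[probe_id] = value
    return kept


# Hypothetical toy data: two probes that both claim to target "tx1".
transcripts = {"tx1": "ATGGCCAAGTTGACCGGTACC"}
probe_seqs = {"p1": "AAGTTGACC", "p2": "TTTTGGGGCC"}
claimed_target = {"p1": "tx1", "p2": "tx1"}
measurements = {"p1": 9.4, "p2": 7.1}

cleaned = clean_dataset(measurements, probe_seqs, claimed_target, transcripts)
print(cleaned)
```

Here probe p2 does not occur in the transcript it is annotated against, so its measurement is discarded and only p1 survives.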

Expression profiling with microarrays may never be perfectly reliable, but with the new quality-control tools, good experimental design, and careful data analysis, researchers should be able to ensure that their results look like reproducible science.

Making The Grade
As researchers come to grips with the deluge of information from the MAQC project (see main text), some companies are already offering related quality control services for both basic research and clinical diagnostic labs. Regular quality control tests may be pricey to perform, but they could pay dividends in improved reproducibility.

Typically, quality control or “proficiency” tests involve two or more labs assaying a single set of replicate samples, then comparing the results. While researchers could simply use the MAQC control samples and compare themselves with a collaborator, hiring a professional quality testing service provides a more robust comparison.

Expression Analysis (Durham, NC), for example, sends replicate RNA expression samples to a group of labs, then compares their Affymetrix array results in batches, usually 10 to 15 labs at a time. “We can send out a report to each lab saying ‘here’s your repeatability, how well you get the same results within your lab, and then how that compares to all the other labs that participated,’” says Laura Reid, the company’s director of research and development.
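The two numbers Reid mentions, repeatability within a lab and agreement with the other labs, can be sketched in a few lines of Python. This is an assumption-laden illustration rather than Expression Analysis's actual method: the lab names and signal values are invented, repeatability is taken as the standard deviation of a lab's replicates, and agreement is measured as the offset of the lab's mean from the median of all lab means.

```python
from statistics import mean, median, stdev

# Hypothetical log2 signals for one transcript: replicate arrays run by each lab.
lab_replicates = {
    "lab_A": [10.1, 10.0, 10.2],
    "lab_B": [10.3, 9.7, 10.0],
    "lab_C": [11.0, 11.1, 10.9],
}


def proficiency_report(replicates_by_lab):
    """Summarize within-lab scatter and each lab's offset from the group consensus."""
    lab_means = {lab: mean(values) for lab, values in replicates_by_lab.items()}
    consensus = median(lab_means.values())  # robust to a single outlying lab
    return {
        lab: {
            "repeatability_sd": stdev(values),
            "offset_from_consensus": lab_means[lab] - consensus,
        }
        for lab, values in replicates_by_lab.items()
    }


report = proficiency_report(lab_replicates)
for lab, stats in report.items():
    print(lab, stats)
```

In this made-up example lab_C is highly repeatable yet sits well away from the other labs, while lab_B agrees with the consensus but shows more scatter between its own replicates. The two measures capture different problems, which is why a report would list both.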

Experienced microarray researchers recommend repeating such tests regularly. “I think that it’s a good way to monitor your own performance over time, as well as to make sure that your performance stays on par with what’s expected,” says Maggie Cam, director of the microarray core facility at the NIH’s National Institute of Diabetes and Digestive and Kidney Diseases.

Proficiency tests are already popular with large basic research labs, and many clinical laboratories are also starting to use them, in order to stay ahead of compliance with the Clinical Laboratory Improvement Amendments (CLIA) law. “Although microarrays are not a test that at this point requires CLIA certification, many labs are trying to live up to the CLIA guidelines,” says Reid.

References
1. Tan, P. K. et al., Nucleic Acids Res. 31(19):5676-84 (2003).
2. Draghici, S. et al., Trends Genet. 22(2):101-9 (2006). (Epub Dec. 27, 2005).
3. Dove, A., Drug Disc. and Dev. 9(6):40-44 (2006).

 


© 2006 Advantage Business Media All rights reserved.