A New Era For Arrays


by Mike May

When it comes to gene expression, scientists invariably want more: they want to follow more genes, do more replicates, and study more organisms. And advances in microarray technology (more accurate probes, more probes on arrays, and better data analysis) keep giving scientists more of what they want.

Figure 1. Caliper’s sipper chips, such as the LC90 F sipper chip shown here, come with 1, 4, or 12 sippers to handle varying throughput requirements. (Image courtesy of Caliper Life Sciences.)
One of the more exciting recent advances involves the ability to screen entire genomes with a single microarray. Jessica Tonani, genotyping specialist at Santa Clara, CA-based Affymetrix, says genome-wide screening marks “the first time a technology lets you take an unbiased look at the entire genome.” She adds, “You can use these screens as an effective approach to discovering genes related to disease.” As a recent example, the June 7, 2007, issue of Nature published a report by the Wellcome Trust Case Control Consortium, which used Affymetrix GeneChip technology to look for genes related to seven human diseases. This whole-genome study produced more than 10 billion individual genotypes and found 24 genes related to bipolar disorder, coronary artery disease, diabetes, and other diseases.

Spreading over SNPs
The latest microarray from Affymetrix, the Genome-Wide Human SNP Array 6.0, includes 1.8 million markers for genetic variation. About half of the markers are SNPs, and the rest interrogate copy-number variation.

Making this microarray required several advances. The Affymetrix team developed a unique approach to tiling the probes. Two years ago, the feature size on Affymetrix microarrays was 18 microns, but it’s only 5 microns on the latest array. In addition, says Tonani, “Analytic improvements allow us to decrease the number of probes used to interrogate each SNP.”
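Because a square feature's area scales with the square of its edge length, shrinking features from 18 to 5 microns packs roughly 13 times as many features into the same area (324 versus 25 square microns per feature). The short Python sketch below makes that arithmetic explicit. It assumes square features and a fixed usable chip area and ignores spacing and control probes, so it is an upper-bound estimate, not a description of Affymetrix's actual tiling.

```python
# Back-of-the-envelope density gain from shrinking square array features.
# Assumptions: square features, fixed usable area, no spacing or control probes.

def features_per_area(area_um2: float, feature_um: float) -> int:
    """Number of square features with edge feature_um that fit in area_um2."""
    return int(area_um2 // (feature_um ** 2))

def density_gain(old_um: float, new_um: float) -> float:
    """Fold increase in features per unit area when the feature edge shrinks."""
    return (old_um / new_um) ** 2

if __name__ == "__main__":
    print(f"{density_gain(18, 5):.2f}x more features per unit area")
    # Hypothetical 1 cm x 1 cm region = 1e8 square microns.
    print(features_per_area(1e8, 18), "features at 18 microns")
    print(features_per_area(1e8, 5), "features at 5 microns")
```

Fewer probes per SNP, which Tonani mentions, multiplies this gain further, because each marker then consumes fewer of those features.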

The array has several applications. For one thing, says Tonani, “This microarray can find causative markers that might not be identified in older whole-genome studies, which have historically been underpowered due to high costs of technology and lower genetic coverage.” It can also be used in replication studies and to examine copy number, which appears to be related to many conditions, such as bipolar disorder, cancer, and HIV resistance.
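Copy-number analysis on an array generally compares a sample's probe intensity with a reference. The Python sketch below uses a deliberately simplified model (an assumption, not Affymetrix's algorithm): intensity is taken to be proportional to copy number and the reference is diploid, so a log2 ratio r maps to an estimated copy number of 2 × 2^r. Real data show compressed ratios and noise, and production tools segment many neighboring probes before calling gains or losses.

```python
import math

def copy_number_from_log2_ratio(log2_ratio: float, ploidy: int = 2) -> float:
    """Estimate copy number from a sample-vs-reference log2 intensity ratio.

    Simplified model: intensity proportional to copy number,
    reference assumed to carry `ploidy` copies.
    """
    return ploidy * 2.0 ** log2_ratio

def call_state(log2_ratio: float) -> str:
    """Round the estimate to the nearest integer copy number and label it."""
    cn = round(copy_number_from_log2_ratio(log2_ratio))
    if cn < 2:
        return f"loss (CN={cn})"
    if cn > 2:
        return f"gain (CN={cn})"
    return "normal (CN=2)"

if __name__ == "__main__":
    for r in (-1.0, 0.0, math.log2(1.5)):
        print(f"log2 ratio {r:+.3f} -> {call_state(r)}")
```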

Bring on the beads
Illumina, Inc. (San Diego, CA) also aims some of its microarrays, such as the recent Human 1M DNA Analysis BeadChip, at copy-number variation. This microarray includes over one million SNPs. It targets both known copy-number variation sites and new ones identified in collaboration with deCODE genetics (Reykjavik, Iceland). Illumina and deCODE also collaborated on the HumanCNV370-Duo, which was designed to target novel copy-number variation sites.

Illumina packs probes onto its microarrays using bead technology. Shawn Baker, gene expression scientific product manager at Illumina, says the company has also reduced the cost of running its microarrays to about $125 per sample. Moreover, each Illumina chip can process 6 to 12 samples. “(Taken) together, reducing the cost of individual samples and processing multiple samples per chip enables larger experiments,” says Baker.
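To see how per-sample cost and samples per chip combine, the sketch below plans a hypothetical 1,000-sample study. It assumes cost scales linearly at the roughly $125 per sample Baker quotes and ignores instrument time, labor, and unused positions on a partially filled last chip.

```python
import math

COST_PER_SAMPLE_USD = 125  # approximate per-sample figure quoted by Baker

def plan_experiment(n_samples: int, samples_per_chip: int) -> tuple[int, int]:
    """Return (chips needed, approximate total per-sample cost in USD).

    Assumes linear per-sample cost; labor and instrument time are ignored.
    """
    if samples_per_chip <= 0:
        raise ValueError("samples_per_chip must be positive")
    chips = math.ceil(n_samples / samples_per_chip)
    return chips, n_samples * COST_PER_SAMPLE_USD

if __name__ == "__main__":
    for per_chip in (6, 12):
        chips, cost = plan_experiment(1000, per_chip)
        print(f"{per_chip} samples/chip: {chips} chips, about ${cost:,}")
```

Doubling the samples per chip halves the number of chips to handle, which is where much of the practical gain in throughput comes from.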

These advances appear in several recent Illumina microarrays, including its updated arrays for human and mouse gene expression. Illumina also now makes a rat array. All of these microarrays use Illumina’s combination of hybridization and enzymatic technologies. “This leads to more specific and sensitive results,” says Baker, “and it requires less sample.”

Improving the input
To get the best results from microarrays, researchers must consider both the input and the output. The targets that go on a microarray represent one input. To simplify this step, Caliper Life Sciences (Hopkinton, MA) automated the target-generation protocol. Mark Roskey, vice president, reagents and applied biology at Caliper Life Sciences, says, “We wanted customers to push a button and make a labeled target.” Moreover, that target must be made just right, every time.

Adding automation increases consistency. “It takes out variability between experiments,” says Roskey. “Larger microarrays require extremely consistent targets or the data comes out with reduced accuracy.”

“Input” also encompasses the sample material that goes into the target-making equipment. For that, Caliper made a microfluidics chip that provides quality control for the RNA used to make the targets. Because RNA degrades fairly easily, assaying the input RNA is worthwhile if a microarray is to generate consistent, accurate results.

Improving analysis
Figure 2. Performing principal component analysis (PCA) on whole-genome SNP-association data, JMP Genomics 3.0 can visually identify population structure in samples from unrelated individuals. Here, unrelated HapMap samples from three different populations cluster into well-defined groups. (Image courtesy of the SAS Institute.)
As microarrays grow denser, analysis methods must keep pace. “In whole-genome chips,” says Doug Robinson, JMP Genomics applications scientist manager at the SAS Institute (Cary, NC), “the size of the data sets gets to be a challenge.” He points out that one project could include one million SNPs and 4,000 people as test and control subjects, which adds up to four billion data points. Handling gigantic data sets, though, has long been one of SAS’s strengths.
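Robinson's figure is a simple product of SNPs and subjects. The sketch below reproduces it and shows how storage grows under two assumed encodings: one byte per genotype call, and two 4-byte intensity values (one per allele) behind each call. Both byte sizes are illustrative assumptions, not the storage layout of any particular product.

```python
# Rough data-volume arithmetic for a genome-wide study.
# Byte sizes per value are illustrative assumptions.

N_SNPS = 1_000_000
N_PEOPLE = 4_000

def genotype_count(n_snps: int, n_people: int) -> int:
    """One genotype call per SNP per person."""
    return n_snps * n_people

def raw_bytes(n_values: int, bytes_per_value: int) -> int:
    """Uncompressed size when every value takes a fixed number of bytes."""
    return n_values * bytes_per_value

if __name__ == "__main__":
    calls = genotype_count(N_SNPS, N_PEOPLE)
    print(f"{calls:,} genotype calls")
    # Assumption: one byte per call (an integer code for 0/1/2/missing).
    print(f"calls at 1 byte each: {raw_bytes(calls, 1) / 1e9:.0f} GB")
    # Assumption: two 4-byte floats (one intensity per allele) per call.
    print(f"intensities at 8 bytes each: {raw_bytes(calls, 8) / 1e9:.0f} GB")
```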

For genome-wide scanning, SAS offers the recently launched JMP Genomics 3.0. It can examine SNPs through linear models, such as analysis of variance. It can also handle continuous or binary traits, and even test for interactions. Moreover, the latest version adds principal component analysis, which Robinson describes as “taking a data set, examining the variance, and then using that to look at how different samples might cluster together.” This can reveal population structure in apparently unrelated individuals, which is necessary for understanding potential sources of noise in the data.
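The PCA idea Robinson describes can be sketched in a few lines of NumPy. This is a generic illustration, not JMP Genomics' implementation: genotypes are assumed to be coded as allele dosages (0, 1, 2) in a samples-by-SNPs matrix, each SNP is standardized, and samples are projected onto the top principal components via a singular value decomposition. The two simulated populations in the demo are hypothetical.

```python
import numpy as np

def genotype_pcs(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project samples onto the top principal components of a genotype matrix.

    genotypes: (n_samples, n_snps) array of allele dosages coded 0, 1, 2.
    Each SNP column is centered and scaled; zero-variance SNPs are dropped.
    Returns an (n_samples, n_components) array of sample scores.
    """
    g = np.asarray(genotypes, dtype=float)
    mean = g.mean(axis=0)
    std = g.std(axis=0)
    keep = std > 0
    z = (g[:, keep] - mean[keep]) / std[keep]
    # Thin SVD: z = u @ diag(s) @ vt; sample scores on each PC are u * s.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical populations with different allele frequencies.
    freq_a = rng.uniform(0.1, 0.9, 200)
    freq_b = np.clip(freq_a + rng.normal(0, 0.3, 200), 0.05, 0.95)
    pop_a = rng.binomial(2, freq_a, size=(30, 200))
    pop_b = rng.binomial(2, freq_b, size=(30, 200))
    pcs = genotype_pcs(np.vstack([pop_a, pop_b]))
    print("mean PC1, population A:", round(float(pcs[:30, 0].mean()), 2))
    print("mean PC1, population B:", round(float(pcs[30:, 0].mean()), 2))
```

When samples from different ancestral populations separate along the leading components, as in Figure 2, those components are often included as covariates in the per-SNP association models so that ancestry is not mistaken for a disease signal.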

Other companies also take on the data challenge. The amount of information is so great, says Andrew Ferrin, vice president of sales and marketing at Golden Helix (Bozeman, MT), that in order to collaborate or simply share data with their colleagues, some scientists carry hard drives full of microarray data. That drove Golden Helix to develop a proprietary data compression technology that, Ferrin says, “enables several whole-genome data sets to fit on a single USB ThumbDrive and allows whole-genome analysis to be performed interactively with conventional hardware.”
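Golden Helix's compression technology is proprietary and is not shown here. As a generic illustration of why genotype data compress well, the sketch below packs each call (0, 1, or 2 copies of an allele, plus a code for missing) into 2 bits, four calls per byte, a quarter of a one-byte-per-call layout. Open formats such as PLINK's binary .bed file use 2-bit encodings on the same principle, with their own code assignments; the codes below are an assumption.

```python
MISSING = 3  # assumed code for a missing genotype call

def pack_genotypes(calls: list[int]) -> bytes:
    """Pack genotype codes (0, 1, 2, or MISSING) four to a byte, low bits first."""
    out = bytearray((len(calls) + 3) // 4)
    for i, code in enumerate(calls):
        if not 0 <= code <= 3:
            raise ValueError(f"invalid genotype code {code!r} at index {i}")
        out[i // 4] |= code << (2 * (i % 4))
    return bytes(out)

def unpack_genotypes(data: bytes, n: int) -> list[int]:
    """Inverse of pack_genotypes; n is the number of calls originally packed."""
    return [(data[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]

if __name__ == "__main__":
    calls = [0, 1, 2, MISSING, 2, 1, 0]
    packed = pack_genotypes(calls)
    print(len(calls), "calls ->", len(packed), "bytes")
    print(unpack_genotypes(packed, len(calls)) == calls)
```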

It’s one thing to get the data in place, but another to analyze it. Ferrin says that the historic and ongoing exponential increases in microarray density mean exponential growth in computational requirements. At first, Golden Helix programmers attacked these problems by developing ever-faster algorithms. When that wasn’t enough, the company’s scientists made their software run on a computer grid, so that a team of processors can work on a problem together. For example, Golden Helix’s PBAT software for family-based SNP analysis is already grid-enabled, and its HelixTree genetic analysis software will be grid-enabled by the end of this year.
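Grid computing pays off in genome-wide scans because each SNP can be tested independently. The sketch below illustrates that pattern on a single machine; it is not how PBAT or HelixTree work. SNPs are split into chunks and handed to worker processes with Python's `concurrent.futures`, standing in for a grid scheduler. The per-SNP statistic, r² from a simple linear regression of the trait on allele dosage, and the names `scan_chunk` and `parallel_scan` are hypothetical choices for illustration.

```python
from concurrent.futures import ProcessPoolExecutor

def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def scan_chunk(task):
    """Worker job: score one chunk of SNPs against the trait."""
    start, snp_rows, trait = task
    return [(start + i, r_squared(row, trait)) for i, row in enumerate(snp_rows)]

def parallel_scan(genotypes, trait, n_workers=4, chunk_size=1000,
                  executor_cls=ProcessPoolExecutor):
    """Return (snp_index, r2) for every SNP row, in the original order.

    genotypes: list of SNP rows, each a list of dosages (one per sample).
    """
    tasks = [(s, genotypes[s:s + chunk_size], trait)
             for s in range(0, len(genotypes), chunk_size)]
    results = []
    with executor_cls(max_workers=n_workers) as pool:
        for chunk in pool.map(scan_chunk, tasks):  # map preserves task order
            results.extend(chunk)
    return results

if __name__ == "__main__":
    import random
    random.seed(42)
    trait = [random.gauss(0, 1) for _ in range(200)]
    geno = [[random.randint(0, 2) for _ in range(200)] for _ in range(5000)]
    top = sorted(parallel_scan(geno, trait), key=lambda t: t[1], reverse=True)[:5]
    print(top)
```

Because chunks share nothing, the same decomposition extends from processes on one machine to nodes on a cluster; only the executor changes.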

Illumina’s BeadStudio software also provides microarray analysis. Illumina.Connect can push such analysis further by encouraging software companies to work together, making the data from one company’s microarray analysis usable in programs from other companies.

Combining data
Figure 3. This Two-Loci genetic plot shows the statistical significance of associations between pairs of genetic markers and a response variable. (Image courtesy of Golden Helix.)
Advancing genome-wide capability also involves the interaction of hardware and software. Both keep getting better and easier to use. “We now have the hardware infrastructure to move through data and tools that take the analysis out of the hands of high-end computational experts and give it to bench biologists,” says Eric Schadt, senior scientific director at Rosetta Inpharmatics, which is a Seattle-based, wholly-owned subsidiary of Merck & Co.

Software must often combine various types of data, such as results from gene-expression and genotyping experiments. “This gives a more complete picture of diseases and drug responses,” says Schadt. Nonetheless, such combination analysis requires serious computing power. The Rosetta scientists use a cluster of 7,200 central processing units, but Schadt says that it’s running at full capacity 24 hours a day, seven days a week. “We need to double what we have,” he says.

Schadt and his colleagues use this computing power and their software expertise for in-house projects and collaborations. For example, Rosetta and deCODE genetics recruited 1,000 patients to study obesity. The scientists collected blood and adipose samples and scored more than 50 clinical traits, such as body mass index and glucose levels. They also isolated DNA and RNA from every sample and then integrated the resulting genotype, gene-expression, and clinical information to reconstruct networks behind obesity. “This defines, at a molecular level,” says Schadt, “the networks that drive obesity in the Icelandic population. With these predictive networks, we can identify the best points for therapeutic intervention.” He adds, “This turned up a number of completely novel targets that you never would have imagined.” Merck uses molecular profiling to identify suitable biomarkers for efficacy before a new drug enters the long and expensive development process.

Such work also provides a new perspective on disease. Although the literature includes many publications on disease-related genes, Schadt says, “There is no single driver. Common diseases, such as obesity, diabetes, heart disease, are emergent properties of the networks. Genetic and environmental changes twist that network, and that leads to disease.” Ultimately, understanding those networks will produce more-targeted therapies.

Opening avenues
Although microarrays provide many advances in exploring genomes, some areas, such as discovering rare transcripts and splice variants, might be handled more completely with sequencing technology, according to Illumina’s Baker. The company’s Digital Gene Expression applications, which use Solexa sequencing technology, will come out later this year. Baker says, “This will give a cleaner picture of what gets transcribed, and it can be less expensive and cumbersome.” Overall, such new tools will give scientists other ways to explore genes and genomes.
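Sequencing-based expression measurement is "digital" because it counts reads rather than measuring hybridization intensity. The sketch below shows that counting step under simplifying assumptions: each read is an exact short tag, `tag_to_gene` is a hypothetical lookup from tag to gene, unmatched reads are skipped as unmapped, and counts are scaled to tags per million mapped reads so samples of different depth can be compared. Real pipelines, including Illumina's, align reads to a reference and handle ambiguous or error-containing tags.

```python
from collections import Counter

def tags_per_million(reads: list[str], tag_to_gene: dict[str, str]) -> dict[str, float]:
    """Count reads per gene and scale to tags per million mapped reads.

    tag_to_gene is a hypothetical exact-match lookup; reads whose tag is
    not in it are treated as unmapped and ignored.
    """
    counts = Counter(tag_to_gene[tag] for tag in reads if tag in tag_to_gene)
    mapped = sum(counts.values())
    if mapped == 0:
        return {}
    return {gene: n * 1_000_000 / mapped for gene, n in counts.items()}

if __name__ == "__main__":
    lookup = {"AAGTCCTA": "GENE_A", "TTCAGGAC": "GENE_B"}  # hypothetical tags
    sample = ["AAGTCCTA", "AAGTCCTA", "TTCAGGAC", "GGGGGGGG"]
    print(tags_per_million(sample, lookup))
```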


© 2006 Advantage Business Media All rights reserved.