by Angelo DePalma

Proteomics, the successor to genomics, has rapidly assumed a central role in modern biology and medicine. Where the human genome contains about 22,000 coding genes, the functional proteome comprises anywhere from fifteen to a hundred times as many proteins. Differences in position and type of post-translational modifications (glycosylations, acetylations, and other non-backbone chemical changes) within an amino acid sequence provide nearly infinite combinatorial variations for each basic structure. Proteins' specificity for particular cells, and their widely ranging concentrations within those cells and tissues, complicate proteomic studies even further.
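A back-of-the-envelope count shows how quickly these modification patterns multiply. The sketch below assumes, purely for illustration, that each modifiable site on a protein independently takes one of a fixed number of states. Real sites are neither independent nor equally likely, and the function name and site counts are hypothetical.

```python
def count_modified_forms(num_sites: int, states_per_site: int = 2) -> int:
    """Count distinct modification patterns for one amino acid sequence.

    Assumes (for illustration only) that every site independently takes
    one of `states_per_site` states, e.g. unmodified vs. glycosylated.
    """
    if num_sites < 0 or states_per_site < 1:
        raise ValueError("need num_sites >= 0 and states_per_site >= 1")
    return states_per_site ** num_sites


# Hypothetical site counts, chosen only to show the growth rate.
for sites in (5, 10, 20):
    print(sites, "sites ->", count_modified_forms(sites), "possible forms")
```

Even under this crude model, twenty two-state sites on a single sequence allow more than a million distinct forms.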

Biologists who believed that the transition from genome to proteome would be straightforward soon discovered that the relationship between gene and protein, and among proteins, was perplexingly complex. "We basically had to start from the beginning again," says Jeannette Webster, Ph.D., Sr. Scientist at Protein One (Bethesda, MD), a company that provides high-purity proteins and purification services.

An ongoing debate pits cell- or tissue-based analysis against in vitro experiments on "clean" systems. Drug developers, in particular, like cells and tissues because they more closely resemble real-world environments. However, cell-based experiments may also obscure a target protein's true surroundings. "Cells are fine if you're working with a well-understood system, and provided your culture contains the right regulatory proteins," says Dr. Webster. Otherwise, researchers do better by nailing down the molecular mechanisms first, and working in more complex systems later.

Broad dynamic range

Analytics, the heart and soul of proteomics, allows the identification, quantification, and spatial resolution of these proteins, often in comparison between two or more states, for example diseased vs. normal or treated vs. untreated. But protein profiling, as this exercise is called, is orders of magnitude more taxing than traditional biochemistry. Samples for proteomic analysis, typically tissues and whole organisms, contain thousands of proteins and other chemicals in addition to the protein(s) under investigation. Instruments strain to fish out target proteins from high-abundance entities like albumin, with a concentration dynamic range spanning 10^12. As Jasmine Gray, Ph.D., Marketing Director for Protein Discovery at GE Healthcare (Piscataway, NJ) notes, "Researchers are usually interested in the proteins which are in lowest abundance."
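Dynamic range is simply the ratio of the highest to the lowest concentration in a sample, usually quoted in powers of ten. The short sketch below works through the arithmetic. The two concentrations are hypothetical values picked to give a twelve-order spread, not measurements, and the function name is illustrative.

```python
import math


def orders_of_magnitude(highest: float, lowest: float) -> float:
    """Return log10 of the ratio between two concentrations (same units)."""
    if highest <= 0 or lowest <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(highest / lowest)


# Hypothetical values in grams per milliliter, chosen only to illustrate
# a twelve-order spread; they are not measurements from the article.
abundant_protein = 4e-2   # an albumin-like, high-abundance protein
rare_protein = 4e-14      # a low-abundance protein of interest

span = orders_of_magnitude(abundant_protein, rare_protein)
print(f"dynamic range: about {span:.0f} orders of magnitude")
```

An instrument whose detector covers only three or four orders of magnitude in a single run would need the abundant species depleted, or the sample fractionated, before the rare protein could register at all.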

2D gel electrophoresis, the workhorse protein profiling method, is slowly losing ground to protein microarrays. Gels work fine for water-soluble, high molecular weight proteins but are time- and labor-intensive and difficult to reproduce. Spots often contain several proteins, and great skill is required to assure uniform measurement across gels. On the plus side, 2D gels can visualize a thousand proteins or more in one experiment and easily interface to mass spectrometers, but the hybrid technique requires advanced information processing.

Microarrays — alternatives to gels

Protein arrays have led to "orders of magnitude" increases in information per experiment, according to Brett Stillman, Ph.D., Manager for Microarray Technologies at Whatman (Sanford, ME). Antibody-based arrays, in particular, are more efficient than running individual ELISA assays since they provide capture of multiple proteins simultaneously. Unfortunately, protein chips are more difficult to fabricate than gene chips, mostly due to proteins' relative instability and the difficulty of attaching proteins to substrates while retaining activity.

While it is possible to cover most or all of the genome with a single array, the sheer number of proteins precludes development of a human whole-proteome chip any time soon. Early microarrays featured a relatively small (by genomics standards) number of monoclonal antibodies. Today, Invitrogen (Carlsbad, CA) sells a human protein microarray that holds five thousand proteins. "There is a lot of interest in these arrays," says Prof. Mike Snyder, director of the Yale Center for Genomics and Proteomics, "and for antibodies in general."

Five thousand proteins doesn't seem like much compared with millions of possible proteins. No problem, says Dr. Snyder. "The first gene arrays only held a few thousand gene probes, but you could get a lot of information from them," he observes. "If you're only interested in kinases, or proteins that react with certain antibody classes, five thousand spots will give you plenty of material to work from." Snyder predicts that as more high-purity proteins and antibodies are generated, protein arrays will continue growing in size and complexity. "It's just a matter of building content into the chip format."

Snyder's company, Protometrix, commercialized the first high-content protein chips in 2004, and subsequently delivered the first whole-proteome chip, for yeast. (Protometrix was eventually purchased by Invitrogen.)

MS becomes an everyday technique

Mass spectrometry (MS), interfaced to gels or microarrays, represents the other major proteomics instrument breakthrough. Founded in 2001, Protein Discovery (Knoxville, TN) provides products and services — all centered on high-throughput MS — for molecular research, clinical diagnostics, drug discovery, and pharmaceutical development. The company benefits from its proximity to the University of Tennessee, Oak Ridge National Laboratory, and Vanderbilt University. Through a license agreement with Vanderbilt, the company offers a service that resolves, in time and space, the distribution of proteins of interest within a tissue sample.

Think of this technique as a variation on the microplate, where the substrate material is tissue instead of plastic, and the wells are areas, as small as 30 microns across, defined by a laser beam. Through an advanced MS technique known as MALDI (matrix-assisted laser desorption/ionization), the laser ionizes bits of tissue, and with them their protein constituents, before sweeping them into the MS for analysis. Software allows researchers to create a color-coded map of the distributions and concentrations of proteins of interest. Proteomic mapping also works well with animal models, from tissues through organs and entire animals, says Chuck Witkowski, CEO.
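The color-coding step can be pictured with a small sketch. It assumes the software has already reduced each laser-defined spot to a single intensity for one protein of interest. The grid values, palette, and function name below are hypothetical and do not describe Protein Discovery's software.

```python
def color_map(intensities, palette=("blue", "green", "yellow", "red")):
    """Turn a grid of ion intensities into a grid of color names.

    Each cell stands for one laser-defined spot on the tissue. Intensities
    are scaled to the brightest spot and binned evenly across `palette`.
    """
    peak = max(max(row) for row in intensities)
    if peak <= 0:
        # No signal anywhere: paint every spot with the lowest color.
        return [[palette[0] for _ in row] for row in intensities]
    top = len(palette) - 1
    return [
        [palette[min(top, int(value / peak * len(palette)))] for value in row]
        for row in intensities
    ]


# Hypothetical signal for one protein across a 3 x 4 patch of tissue.
signal = [
    [0.0, 1.0, 2.0, 1.0],
    [1.0, 5.0, 9.0, 4.0],
    [0.0, 3.0, 10.0, 6.0],
]
for row in color_map(signal):
    print(" ".join(f"{name:<6}" for name in row))
```

Scaling to the brightest spot keeps the map readable for any one protein, at the cost of making maps for different proteins incomparable unless a shared scale is used.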

By summer 2006, Protein Discovery plans to introduce a sample handling system for proteomic MS that prepares tissue samples for high-resolution analysis in less than an hour, with high reproducibility and ease of use.

With no shortage of protein-related questions, availability of instrumentation has become the bottleneck in many proteomics efforts. Academic researchers have responded by forming consortia that purchase and share high-field MS and nuclear magnetic resonance instruments. One such effort, the Chicago Biomedical Consortium, is headed by Brenda Russell, Ph.D., at the University of Illinois (Chicago). Member universities (Illinois, Northwestern, Chicago) have secured private funds to purchase a Fourier-transform mass spectrometer (FTMS), at a cost of about $1 million. Fueled by high-end information processing, the instrument serves as the fulcrum for interdisciplinary proteomics efforts at the three universities. "It has opened up a whole new world in terms of the proteins we can identify," says Dr. Russell, who describes previous efforts as "reductionist." "In the past we could identify one gene or protein, in one form, which doesn't answer real questions about the biology of life, health, and disease."

The new instrument can identify numerous proteins in multiple forms (e.g. post-translational modifications) in a single experiment. The glue holding everything together is bioinformatics. Russell points out that genomics faced informatics challenges during the heavy sequencing days, but nothing compared with proteomics. Figuring out which gene codes for which protein, and the relationship among proteins in various pathways, necessitates full-time participation by statisticians and informatics experts.

Figure 1. Classification of leukemia (red) vs. normal (blue) patients using a non-linear statistical discrimination technique called Support Vector Machines (SVM). The data sets are projected onto a discriminant coordinate plane using linear discriminant analysis. SVM draws a decision boundary (a hyperplane in the SVM's transformed feature space, which appears here as the curved separation between the two background colors) that successfully separates the healthy from the diseased groups, thereby helping to diagnose new samples projected into this space. An interesting feature of this plot is the clear subdivision of the leukemia (red) group, with the right cluster representing a progressed stage of disease.
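For readers who want a feel for what an SVM does, here is a deliberately simplified sketch. It trains a linear soft-margin SVM by subgradient descent, not the non-linear SVM used for the figure, and the coordinates are made-up points standing in for positions on the two discriminant axes. All names and numbers are assumptions for illustration.

```python
def train_linear_svm(points, labels, lam=0.01, step=0.001, epochs=20000):
    """Fit w, b by subgradient descent on the regularized hinge loss.

    Minimizes lam/2 * |w|^2 + mean(max(0, 1 - y * (w.x + b))).
    Labels must be +1 or -1.
    """
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - step * gi for wi, gi in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Made-up 2-D coordinates: -1 = normal, +1 = leukemia.
normal = [(1.0, 1.2), (1.5, 0.8), (0.8, 1.9), (1.2, 1.5)]
leukemia = [(4.0, 4.2), (4.5, 3.8), (3.8, 4.9), (5.0, 4.5)]
points = normal + leukemia
labels = [-1] * len(normal) + [1] * len(leukemia)

w, b = train_linear_svm(points, labels)
print("new sample near the normal cluster:", predict(w, b, (1.1, 1.0)))
print("new sample near the leukemia cluster:", predict(w, b, (4.6, 4.4)))
```

A non-linear SVM follows the same idea but first maps the data through a kernel, which is what lets the boundary curve in a plot like Figure 1.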

Interdisciplinary research

Although rooted in classical biochemistry, proteomics has become an interdisciplinary science. Dennis Manos, Ph.D., a physics professor at the College of William and Mary (Williamsburg, VA), is part of a group of physicists, statisticians, computational scientists — and yes, even biologists — working on biomarker identification, pattern recognition, and signal processing. The group, which primarily relies on MS analysis, has been working with Incogen (Williamsburg, VA), the developer of VIBE (Visual Integrated Bioinformatics Environment), a drag-and-drop analysis workflow management environment. "Our goal," says Dr. Manos, "is to be able to take a mass spectrum of a serum sample and classify it for early disease detection, particularly for cancer."

Statistics are "huge" in the systems biology approach of Manos and coworkers. "There are so many ways to do the differentiation and clustering," he observes. "Ultimately that is what will allow us to recognize patterns represented by distribution and concentration of key species." In the old days, biologists measured two or three circulating biochemicals, for example glucose, to make their case. Today, they face single experiments consisting of hundreds or thousands of mass spec measurements, some close to the noise threshold. "To get your hands around these problems, you need to reduce the number of variables to a manageable number," Manos notes, "say ten or twenty. A thousand is simply unmanageable."
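One common way to cut a thousand variables down to a handful is to score each spectral peak by how well it separates two groups and keep only the top scorers. The sketch below does this with a Welch-style t score. This is one approach among many and not necessarily the one the William and Mary group uses. The intensities are invented, and the example keeps two of five features where a real study might keep ten or twenty of a thousand.

```python
from statistics import mean, variance


def top_features(group_a, group_b, keep=2):
    """Rank spectral features by a Welch-style t score and keep the best.

    group_a, group_b: lists of samples; each sample is a list of peak
    intensities, one per feature. Returns the indices of the `keep`
    features that best separate the two groups.
    """
    scores = []
    for j in range(len(group_a[0])):
        a = [sample[j] for sample in group_a]
        b = [sample[j] for sample in group_b]
        spread = variance(a) / len(a) + variance(b) / len(b)
        # Small constant avoids division by zero for constant features.
        score = abs(mean(a) - mean(b)) / (spread + 1e-12) ** 0.5
        scores.append((score, j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:keep])


# Invented peak intensities: three samples per group, five features each.
healthy = [
    [1.0, 5.0, 2.0, 7.1, 3.0],
    [1.2, 5.2, 2.1, 6.9, 3.3],
    [0.9, 4.9, 1.9, 7.0, 2.8],
]
disease = [
    [1.1, 9.0, 2.0, 7.0, 6.1],
    [1.0, 9.3, 2.2, 7.2, 5.8],
    [1.2, 8.8, 1.9, 6.8, 6.0],
]
selected = top_features(healthy, disease, keep=2)
print("features kept:", selected)
```

Scoring features one at a time ignores correlations between peaks, so it works best as a first filter before a multivariate classifier.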

On the production floor

Protein analytics derived from proteomics will come in handy over the next few years as biotech companies seek to introduce generic or "follow-on" biopharmaceuticals. Legally speaking, generic protein therapies don't exist: Congress has not authorized them, and the FDA has not issued any guidance for their approval. For several years, U.S. regulators have grappled with the scientific issues, specifically how companies can best demonstrate similarity between an "original" protein and a copy.

Several technical details stand in the way of this understanding. Since biogenerics are produced in cultured cells or microorganisms, their post-translational modifications differ from native human proteins. Moreover, any change in the manufacturing process, no matter how small, can affect a protein's purity, homogeneity, and higher-order structure. Absent strict similarity criteria, regulators would probably ask developers to conduct lengthy clinical trials, essentially killing the nascent biogenerics industry.

Techniques currently used for proteomics research are expected to help biogenerics firms compare critical safety, efficacy, purity, and similarity parameters. Protein chips, for example, can profile protein impurities in short order; MS, sedimentation ultracentrifugation, and light scattering identify and quantify aggregates and oligomers (a major safety concern for the FDA). Developers may use surface plasmon resonance to detect protein-protein interactions, bioassays to predict protein activity, and a relative newcomer to proteomics, nuclear magnetic resonance, to unravel primary and higher-order protein structures.

At the three-day Follow-on Biologics Workshop, hosted by the New York Academy of Sciences last December, approximately half the speakers outlined how various analytic techniques, most of them recently adopted or improved for detecting low-abundance proteins, will eventually find their way into manufacturing and quality operations for biogenerics. One speaker, Adrian Bristow (National Institute for Standards and Technology), specifically called for greater adoption of proteomics methods by manufacturers of protein drugs.