Lower costs and increased funding converge to draw more researchers into studying cancer genomics

By Alan Dove, PhD

Cancer genomics is only a few years old, but investigators have already completed several important proof-of-concept studies. Now, the plummeting cost of whole-genome sequencing, plus a new bolus of federal funding, promises to draw even more researchers into the field.

Just before Christmas, 1971, President Richard Nixon signed the National Cancer Act, which famously declared war on Americans' second-leading cause of death. "I hope in the years ahead we will look back on this action today as the most significant action taken during my Administration," said Nixon.

It may well have been. Setting aside the questionable wisdom of declaring war against a poorly understood, rapidly evolving, decentralized enemy, the Cancer Act established a dedicated stream of research funding for the disease. Nearly 40 years later, that legacy of support is helping scientists push cancer research into the genomic era, a transition that could revolutionize the treatment of tumors and leukemias.

Running the bases

Anna Barker, PhD, at the 2005 press conference announcing the launch of the Cancer Genome Atlas. (Source: TCGA)

The principle behind cancer genomics is simple: use high-throughput sequencing technology to find all of the genetic differences between normal and cancer cells, then use those differences as therapeutic targets. In practice, nearly every step of that process has been hard. New sequencing tools have helped tremendously, but not all of the field's barriers are technological.

"To be honest with you, the hardest part of this whole equation, beyond getting the technology to work and figuring out the bioinformatics, is actually having properly consented patient samples to access," says Elaine Mardis, PhD, associate professor of genetics at Washington University in St. Louis, Mo.

When a patient with acute myelogenous leukemia (AML) provided Mardis and her colleagues with a good set of samples recently, the researchers rushed straight to the sequencing lab, producing the first whole-genome sequence of a cancer [1]. "[We sequenced] the tumor genome taken from the leukemia cells, and then we also sequenced the patient's normal tissue not affected by the cancer, which was just genomic DNA derived from a skin biopsy," says Mardis.

To do the sequencing, Mardis and her colleagues sonicated the cells' DNA into small fragments, then read just a few dozen bases from one end of each fragment. By processing enough fragments to oversample the genome 30-fold, the investigators were able to piece together a complete genomic sequence. Comparing the genomes of the leukemia and normal cells from the patient revealed two known, and eight previously unknown, mutations in the cancer.
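The arithmetic behind that 30-fold oversampling, and the tumor-versus-normal comparison, can be sketched roughly as follows. This is an illustration only, not the team's actual pipeline: the 36-base read length is an assumed stand-in for "a few dozen bases," the genome size is rounded to 3 billion bases, and the function names and variant entries are hypothetical.

```python
import math

def reads_needed(genome_size_bp, read_length_bp, fold_coverage):
    """Reads required so that total bases read = fold_coverage x genome size."""
    return math.ceil(fold_coverage * genome_size_bp / read_length_bp)

def tumor_only_variants(tumor_variants, normal_variants):
    """Variants present in the tumor but absent from the patient's normal tissue."""
    return sorted(set(tumor_variants) - set(normal_variants))

# Assumed values for illustration (not taken from the study).
GENOME_SIZE_BP = 3_000_000_000   # approximate size of the human genome
READ_LENGTH_BP = 36              # assumption standing in for "a few dozen bases"

print(reads_needed(GENOME_SIZE_BP, READ_LENGTH_BP, 30))  # 2500000000

# Made-up variants as (chromosome, position, reference base, observed base).
tumor = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr5", 50, "G", "A")]
normal = [("chr1", 100, "A", "G")]  # inherited variant, also seen in skin DNA
print(tumor_only_variants(tumor, normal))
```

Under these assumptions, 30-fold coverage works out to about 2.5 billion short reads per genome, which is why both the tumor and the matched normal sample represent substantial sequencing efforts.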

President Nixon signing the National Cancer Act, December 23, 1971. (Source: NCI)

The team is now working on a second AML patient's genome, using the next generation of sequencing technology. "What we're doing is instead of just sampling the fragments at one end, we can turn the instrument around ... and we can sample the other end of the DNA fragment as well," says Mardis. This double-end (paired-end) reading technique will allow the researchers to identify genomic rearrangements and duplications, as well as point mutations.
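One way to see why reading both ends helps: the two reads from a single fragment should land close together on the reference genome, so pairs that map far apart, or to different chromosomes, hint at a rearrangement. The sketch below illustrates that general idea under simplifying assumptions. The 500-base maximum span, the `ReadPair` type, and the example coordinates are all hypothetical, and real analyses also weigh read orientation and mapping quality.

```python
from typing import NamedTuple

class ReadPair(NamedTuple):
    """Mapped positions of the two end reads from one DNA fragment."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int

def is_discordant(pair, max_span_bp=500):
    """Flag a pair whose ends map to different chromosomes or too far apart.

    max_span_bp is an assumed upper bound on fragment length, for illustration.
    """
    if pair.chrom1 != pair.chrom2:
        return True
    return abs(pair.pos2 - pair.pos1) > max_span_bp

def discordant_pairs(pairs, max_span_bp=500):
    """Return only the pairs that suggest a possible rearrangement."""
    return [p for p in pairs if is_discordant(p, max_span_bp)]

# Made-up example pairs.
pairs = [
    ReadPair("chr1", 1_000, "chr1", 1_300),    # consistent with one short fragment
    ReadPair("chr1", 1_000, "chr1", 250_000),  # ends far apart on the same chromosome
    ReadPair("chr1", 1_000, "chr9", 5_000),    # ends on different chromosomes
]
for pair in discordant_pairs(pairs):
    print(pair)
```

A single-end read carries no such distance information, which is why it can reveal point mutations but says little about larger structural changes.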

Sequencing additional patients' genomes will likely yield more potentially important mutations. Following their initial project, Mardis and her colleagues scanned an additional 93 AML patients by PCR to see if they carried any of the same mutations as the first patient. They didn't.

"On the one hand, that's a bit of a bummer because you scratch your head and wonder whether you're doing the right thing. On the other hand, I think it's not terribly surprising," says Mardis, adding that "at some level you know cancer is going to [be] a very personal disease." Sequencing every patient's genome in order to determine the best treatment may sound like an expensive proposition now, but Mardis and others in the field expect the price of sequencing to continue falling.

Mapping the oncome

While Mardis's effort was the first to sequence a complete cancer genome, it is certainly not the biggest cancer genomics project to date. That title belongs to the Cancer Genome Atlas (TCGA), a joint venture by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI).

TCGA's ambitious goal is to compile a comprehensive list of all of the significant genetic changes in all major types of cancer. Rather than sequence each patient's complete genome, TCGA is using a multi-dimensional approach, scanning large groups of patient samples for specific classes of changes, such as variations in DNA methylation, DNA copy number changes, and mutations in genes believed to be involved in oncogenesis. The effort just finished its pilot phase, characterizing 206 cases of glioblastoma [2].

It's an impressive accomplishment, but it's just the beginning. Besides screening additional glioblastoma samples, and screening them more deeply, the team is already lining up additional cancers. "We have initiated our work last year on ovarian cancer, and the first data from that effort [are] being analyzed as we speak. We anticipate much much quicker progress ... on ovarian cancer, and we have started lung cancer," says Anna Barker, PhD, deputy director of the NCI.

Metadherin-positive breast cancer with superimposed illustration of the computational approach that identified the 8q22 minimal genomic gain in poor-prognosis breast cancer. (Source: Yibin Kang, PhD)

Because TCGA is so large, and spread across so many laboratories, it has also confronted major data processing problems. "This team, I think, for the first time has produced data that is fully integrable across laboratories, whether it's for expression, or copy number, or even the epigenome," says Barker, adding that "We're sort of at a new plane of accomplishment in terms of the quality of the data and what you can do with the data."

That's not to say the problem is solved. Indeed, data analysis is the focus of one of TCGA's newest grant offerings, which were announced in January. The new grants will fund two types of centers, for genome characterization and genome data analysis. Barker envisions genome characterization centers doing the same types of wet lab projects TCGA has been working on, while the data analysis centers will focus more on bioinformatics and problems such as data storage. "This is the most data-intense project to date attempted in medicine," she says.

It's also sample-intense. The availability of properly collected and consented clinical samples is the main factor that determines which cancers TCGA can study. "We have to have the samples. That's been—and that's going to continue to be—a limiting factor, so what we're doing is basically looking not just in the US, but around the globe for samples and trying to queue those up in terms of what's available," says Barker.

Results from cancer genomics projects generally go into freely accessible databases, and users from numerous fields are already tapping into these systems. Barker says the TCGA data are popular with systems biologists, cancer biologists, and even clinicians. "You get quality data like this and believable data about all of the changes across a disease like this, it's paradigm shifting," she says.

A sequencing lab working on the NCI’s Cancer Genome Atlas project. (Source: TCGA)

Metadherin and migration

Yibin Kang, PhD, an assistant professor of molecular biology at Princeton University in Princeton, NJ, is one of the scientists mining this growing trove of data. In recent work, Kang and his colleagues used previously published sequence information from breast cancer patients to help discover additional genes involved in the disease's pathogenesis.

"It's pretty clear that if you can find recurrent genomic events ... in a large percentage of a particular type of cancer, oftentimes those [loci] are likely to contain genes that are functionally important to drive cancer formation, but this concept has not really been applied successfully to the study of metastasis and poor prognosis," says Kang. To address that, he and his colleagues analyzed data from three studies of patient cohorts with breast cancer, and found that a small genomic translocation correlated with poor prognosis. By testing the genes in that region in animals, they discovered that the surface protein metadherin can promote both metastasis and resistance to chemotherapy.

It was a satisfying result, as previous work using an entirely different technique had also pointed to metadherin as a metastasis factor. "It really shows that the gene is really important, you can find it in different ways, you can find it using phage display, a really basic molecular biology approach, or using very computational and genomic [methods] combined with clinical data and come up with the same conclusion," says Kang.

That convergence of results also shows the advantage of broad-based research efforts, according to Kang: "I think what we want is a new, young generation of scientists who are well-trained in both the clinical aspects of cancer research, the basic science, the laboratory aspects of animal models for example, and also the genomic computational biology."

As researchers bring diverse techniques and teams together to address the complex problem of cancer biology, getting their discoveries into the clinic will also require new medical technology, such as clinically approved genome sequencing. "Are there all of the follow on types of science ... to actually begin to capitalize on that in a rapid way, or are we just going to be sort of stamp collecting for a really long period of time?" asks Washington University's Mardis. She adds that "there's a lot of work that needs to be done towards getting the rest of science ... ready for all of the things that we're going to discover and to be able to take advantage of that."


1. Ley et al., "DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome," Nature 456, 66-72 (6 November 2008).
2. The Cancer Genome Atlas Research Network, "Comprehensive genomic characterization defines human glioblastoma genes and core pathways," Nature 455, 1061-1068 (23 October 2008).