Genomic Variation: The Search for Our Past and Our Future

Author:  Chakma Justin

Date:  May 2007


It was back in 1994, in an interview given to the London Times, that the Human Genome Project's grand maestro, Francis Collins, first expressed his view that "finding genes is like trying to find a needle in a haystack." Over a decade has passed and scientists still persist in beating this clichéd analogy to death, which speaks to the genuine challenge of finding genes for human disease. However, the pace of discovery has certainly hastened. Several converging factors, including the continuing globalization of scientific efforts, increasingly intimate collaboration between academia and industry, and advances in genomic technologies, have all contributed to the development and use of new tools such as genome-wide association studies. A genome-wide association study genotypes a large collection of DNA samples obtained from a population with well-defined clinical characteristics and compares them against an annotated, high-resolution map of common genetic variation in order to statistically implicate genomic regions in disease susceptibility or causation. These expensive studies are having an unprecedented impact on our understanding of genetic diseases.
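To make "statistically implicate" concrete, here is a minimal sketch of the kind of test that can be run at a single variant: a Pearson chi-square test comparing allele counts in people with a disease (cases) and without it (controls). The function name and the counts are hypothetical, chosen only for illustration; real studies use dedicated software and correct for many additional factors such as ancestry.

```python
import math

def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Assumes every row and column total is non-zero.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were unrelated to disease status
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: the variant allele is seen on 600 of 2,000 case
# chromosomes but only 500 of 2,000 control chromosomes.
chi2, p = allelic_chi_square(case_alt=600, case_ref=1400,
                             control_alt=500, control_ref=1500)
print(f"chi-square = {chi2:.2f}, p = {p:.1e}")
```

A genome-wide study repeats a comparison of this kind at hundreds of thousands of variants, which is why very large sample collections are needed.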

A Brief History

In the era before the completion of the Human Genome Project, geneticists were limited primarily to family-based linkage studies that examined simple Mendelian disorders, hereditary diseases caused by the malfunction of a single gene, such as cystic fibrosis and Huntington's chorea. It was nearly impossible to examine causation or susceptibility for more complex diseases like cancer or diabetes, which are caused by multiple genes and multiple environmental factors, interacting in complicated ways. The completion of the Human Genome Project overcame this limitation in two significant ways.

First, the successful sequencing of the initial human genome paved the way for decreasing costs and improving technical capacity for sequencing many more future human genomes. Increasing the ease of sequencing of human genomes allowed for the comparison of sequences between normal populations and populations with disorders to implicate genomic regions involved in the pathology. Second, the project represented the first time that a major scientific effort involved collaboration between scientists across multiple countries facilitated by advances in information technology. This demonstrated the necessity of international collaboration through the sharing of population data that increased the statistical power of the studies to detect common variations among the diseased population.

One example of such a collaboration was the International HapMap Project, completed in 2005. It attempted to reduce the cost and time of testing all of the estimated 10 million single nucleotide polymorphisms (SNPs: differences in individual base pairs that may themselves contribute to disorders or flag nearby DNA that does) by identifying haplotypes, which are regions of linked SNPs that tend to be inherited together. Researchers from Canada, China, Japan, Nigeria, the United Kingdom and the United States collaborated to produce this catalog of common genetic similarities and differences in human beings. The project had far-reaching consequences for gene finding: by narrowing the potential genomic regions of interest, it made genome-wide association studies significantly easier. Moreover, it revealed many genes that are potentially important in natural selection and human evolution. Thus, findings from genome-wide association studies are not limited to our understanding of genetic disorders. By looking at the human genome, we can learn about our own history in a new way as well.
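The idea that linked SNPs "tend to be inherited together" is commonly quantified with a statistic called r-squared, a measure of linkage disequilibrium. The sketch below computes it from counts of the four possible two-SNP haplotypes. The function name, the haplotype labels and the counts are assumptions made for illustration and are not taken from the HapMap Project itself.

```python
def ld_r_squared(hap_counts):
    """Linkage disequilibrium (r^2) between two biallelic SNPs.

    hap_counts maps each two-SNP haplotype to its observed count, using
    hypothetical labels: 'A'/'a' for the alleles of the first SNP and
    'B'/'b' for the alleles of the second. Assumes both SNPs are variable
    in the sample (no allele frequency is exactly 0 or 1).
    """
    total = sum(hap_counts.values())
    p_ab = hap_counts["AB"] / total                       # frequency of the A-B haplotype
    p_a = (hap_counts["AB"] + hap_counts["Ab"]) / total   # frequency of allele A
    p_b = (hap_counts["AB"] + hap_counts["aB"]) / total   # frequency of allele B
    d = p_ab - p_a * p_b                                  # departure from independence
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Two SNPs that always travel together: genotyping one predicts the other.
linked = ld_r_squared({"AB": 60, "Ab": 0, "aB": 0, "ab": 40})

# Two SNPs inherited independently: neither says anything about the other.
unlinked = ld_r_squared({"AB": 25, "Ab": 25, "aB": 25, "ab": 25})

print(linked, unlinked)
```

When r-squared is close to 1, a single "tag" SNP can stand in for its neighbors, which is how a haplotype map lets researchers genotype far fewer than 10 million sites.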


A 'Flattening' Playing Field: Industry-Academia Collaboration

In addition to international cooperation among scientists, new forms of horizontal collaboration between biotechnology companies and universities, so-called public-private partnerships, have led to improvements in genotyping technologies, such as microarrays, that are critical to genome-wide association studies.

A microarray is a collection of microscopic DNA spots attached to a solid surface, such as a silicon chip, that form an array for comparative genomic hybridization. Labeled probes from an unknown DNA sample will hybridize to the microarray's reference DNA and emit a signal only if the sequences match. Failure to hybridize indicates a potential difference such as an SNP. Manufacturers of microarray technology have been intimately involved with university researchers in developing and perfecting these technologies. Such collaboration dates back to the 1990s, when a partnership between Affymetrix, the Santa Clara, Calif.-based pioneer of DNA microarrays, and the Massachusetts Institute of Technology (MIT) led to the development of the first highly multiplexed microarrays, which allowed comparison of 1,000 SNPs.

Keith Jones, the vice president of molecular genetics at Affymetrix, recalls, "Affymetrix had a fairly good handle on the technology, but tapped into the expertise at MIT on association studies, the use of the markers, the ability to analyze the data, and also access to clinical samples. I think that it really highlighted the fact that Affymetrix had technology to bring to the table, and access to that technology, while the rest of the groups provided expertise on the analysis, and integrating them cross-platform, and putting the findings in the appropriate biological context. We ended up in a place that was much beyond what any one of those groups could have done by themselves."

"In general, the opportunities are the ability for Affymetrix to reach out to the community and understand whether our picture of the world of genetic analysis is in line with what the thought leaders in the academic world are thinking as well. That's the first set of feedback that we get, which helps us to build technology. Once we start building technology, another opportunity is to provide early access to the technology, which will hopefully be technology that is paradigm-changing or paradigm-enabling for these researchers. Without a doubt, what Affymetrix gets back is again feedback on whether we are doing the right things, feedback on the performance of the platform, but moreover access to alternative ways of both analyzing and thinking about problems and integrating this data into appropriate biological or clinical contexts."

"The benefits really do go both ways. We are providing access to this ground-breaking technology, and getting back understanding of the different ways these platforms and arrays can be used."

More recent collaboration with MIT yielded the newly released SNP 5.0 and the upcoming SNP 6.0 whole-genome association arrays, which allow the examination of multiple types of genomic variation in addition to SNPs.

"The SNP 5.0 array builds on past technology to allow the assessment of both SNPs and non-polymorphic copy number variant probes. It features 420,000 additional non-polymorphic probes on top of the 500,000 SNPs. However, the new SNP 6.0 will look at 1.8 million unique loci across the genome, which we consider really paradigm-shifting," said Andrew Noble, a senior manager at Affymetrix. "It enables us to look at multiple types of variance that are proving to be important in the realm of molecular genetics."

This comparative technology forms the basis of genome-wide association studies, so increasing the SNP capacity of microarrays expedites the detection of genomic variation.

Noble also notes that more generally microarrays are enabling a new form of research that "allows one to start asking non-hypothesis based questions about genome-wide expression and regulatory networks on the different RNA species that we're seeing. I have a feeling that the data that is coming out right now suggests that there is much more transcription than we expected. I wouldn't be surprised that molecular biology textbooks will be rewritten in the next 15 years describing a gene regulatory network much more complex than we had really expected."

Type 2 Diabetes: A Case Study of Genome-Wide Association Success

Such tools, both databases such as the HapMap Project and more advanced microarrays, together with international and public-private collaboration, have enabled researchers to more easily locate genes through association studies that compare haplotypes and other structural genomic features of individuals. Of the diseases that have been studied using these new tools, type 2 diabetes has perhaps seen the most success.

Type 2 diabetes, or adult-onset diabetes, is a metabolic disease characterized by insulin resistance and reduced insulin production, and it is a leading cause of heart disease, stroke, blindness and kidney failure. Over 200 million people have diabetes worldwide, and the World Health Organization estimates that this figure will reach 300 million by 2025. Scientists have long known that diabetes is caused by genetic susceptibility as well as by environment and behavior. However, very little was previously known about the genetic contribution.

"We could explain perhaps only 1% of the inherited risk of the disease," said Dr. David Altshuler, a Massachusetts General Hospital clinical endocrinologist and director of the Harvard/MIT Broad Institute's program in medical and population genetics. "One of the approaches had been to identify families in which there is an extreme form of the disease, maturity-onset diabetes, which is caused by a single gene. Those genes were found in the 1990's and there are six of them. Those genes explain only 1-2% of all cases of type 2 diabetes."

Rather than studying families with extreme forms of the disease and searching for the single gene that causes it, whole-genome association studies take people from the general population, examine the common genetic variation among them, and ask whether any of those variants are epidemiological risk factors.

However, as Altshuler describes, "Until recently, it wasn't really possible to test this hypothesis comprehensively. Testing it requires knowing the sequence of the genome, knowing how that sequence varies in individuals, to be able to measure variations in patients, and see how they correlate with the disease. Moreover, you need to be able to do that not just in candidate genes, which are genes that somebody guesses to play a role based on animal or correlative studies, but rather in the whole genome, and let that data tell you which genes influence risk."

This non-hypothesis based approach also enables identification of regions that would have been incredibly difficult to find by traditional hypothesis-driven approaches.

"The reason why those genome-wide studies are possible is because we know the genome sequence, and have developed projects such as the HapMap Project to measure, characterize and catalog common variation. Moreover, new technologies such as microarrays, or DNA chips, made it possible to measure genetic variance from those projects."

"I would argue those human genome association studies have never been done in a way that one could be expected to succeed until the last year. We and many other people published papers in peer-reviewed journals that said that we need to test on the order of 500 000 genetic variants, in order to span at least 80% of the genome. The tools to do them didn't exist until the previous year."

Only last year, with all the components finally in place, was the first genome-wide association study for type 2 diabetes published. Conducted jointly by Philippe Froguel from Imperial College (London) and the pharmaceutical company Novartis, it linked type 2 diabetes risk to 8 SNPs in 5 genetic loci (a locus is the position of a gene on the chromosome), including a gene called TCF7L2 that was discovered last year at Northwestern and two earlier variants, KIR6.2 and PPARγ, that had been found by candidate gene studies. Then last month, the Diabetes Genetics Initiative, a public-private partnership established in 2004 between Novartis, Altshuler's group at the Broad Institute and Lund University, reported an additional 3 unsuspected regions of human DNA that contain clear genetic risk factors in the April 26th advance online edition of Science.


"What is really exciting is that this is not only happening in diabetes, but also in Crohn's disease, type 1 diabetes and rheumatoid arthritis, and novel identification of genes that have common variants in them, and critically confirmation by independent studies that these are in fact, verified reproducible genetic risk factors. Up until this year, there really hasn't been the data to tell us whether this was going to work in any general sense," Altshuler said.

But what will be the product of these research efforts?

Altshuler agrees that one of the major by-products will be the ability to predict disease risk more effectively, a cornerstone of what is known as personalized medicine. However, he says that "this is not the most important thing. Based on my background as a medical physician and also as a patient, the ability to predict without the ability to improve outcome is not actually all that valuable. What is really important is the ability to improve the disease for everybody."

"We don't have effective therapies or cures for type 2 diabetes. There's lifestyle intervention, insulin injections and pills. They have benefits, but they are not cures. In other words, people clinically who have those treatments and lifestyle intervention still, on average, progress to diabetes. They do it a little more slowly, but they still have diabetes and they still as a group have a worsening of their disease over time. But it is delayed or mitigated."

"The hope is that further understanding will allow intervention. Perhaps the most impressive public health drug development is in cardiovascular disease. Cardiovascular disease is the #1 killer, but the rates are falling. There is a class of drugs called statin, which by lowering cholesterol affects the clotting of the blood and lower blood pressure. These came out of people studying patients and discovering that high blood pressure and cholesterol are risk factors for disease, understanding how it works and then developing treatments that reverse that risk factor."

"This is the ultimate value of the work we're doing. New risk factors that can be understood and reversed with treatment. The genes that have been found are, in general, surprises. This is good news because this tells us that had we not done these studies, there are things that we now know, that we would not know about that can cause disease in people. These will now be the subject of reversal. It might involve lifestyle, diets or drugs."

Pharmaceutical companies have thus recognized the importance of genome-wide studies for understanding the root causes of common diseases and for finding new drug targets. Another joint effort between Novartis, the Broad Institute and Lund University led to a genome-wide map for type 2 diabetes. This and the previously described involvements reflect the potential of genome-wide association studies to contribute new drug targets, but they also mark a shift away from universal populations and "blockbuster drugs" towards more specific populations, with important implications for corporate research strategy that may manifest in more public-private partnerships.

Concerns about Public-Private Collaboration

In spite of concerns about intellectual property stemming from Celera Genomics and Craig Venter's infamous attempt to patent the human genome in the 1990s, both companies and academics seem to have found a happy medium through public-private partnerships.

"There is an encouraging development, where companies are wanting to work with academics to get this done in the public domain. We posted all the results on the web immediately as soon as we had generated them, available to the scientific community where anybody can look at them," Altshuler describes.

And the Affymetrix executives also seem to agree. "I think that Affymetrix has a very liberal policy with regards to intellectual property and understands the benefits of not so much ownership of the intellectual property, but the ability to collaborate and accelerate the science. Therefore, our stance on things is not so much to own the intellectual property that comes out of these collaborations, but make sure that we've helped the community in general by accelerating the science."

However, academics like Altshuler continue to remain cautious, noting that "this is the natural way that this kind of science has gone. Somebody has always wanted to patent it."

He emphasizes the distinction between discovery and invention by drawing a comparison to blowing out your knee while skiing.

"If you go skiing and blow out your knee, every orthopedic surgeon and manufacturer of braces has knowledge that the anterior cruciate ligament is torn. Imagine that a company got proprietary knowledge of the idea that when your knee is hurt skiing, that this ligament is torn. Invariably, somebody invents a new device for surgery to patent that since that is an invention. They could patent that, that's an invention. The scientist or doctor who figured out what happens to you when you blow out your knee didn't invent that anterior cruciate ligament. They didn't invent that your anterior cruciate ligament is torn."

"When you discover that a gene is mutated to cause disease, you didn't invent that, you discovered it. Discoveries of that sort shouldn't be patented. They shouldn't belong to anybody. They belong to mankind. The drug or diagnostic test or other invention - that should be something that industry innovates around to encourage investment. That's the idea of public-private partnerships - to get these things into the public domain before people try to patent things that they didn't invent, but discovered."

"When we discover genetic variants that cause disease, we didn't invent them, we discovered them. They should be told to the world, and everybody in the world should know that this is what can cause diabetes. Others should invent ways of fixing them, and they can own those things."

Current Successes and Future Challenges

Whatever other doubts may exist, the initial success of these studies is indisputable. The last six months have seen a flurry of announcements from genome-wide association studies of complex diseases (those with both environmental and genetic determinants), including autism. Results from large-scale studies of depression and bipolar disorder are expected in a few months' time. Genome-wide association studies have not been limited to the understanding of diseases, however. They have also contributed to fundamental biological discoveries. In October 2006, researchers at Arizona's Translational Genomics Research Institute, with colleagues at the University of Zurich, identified a gene associated with memory performance called Kibra.

As Dominique de Quervain of the University of Zurich described at the time: "Using sophisticated functional brain imaging techniques, we showed that individuals who had a version of the gene that is related to poorer memory potential had to tax their brains harder to remember the same amount of information."

"This memory study is a perfect example of how the use of advanced technologies in human genetics yields fundamental discoveries," said Stephen Fodor, Chairman and CEO at Affymetrix.

Despite the dramatic improvements in identifying pertinent genes, many scientific challenges remain or are emerging. Given the number of genome-wide studies taking place, and the number of variants each one tests, there is a possibility that a study might, by chance, find a statistically significant association for a gene with no relation to the disorder. Identifying a physical region of interest in the genome does not necessarily reveal causation, as some causal mutations may lie within regulatory elements that can be difficult to identify. Recent evidence also suggests that higher-level structural variation involving deletions and duplications of thousands of bases, so-called copy number variants (CNVs), might contribute to complex diseases even more than single nucleotide polymorphisms do. Efforts are under way to develop a HapMap equivalent for CNVs. Prohibitive costs running into the millions limit the number of universities that can conduct such research, further emphasizing the need for collaboration. But perhaps most importantly, many ethical questions have been raised about the application of these studies to controversial issues such as forensics and race. It is important that such issues be approached critically and that the necessary investments continue to be made, for genome-wide association studies form the foundation from which twenty-first century scientific medicine, from prenatal screening to gene therapy, will evolve.
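The risk of chance associations can be put in rough numbers. The arithmetic below is a minimal sketch that takes the figure of roughly 500,000 tested variants quoted by Altshuler earlier and pairs it with the conventional 5% significance level, which is an assumption here rather than a figure from any particular study. It also shows the Bonferroni correction, one standard (and conservative) remedy; the function names are hypothetical.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of chance 'hits' if no tested variant truly affects the disease."""
    return alpha * n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the study-wide false-positive rate near alpha."""
    return alpha / n_tests

n_snps = 500_000   # order of magnitude quoted in the article
alpha = 0.05       # conventional significance level (assumed)

chance_hits = expected_false_positives(alpha, n_snps)
cutoff = bonferroni_threshold(alpha, n_snps)

print(f"Expected chance associations at p < {alpha}: {chance_hits:,.0f}")
print(f"Bonferroni-corrected cutoff: p < {cutoff:.0e}")
```

Under these assumptions, an uncorrected study would expect about 25,000 spurious hits, so a result must clear a cutoff near one in ten million and be confirmed in independent samples before it is believed.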

Further Reading (Review Articles sorted by Topic):

Genome-Wide Association Studies

Christensen K, Murray JC. "What genome-wide association studies can do for medicine." New England Journal of Medicine 356, 1094-1097 (2007).

Hirschhorn JN, Daly MJ. "Genome-wide association studies for common diseases and complex traits." Nature Reviews Genetics 6, 95-108 (2005).

Wang WYS et al. "Genome-wide association studies: theoretical and practical concerns." Nature Reviews Genetics 6, 109-118 (2005).

Copy Number Variation and Higher-Scale Structural Variation

Feuk L et al. "Structural Variation in the Human Genome." Nature Reviews Genetics 7, 85-97 (2006).

Lupski JR. "Structural Variation in the Human Genome." New England Journal of Medicine 356, 1169-1171 (2007).

Sharp AJ et al. "Structural Variation of the Human Genome." Annual Review of Genomics and Human Genetics 7, 407-442 (2006).

Effect on Drug Discovery

Andrawiss M. "First phase of HapMap project already helping drug discovery." Nature Reviews Drug Discovery 4, 947 (2005).

Marsh S, McLeod HL. "Pharmacogenomics: from bedside to clinical practice." Human Molecular Genetics Spec No. 1, R89-93 (2006).

Effect on Human Evolution

Sabeti PC et al. "Positive natural selection in the human lineage." Science 312, 1614-1620 (2006).

Ethical Concerns

Daar AS, Singer PA. "Pharmacogenetics and geographical ancestry: implications for drug development and global health." Nature Reviews Genetics 6, 241-246 (2005).

Foster MW, Sharp RR. "Beyond race: towards a whole-genome perspective on human populations and genetic variation." Nature Reviews Genetics 5, 790-796 (2004).

International HapMap Consortium. "Integrating ethics and science in the International HapMap Project." Nature Reviews Genetics 5, 467-475 (2004).

Type 2 Diabetes

Florez JC et al. "The inherited basis of diabetes mellitus: implications for the genetic analysis of complex traits." Annual Review of Genomics and Human Genetics 4, 257-291 (2003).

O'Rahilly S et al. "Genetic Factors in Type 2 Diabetes: The end of the Beginning?" Science 307, 370-373 (2005).

Patis C. "Disease genetics: Global association study targets type 2 diabetes." Nature Reviews Genetics 8, 250 (2007).

HapMap Project

Hemminki K et al. "The balance between heritable and environmental aetiology of human disease." Nature Reviews Genetics 7, 958-965 (2006).

International HapMap Consortium. "A haplotype map of the human genome." Nature 437, 1299-1320 (2005).

Further Reading (Primary Articles sorted by Topic):

Autism Genome Project Consortium. "Mapping autism risk loci using genetic linkage and chromosomal rearrangements." Nature Genetics 39, 319-328 (2007).

Papassotiropoulos A et al. "Common Kibra alleles are associated with human memory performance." Science 314, 475-478 (2006).

Redon R et al. "Global variation in copy number in the human genome." Nature 444, 444-454 (2006).

Rioux JD et al. "Genome-wide association study identifies new susceptibility loci for Crohn disease and implicates autophagy in disease pathogenesis." Nature Genetics 39, 596-604 (2007).

Saxena R et al. "Genome-Wide Association Analysis Identifies Loci for Type 2 Diabetes and Triglyceride Levels." Science (in press – 2007).

Sebat J et al. "Strong association of de novo copy number mutations with autism." Science 316, 445-449 (2007).

Sladek R et al. "A genome-wide association study identifies novel risk loci for type 2 diabetes." Nature 445, 881-885 (2007).

- By Justin Chakma