Issue 3, September 2002
The Human Genome Project: Progress Toward Understanding Our Genetic Material
Vikram Pattanayak
Biochemistry and Biophysics, University of Pennsylvania
pattanayak@jyi.org
In February 2001, the publicly funded International Human Genome Sequencing
Consortium and the privately owned company Celera Genomics announced
the completion of the sequencing of the human genome. To some, this
might mean we now know the identity of every single base pair in our
genetic material. Not exactly. Contrary to what many believe, the
Human Genome Project is not yet over. The big announcement in 2001
marked the completion of a draft sequence of the genome that had many
gaps; not every base from start to finish had been determined. For a
chromosome to qualify as fully sequenced, according to Hattori and Taylor's
Nature article, no more than one base per 10,000 can be incorrectly
sequenced, at least 95% of the euchromatic (gene-rich) regions must be
sequenced, and no gap can exceed 150 kb. When the public consortium
published its data, only two of the 24 distinct human chromosomes (22
autosomes plus X and Y) had been fully sequenced: chromosomes 21 and 22.
Beyond finishing the sequence, other important aspects of the genome
project also remain to be completed.
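The three finishing criteria above can be expressed as a simple check. The following is a minimal sketch, not the consortium's actual procedure: the class name, field names, and the example figures are all hypothetical, and only the three thresholds come from the criteria quoted above.

```python
from dataclasses import dataclass

# Thresholds taken from the finishing criteria quoted in the text.
MAX_ERROR_RATE = 1 / 10_000        # at most one wrong base per 10,000
MIN_EUCHROMATIN_COVERAGE = 0.95    # at least 95% of euchromatic regions
MAX_GAP_BP = 150_000               # no gap larger than 150 kb


@dataclass
class ChromosomeStats:
    """Hypothetical summary of a chromosome's sequencing status."""
    error_rate: float             # fraction of bases sequenced incorrectly
    euchromatin_coverage: float   # fraction of euchromatic sequence covered
    largest_gap_bp: int           # largest remaining gap, in base pairs


def is_finished(stats: ChromosomeStats) -> bool:
    """Return True only if all three finishing criteria are met."""
    return (
        stats.error_rate <= MAX_ERROR_RATE
        and stats.euchromatin_coverage >= MIN_EUCHROMATIN_COVERAGE
        and stats.largest_gap_bp <= MAX_GAP_BP
    )


# Made-up numbers for illustration only.
finished_example = ChromosomeStats(0.00005, 0.995, 50_000)
draft_example = ChromosomeStats(0.00005, 0.90, 400_000)
print(is_finished(finished_example), is_finished(draft_example))
```

A chromosome fails the check if it misses any one criterion, which is why a draft with good accuracy but large gaps still does not count as finished.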
Overview
Many have compared the genome sequence to a New York City phone
book. If you read the names in the phone book, you will not learn
much about New Yorkers. Similarly, you will not learn much about
the human genome by reading the bases that constitute it. To be
able to use the genome, researchers have to completely annotate
it; they must locate genes and decipher their functions.
Learning about location and function alone does not tell the whole
story of the genome. Every cell in the human body contains all of
the genes in the human genome, yet most cells do not express, or
use, the same genes. Differing gene expression allows cells to take
on varying functions; for example, muscle cells express genes that
relate to muscle function and skin cells express those that relate
to skin function. Understanding where genes are and how they become
activated will help researchers develop therapeutic strategies for
treating diseases caused by aberrant expression.
Another goal of the genome project is to sequence other organisms'
genomes. If similar genes in those other species can be identified,
human genes will be easier to find using molecular biology methods
that find related genes across organisms. Beyond these scientific
aims, the project includes studying the Ethical, Legal, and Social
Implications (ELSI) of sequencing the genome.
Finishing the Sequence
Roughly a year away from the expected completion, the public project
reports on its website that it has fully sequenced 63% of the genome
and has another 34.8% at draft quality. It aims to complete sequencing
by April 25, 2003,
the 50th anniversary of James Watson and Francis Crick's landmark
paper on the structure of the genetic material, deoxyribonucleic
acid (DNA).
Since the draft sequences were published, chromosome 20 has joined
chromosomes 21 and 22 as fully sequenced, with 99.5% of the gene-coding
regions covered. With only three chromosomes completely sequenced,
there seems to be a lot of work left before we will know the identity
of every base pair, or as close to it as we can get, in the human
genome. However, as indicated at a public project website, The
Human Genome: A Guide to Online Information Resources, chromosomes
6, 7, 13, and 14 are all more than 90% finished, and sequencers remain
optimistic that the whole genome will be completed on time.
Annotating the Genome
According to LocusLink, a database that houses data for the public
project, nearly 14,000
genes with a known functional product have been localized to sites
on chromosomes. An additional 4,900 genes that code for proteins
with unknown function have also been identified. So far, the public
project has characterized close to 20,000 of an estimated 35,000
genes in the human genome.
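The arithmetic behind "close to 20,000 of an estimated 35,000" can be checked directly. This short sketch uses the rounded figures quoted above; the variable names are illustrative only.

```python
# Rounded figures quoted from LocusLink in the text above.
known_function = 14_000     # genes with a known functional product
unknown_function = 4_900    # protein-coding genes of unknown function
estimated_total = 35_000    # estimated number of human genes

characterized = known_function + unknown_function
fraction = characterized / estimated_total

print(f"{characterized} genes characterized, "
      f"about {fraction:.0%} of the estimate")
# prints: 18900 genes characterized, about 54% of the estimate
```

By these figures, a little over half of the estimated genes had been characterized.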
Few papers have been published about the genome project since the
draft sequence was released, so the best way to find data on the
progress is through websites. One of these websites, NCBI Genes and
Disease, shows the locations of some of the disease
genes found so far. Some of the characterized genes code for proteins
that may play roles in breast cancer, prostate cancer, cystic fibrosis,
and other disorders.
Ethical, Legal, and Social Implications
The ethical, legal and social implications (ELSI) part of the Human
Genome Project has progressed more slowly than sequencing and
annotation. In 1990,
the Department of Energy (DOE) and National Institutes of Health
(NIH) believed they would complete the project in 15 years, but
technological advances in sequencing accelerated the progress, allowing
a draft sequence to be published in 2001, according to About
the Human Genome Project, a website maintained by the DOE. No
technological advances could speed up the ethical studies enough to
conclude in time for the release of the draft sequence, but a five-year
plan established in 1998 has reset the completion goal to 2003.
Five goals of ELSI were outlined by Francis Collins in the October
23, 1998 issue of Science:
- To examine issues surrounding the completion of the human DNA sequence
and the study of human genetic variation
- To examine issues raised by the integration of genetic technologies
and information into health care and public health activities
- To examine issues raised by the integration of knowledge about genomics
and gene-environment interactions in non-clinical settings
- To explore how new genetic knowledge may interact with a variety
of philosophical, theological, and ethical perspectives
- To explore how racial, ethnic and socioeconomic factors affect the
use, understanding and interpretation of genetic information;
the use of genetic services; and the development of policy.
Significant resources -- 5% of the genome project's funding -- have
been invested to achieve these goals.
Comparative Analysis
Studying the human genome alone will not give researchers all the
information they need to learn about what human genes do and what
makes humans distinct from other species. Moreover, because of ethical
concerns and the long time between human generations, many genetic
and biochemical studies on gene function and expression cannot be
conducted on humans. Researchers, therefore, are trying to sequence
the genomes of model organisms to better understand human genes.
These model organisms share two characteristics: they are small
and they have short generation times.
When the draft sequence of the human genome was released in Science
and Nature, the bacterium Escherichia coli, the nematode
Caenorhabditis elegans, the plant Arabidopsis thaliana,
and the fruit fly Drosophila melanogaster had all been sequenced.
Since the human draft sequence was announced, draft sequences of the
mouse (Mus musculus) and rice (Oryza sativa) genomes have been
completed.
Comparisons between model organisms and humans can be very helpful.
If a gene sequence is found in all organisms, it most likely codes
for a vital product, such as a key metabolic enzyme. In contrast,
if a gene is found in only one species, it most likely codes for
a product that makes that species different from others. Sequence
similarity can also delineate evolutionary relationships between
humans and other organisms, since related species have more similar
genomes than unrelated species. In this way, comparative analysis
could shed light on what makes humans different from all other organisms.
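The reasoning above, that genes shared by every organism are probably vital while genes found in one species may make it distinctive, can be sketched with set logic. This is a deliberately simplified illustration: real comparisons rely on sequence similarity rather than matching labels, and the gene names, species lists, and function name below are hypothetical placeholders.

```python
from collections import Counter


def classify_genes(genes_by_species):
    """Split genes into those shared by all species and those unique to one.

    genes_by_species maps a species name to a set of gene labels.
    Returns (shared, unique), where unique maps each species to the
    genes that appear in no other species.
    """
    # Count in how many species each gene label appears.
    counts = Counter(
        gene for genes in genes_by_species.values() for gene in genes
    )
    n_species = len(genes_by_species)

    shared = {gene for gene, count in counts.items() if count == n_species}
    unique = {
        species: {gene for gene in genes if counts[gene] == 1}
        for species, genes in genes_by_species.items()
    }
    return shared, unique


# Toy data with made-up gene labels, for illustration only.
toy = {
    "human": {"geneA", "geneB", "geneC"},
    "mouse": {"geneA", "geneB"},
    "fly": {"geneA", "geneD"},
}
shared, unique = classify_genes(toy)
print(sorted(shared))           # candidates for vital functions
print(sorted(unique["human"]))  # candidates for human-specific traits
```

In this toy data, the gene present in all three species is the candidate for a vital product, while the gene seen only in the human set is the candidate for a species-specific one.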
Model organisms can also be experimentally manipulated to determine
the function of human genes. If a human gene has a homologue in
a model organism, researchers can experimentally prevent expression
of that gene in the model. They can then infer the function of the
human gene by seeing what functions the model loses in the experiment.
Because of ethical considerations and the long time between human
generations, experiments like these cannot be done in humans.
Conclusion
Considerable progress has been made in the main aspects of the Human
Genome Project (sequencing, gene annotation, and ELSI) as well as in
comparative analysis. Scientists are coming closer to the goal of
understanding the significance of the human genome and should have
sequencing complete by 2003. While sequencing alone provides only a
phone book-like database, the Human Genome Project as a whole is
progressing toward a more meaningful comprehension of our genetic material.
Suggested Reading
Bork, P., R. Copley. (2001) The draft sequences: filling in the gaps.
Nature. 409: 818-820.
Collins, F. et al. (1998) New goals for the U.S. Human Genome Project:
1998-2003. Science. 282: 682-689.
Hattori, M., T. D. Taylor. (2001) Part three in the book of genes.
Nature. 414: 854-855.
Marshall, E. (2001) Celera assembles mouse genome; public labs plan
new strategy. Science. 292: 822-823.
Pennisi, E. (2001) What's next for the genome centers? Science.
291: 1204-1207.
Venter, C. et al. (2001) The sequence of the human genome. Science.
291: 1304-1351.
Yu, J. et al. (2002) A draft sequence of the rice genome. Science.
296: 79-92.
Further information on the progress of the ethical, legal and social
implications of the Human Genome Project is available at:
Ethical, Legal and Social Implications of the Human Genome Project Fact Sheet
Ethical, Legal, and Social Issues -- Genome Research
Journal of Young Investigators. 2002. Volume Six.
Copyright © 2002 by Vikram Pattanayak and JYI. All rights reserved.