The Human Genome Project: Progress Toward Understanding Our Genetic Material

Author:  Pattanayak Vikram
Institution:  Biochemistry and Biophysics
Date:  September 2005

In February 2001, the publicly funded International Human Genome Sequencing Consortium and the privately owned company Celera Genomics announced the completion of the sequencing of the human genome. To some, this might mean we now know the identity of every single base pair in our genetic material. Not exactly. Contrary to what many believe, the Human Genome Project is not yet over. The big announcement in 2001 marked the completion of a draft sequence of the genome that had many gaps; not every base from start to finish had been determined. To qualify as fully sequenced, according to Hattori and Taylor's Nature article, a chromosome must meet three criteria: no more than one base per 10,000 can be incorrectly sequenced, 95% of the euchromatic (gene-coding) regions must be sequenced, and there can be no gaps greater than 150 kb. When the public consortium published its data, only two out of 23 human chromosomes had been fully sequenced: chromosomes 21 and 22. Besides finishing the sequence, other important aspects of the genome project also remain to be completed.
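The three completion criteria quoted above can be written down as a simple check. The following Python sketch is only an illustration: the class name, field names, and the example numbers are hypothetical and do not come from the sequencing consortium; only the three thresholds are taken from the text.

```python
from dataclasses import dataclass

# Thresholds as stated in the text (attributed to Hattori and Taylor).
MAX_ERROR_RATE = 1 / 10_000        # at most one wrong base per 10,000
MIN_EUCHROMATIC_COVERAGE = 0.95    # at least 95% of euchromatin sequenced
MAX_GAP_BP = 150_000               # no gap larger than 150 kb


@dataclass
class ChromosomeAssembly:
    """Hypothetical summary statistics for one chromosome's sequence."""
    name: str
    error_rate: float            # fraction of bases called incorrectly
    euchromatic_coverage: float  # fraction of euchromatin sequenced
    largest_gap_bp: int          # size of the largest remaining gap, in bases


def is_finished(assembly: ChromosomeAssembly) -> bool:
    """Return True only if all three criteria are met at once."""
    return (
        assembly.error_rate <= MAX_ERROR_RATE
        and assembly.euchromatic_coverage >= MIN_EUCHROMATIC_COVERAGE
        and assembly.largest_gap_bp <= MAX_GAP_BP
    )


# Made-up numbers, for illustration only.
draft = ChromosomeAssembly("example draft", 5 / 10_000, 0.90, 400_000)
finished = ChromosomeAssembly("example finished", 1 / 100_000, 0.995, 50_000)

for assembly in (draft, finished):
    print(assembly.name, "finished:", is_finished(assembly))
```

The point of the sketch is that a chromosome counts as finished only when it passes all three tests together; a highly accurate sequence with one large gap would still be a draft.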


Many have compared the genome sequence to a New York City phone book. If you read the names in the phone book, you will not learn much about New Yorkers. Similarly, you will not learn much about the human genome by reading the bases that constitute it. To be able to use the genome, researchers have to completely annotate it; they must locate genes and decipher their functions.

Learning about location and function alone does not tell the whole story of the genome. Every cell in the human body contains all of the genes in the human genome, yet most cells do not express, or use, the same genes. Differing gene expression allows cells to take on varying functions; for example, muscle cells express genes that relate to muscle function and skin cells express those that relate to skin function. Understanding where genes are and how they become activated will help researchers develop therapeutic strategies for treating diseases caused by aberrant expression.

Another goal of the genome project is to sequence other organisms' genomes. If similar genes in those other species can be identified, human genes will be easier to find using molecular biology methods that find related genes across organisms. Beyond these scientific aims, the project includes studying the Ethical, Legal, and Social Implications (ELSI) of sequencing the genome.

Finishing the Sequence

Roughly a year away from the expected completion, the public project reports on its website that it has fully sequenced 63% of the genome and another 34.8% at draft quality. It aims to complete sequencing by April 25, 2003, the 50th anniversary of James Watson and Francis Crick's landmark paper on the structure of the genetic material, deoxyribonucleic acid (DNA).

Since the draft sequences were published, chromosome 20 has joined chromosomes 21 and 22 as fully sequenced, with 99.5% of the gene-coding regions covered. With only three chromosomes completely sequenced, there seems to be a lot of work left before we will know the identity of every base pair, or as close to it as we can get, in the human genome. However, as indicated at a public project website, The Human Genome: A Guide to Online Information Resources, chromosomes 6, 7, 13, and 14 are all more than 90% finished, and sequencers remain optimistic that the whole genome will be completed on time.

Annotating the Genome

According to LocusLink, a database that houses data for the public project, nearly 14,000 genes with a known functional product have been localized to sites on chromosomes. An additional 4,900 genes that code for proteins with unknown function have also been identified. So far, the public project has characterized close to 20,000 of an estimated 35,000 genes in the human genome.

Few papers have been published about the genome project since the draft sequence was released, so the best way to find data on the progress is through websites. One of these websites, NCBI Genes and Disease, shows the locations of some of the disease genes found so far. Some of the characterized genes code for proteins that may play roles in breast cancer, prostate cancer, cystic fibrosis, and other disorders.

Ethical, Legal, and Social Implications

The ethical, legal and social implications (ELSI) part of the Human Genome Project has progressed more slowly than sequencing and annotation. In 1990, the Department of Energy (DOE) and National Institutes of Health (NIH) believed they would complete the project in 15 years, but technological advances in sequencing accelerated the progress, allowing a draft sequence to be published in 2001, according to About the Human Genome Project, a website maintained by the DOE. No technological advance could speed up the ethical studies enough for them to conclude by the release of the draft sequence, but a five-year plan established in 1998 reset the completion goal to 2003.

Five goals of ELSI were outlined by Francis Collins in the October 23, 1998 issue of Science:

To examine issues surrounding the completion of the human DNA sequence and the study of human genetic variation

To examine issues raised by the integration of genetic technologies and information into health care and public health activities

To examine issues raised by the integration of knowledge about genomics and gene-environment interactions in non-clinical settings

To explore how new genetic knowledge may interact with a variety of philosophical, theological, and ethical perspectives

To explore how racial, ethnic and socioeconomic factors affect the use, understanding and interpretation of genetic information; the use of genetic services; and the development of policy.

Significant resources -- 5% of the genome project's funding -- have been invested to achieve these goals.

Comparative Analysis

Studying the human genome alone will not tell researchers everything they need to know about what human genes do and what makes humans distinct from other species. However, because of ethical concerns and the long time between human generations, many genetic and biochemical studies of gene function and expression cannot be conducted on humans. Researchers are therefore sequencing the genomes of model organisms to better understand human genes. These model organisms share two characteristics: they are small and they have short generation times.

When the draft sequence of the human genome was released in Science and Nature, the bacterium Escherichia coli, the nematode Caenorhabditis elegans, the plant Arabidopsis thaliana, and the fruit fly Drosophila melanogaster had all been sequenced. Since the human draft sequence was announced, a draft of the mouse (Mus musculus) genome was completed and the rice Oryza sativa was sequenced.

Comparisons between model organisms and humans can be very helpful. If a gene sequence is found in all organisms, it most likely codes for a vital product, such as a key metabolic enzyme. In contrast, if a gene is found in only one species, it most likely codes for a product that makes that species different from others. Sequence similarity can also delineate evolutionary relationships between humans and other organisms, since related species have more similar genomes than unrelated species. In this way, comparative analysis could shed light on what makes humans different from all other organisms.
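The reasoning in the paragraph above, that genes shared by every species are likely vital while genes found in only one species may account for its differences, amounts to set intersection and set difference. The Python sketch below illustrates it with toy data. The gene names (gene_A, gene_B, and so on) and their assignment to species are invented placeholders, not real annotations, and real comparisons rely on sequence similarity rather than exact name matches.

```python
def classify_genes(presence):
    """Split genes into those shared by all species and those unique to one.

    presence maps a species name to the set of gene families found in it.
    Returns (shared, unique), where unique maps each species to the genes
    seen in that species and in no other.
    """
    # Genes present in every species: candidates for vital functions.
    shared = set.intersection(*presence.values())

    unique = {}
    for species, genes in presence.items():
        # Union of the genes found in all the other species.
        others = set().union(
            *(g for name, g in presence.items() if name != species)
        )
        # Genes found here and nowhere else.
        unique[species] = genes - others
    return shared, unique


# Toy data with placeholder gene names (hypothetical, for illustration only).
toy_presence = {
    "human": {"gene_A", "gene_B", "gene_D"},
    "mouse": {"gene_A", "gene_B"},
    "fruit fly": {"gene_A", "gene_C"},
}

shared, unique = classify_genes(toy_presence)
print("found in all species:", sorted(shared))
for species, genes in unique.items():
    print("unique to", species + ":", sorted(genes))
```

In this toy example gene_A appears in every species, so it would be the candidate for a vital product, while gene_D appears only in the human set.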

Model organisms can also be experimentally manipulated to determine the function of human genes. If a human gene has a homologue in a model organism, researchers can experimentally prevent expression of that gene in the model. They can then infer the function of the human gene by seeing what functions the model loses in the experiment. Like the gene function and expression studies mentioned above, such experiments cannot be done in humans because of ethical considerations and the long time between generations.


Considerable progress has been made in comparative analysis and in the other main aspects of the Human Genome Project: sequencing, gene annotation, and ELSI. Scientists are coming closer to understanding the significance of the human genome and should have sequencing complete by 2003. While the sequence alone provides only a phone book-like database, the Human Genome Project as a whole is progressing toward a more meaningful comprehension of our genetic material.

Suggested Reading

Bork, P., R. Copley. (2001) The draft sequences: filling in the gaps. Nature. 409: 818-820.

Collins, F. et al. (1998) New goals for the U.S. Human Genome Project: 1998-2003. Science. 282: 682-9.

Hattori, M., T. D. Taylor. (2001) Part three in the book of genes. Nature. 414: 854-5.

Marshall, E. (2001) Celera assembles mouse genome; public labs plan new strategy. Science. 292: 822-3.

Pennisi, E. (2001) What's next for the genome centers? Science. 291: 1204-7.

Venter, C. et al. (2001) The sequence of the human genome. Science. 291: 1304-1351.

Yu, J. et al. (2002) A draft sequence of the rice genome. Science. 296: 79-92.

Further information on the progress of the ethical, legal and social implications work of the Human Genome Project is available at:

Ethical, Legal and Social Implications of the Human Genome Project Fact Sheet

Ethical, Legal, and Social Issues -- Genome Research