Issue 1, October 2001
Proteomics: Pushing the Frontiers of Genomics
Dave Chokshi
Biology, Chemistry and Public Policy, Duke University
chokshi@jyi.org
Publication of working drafts of the human genome in February 2001
was the capstone achievement of two decades' work on deciphering the
human "genetic code." It was hailed as one of the greatest scientific
endeavors of mankind; it was certainly the greatest in recent history.
But any molecular biologist will tell you that cataloguing the genome
is only the first step to understanding the human body on a molecular
level. Scientists have already begun researching "the next big thing,"
a venture so complex that it dwarfs the Human Genome Project. Welcome
to the era of proteomics.
Proteomics is the study of the way proteins expressed by genes interact
inside cells. Essentially, the proteome is to proteins what the
genome is to genes. However, the questions that proteomics researchers
seek to answer are broader than those asked by genomicists: proteomics
will not be limited to documenting protein inventory. Specifically,
scientists will also have to examine the relative abundance of proteins,
their function and activation states, and the myriad permutations
of protein interactions.
A gene is a gene is a gene: to a first approximation, the DNA in
a neuron is the same as the DNA in a skin cell. The same cannot
be said about proteins. Some cells produce hemoglobin and insulin
abundantly, and some cells do not. Moreover, the amount of protein
produced varies by cell, and not just by cell type. Which proteins
a cell contains depends on its age, its physical environment, the
signals it receives from the nervous system, and even the time of
day!
Many scientists are coming to regard proteomics as the pre-eminent
approach to complex biological problems such as the nature of particular
molecular complexes or pathways in disease pathogenesis (Banks et
al., 2000). As evinced by the billions of dollars already poured
into proteomics research by venture capital, biotechnology, and
pharmaceutical companies, proteomics also holds hopes for innovative
drug development and advances in diagnostic medicine. But much basic
research remains to be done, and experimental protocols must mature,
before clinical breakthroughs are realized.
Novel Challenges in Proteomics
A glance at the numbers reveals the main challenge in proteomics:
complexity. Proteins are made of approximately 20 modifiable amino acid
building blocks; compare that to the four static nucleic acid bases
found in genes. The latest estimates peg the number of genes in the
human genome at about 34,000; there are probably 500,000 or more
structurally distinct proteins in the body. Scientists once believed
that genes alone could tell the story of the remarkable complexity of
the human body, hence previous estimates of about 100,000 genes.
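The arithmetic behind these estimates can be made explicit. The short
Python sketch below uses only the figures quoted in this article (2001
estimates, not current values); the function and variable names are
illustrative.

```python
# Figures quoted in the article (2001 estimates).
ESTIMATED_GENES = 34_000
ESTIMATED_PROTEINS = 500_000      # the article says "500,000 or more"
EARLIER_GENE_ESTIMATE = 100_000


def proteins_per_gene(proteins: int, genes: int) -> float:
    """Average number of structurally distinct proteins per gene."""
    return proteins / genes


ratio_now = proteins_per_gene(ESTIMATED_PROTEINS, ESTIMATED_GENES)
ratio_before = proteins_per_gene(ESTIMATED_PROTEINS, EARLIER_GENE_ESTIMATE)

print(f"With 34,000 genes: roughly {ratio_now:.1f} proteins per gene")
print(f"With 100,000 genes: roughly {ratio_before:.1f} proteins per gene")
```

On these numbers, lowering the gene count from 100,000 to 34,000 roughly
triples the average number of proteins each gene must account for.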
It may be that proteins are more responsible for our complexity
than genes. And the folded structure of a single protein is so complicated
that IBM plans to spend the next five years deciphering how just
one protein forms its particular shape. To do that, the company
will need to create a computer 500 times more powerful than any
in existence today (Fischer, 2000).
Investigating the biological relationship between genes and proteins
also reveals the intricacies of proteomics. The genome is the set
of instructions for making proteins; a gene is a blueprint for making
an individual protein. What the intracellular organelles, or protein
"factories," decide to make is indeed based on these blueprints,
but not strictly bound by them.
Some blueprints are so popular that they will be utilized millions
of times. Others might not be accessed at all in a particular cell.
Some organelles will mix and match genetic instruction to create
fusion or hybrid proteins. Unlike relatively stable DNA, proteins
get phosphorylated, sulfated, glycosylated, acetylated, and ubiquitinated.
A single gene ends up encoding multiple different proteins, and
by different methods: alternative splicing of the mRNA transcript,
variation of translation start and stop signals, and frameshifting
of codons (Fields, 2001).
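To see how alternative splicing alone lets one gene yield several
proteins, consider a toy model in which some exons are always retained
and others may be either included or skipped. The Python sketch below
rests on that simplifying assumption; the exon names and the
`splice_isoforms` helper are hypothetical, and real splicing is
regulated rather than freely combinatorial.

```python
from itertools import product


def splice_isoforms(exons):
    """Enumerate transcripts for a list of (name, is_optional) exons.

    Constitutive exons are always kept. Each optional exon is either
    included or skipped, so k optional exons give 2**k transcripts
    in this toy model.
    """
    choices = [(True, False) if optional else (True,) for _, optional in exons]
    isoforms = []
    for keep in product(*choices):
        isoforms.append(tuple(name for (name, _), k in zip(exons, keep) if k))
    return isoforms


# Hypothetical gene: exons E2 and E3 are optional, E1 and E4 are not.
gene = [("E1", False), ("E2", True), ("E3", True), ("E4", False)]
isoforms = splice_isoforms(gene)
for isoform in isoforms:
    print("-".join(isoform))
```

Two optional exons already give four transcripts from one gene, before
any post-translational modification is considered.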
Genes have only one principal function, to conserve and provide
information. Proteins, on the other hand, have myriad functions,
from serving as intercellular messengers to identifying invading
pathogens. Conversely, some functions may be performed by multiple
different proteins. Furthermore, the proteome must be dynamic, and
versatile proteins must respond to altered environmental conditions
by relocating within the cell, adjusting their stability, and changing
the molecules they bind to.
A particularly difficult task in unraveling the complexity of the
proteome is mathematically describing the shapes of proteins. Proteins
are manufactured as linear strings of amino acids but self-organize
to yield the secondary, tertiary, and quaternary structures that
characterize the three-dimensional protein in a cellular environment.
We do not completely know how to predict a protein's ultimate structure
simply from its amino acid sequence. Computational tools for describing
this self-organization, as well as computer simulation of ligand-to-protein
docking, are only starting to be developed. At the same time, the
processes that link structure to function are fundamental to understanding
all proteins (Meredith, 2001).
There has been much speculation about a large-scale proteomics initiative,
or a Human Proteome Project, if you will. Without a doubt, researchers
would benefit from networking on so complicated a problem. But is
such a project in the cards?
There are some important disanalogies between genomics and proteomics
that render the prospects of a Human Proteome Project unfavorable.
Perhaps most importantly, the study of proteins diverges from the
study of genes in that there is no analog to the linear sequence
of DNA with a definite start and finish to examine. If proteomicists
were actually in pursuit of identifying every single protein in
the human body, the scope of the project would encompass almost
all of biology!
There are undoubtedly some aspects of proteomics that lend themselves
to systematic analysis. For instance, the Human Genome Project did
not tell us which genes are actually expressed in a given cell or tissue.
Most genes are predicted computationally, by applying gene-finding
algorithms to the genomic sequence. Yet on the whole,
the consensus among biologists is to shy away from a Human Proteome
Project because the most biologically relevant questions in proteomics
deal with the dynamics of protein interactions rather than the cataloguing
of each individual protein (Bradbury, 2000).
The complexity inherent in proteomics calls for novel experimental
protocols and technologies. For example, there is no protein analog
to the polymerase chain reaction (for DNA amplification) for simple
and efficient amplification of low-abundance proteins, so a range
of detection from one to several million molecules per cell is needed.
Analyzing post-translational modifications and establishing their
significance is another major hurdle. Proteins' properties arise largely
from their folded structures, so general experimental methods, which do
not necessarily maintain the integrity of a protein's structure, are
difficult to apply. Clearly, ingenuity and innovation are required
to understand the astounding complexity of the human proteome.
Experimental Approaches to Proteomics
The gold standard in basic proteomics research is a technology known
as two-dimensional gel electrophoresis, in which a mixed protein sample
is separated first by charge and then by molecular mass. A solution
of cell contents (or other protein sample) is placed on a narrow
polyacrylamide strip with an immobilized pH gradient. Application of
an electric current induces the polypeptides to travel until each reaches
the region of the gradient where the pH equals its isoelectric point,
the pH at which it carries no net charge. This results in a gel
strip showing discrete protein bands ordered by isoelectric point.
This strip is then placed against a rectangular polyacrylamide gel
containing sodium dodecyl sulfate (SDS). An electric current drives
the bands out of the strip and through the rectangular slab, where
smaller proteins migrate faster; thus, proteins are separated by size
(Banks et al., 2000). Two-dimensional gel electrophoresis is not without
limitations, however. High-charge or low-mass polypeptides are not
resolved well. Proteins with large hydrophobic regions (such as
membrane-bound receptors) also are not clearly visualized. The latter
restriction has important implications for drug development (see below),
since membrane receptors are targets for pharmaceutical intervention
(Ezzell, 2000).
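The two separation steps can be pictured as assigning each protein a
pair of coordinates on the gel. The Python sketch below is an idealized
model, not a description of any laboratory software. It assumes that
migration distance in the second dimension is linear in the logarithm
of molecular mass (a common approximation), and the pH range, mass
range, and three sample proteins are all hypothetical values chosen
for illustration.

```python
import math


def gel_position(pi, mass_da, ph_range=(3.0, 10.0), mass_range=(10_000, 200_000)):
    """Return (x, y), each scaled to [0, 1], for a spot on an idealized 2-D gel.

    x: first dimension; the protein stops where the strip pH equals its
       isoelectric point (pI).
    y: second dimension; migration distance, larger for smaller proteins.
       Assumed linear in log10(mass), which is only an approximation.
    """
    ph_lo, ph_hi = ph_range
    m_lo, m_hi = mass_range
    x = (pi - ph_lo) / (ph_hi - ph_lo)
    y = (math.log10(m_hi) - math.log10(mass_da)) / (math.log10(m_hi) - math.log10(m_lo))
    return x, y


# Hypothetical proteins: name -> (isoelectric point, mass in daltons).
sample = {
    "protein_A": (5.0, 66_000),
    "protein_B": (5.0, 15_000),   # same pI as A, much smaller
    "protein_C": (8.5, 66_000),   # same mass as A, more basic
}

spots = {name: gel_position(pi, mass) for name, (pi, mass) in sample.items()}
for name, (x, y) in spots.items():
    print(f"{name}: x = {x:.2f}, y = {y:.2f}")
```

Proteins A and B would overlap after the first step alone, and A and C
after the second step alone; only the combination of the two dimensions
resolves all three.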
Computer analysis of gel images can be used to compare a sample with
others from the lab or with proteome databases accessible through
the Internet. For example, researchers may wish to compare protein
expression in healthy and diseased cells in the same tissue. Two-dimensional
gels also can be used to identify specific proteins, although other
methods have been developed that complement gel-based identification.
Mass spectrometric techniques yield particularly precise characterization.
Proteins or peptides (which can be isolated using gel electrophoresis)
are ionized using various procedures; the mass-to-charge ratio of the
ions is then measured very accurately by coupled analyzers.
In a related protocol, known as peptide mass fingerprinting, a protein
is broken down into several peptide fragments whose masses, measured
by mass spectrometry, form a characteristic fingerprint. Comparing this
fingerprint with the peptide masses predicted from digestions of
sequences in genomic databases identifies the protein.
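The database comparison can be sketched in a few lines of Python. This
is a minimal illustration under stated assumptions, not the method of
any particular search engine. It assumes a trypsin-like digestion rule
(cut after K or R unless the next residue is P), which this article
does not specify. The two database sequences are invented, the mass
tolerance is arbitrary, and matches are scored by simple counting.

```python
# Monoisotopic residue masses in daltons (standard reference values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def digest(sequence):
    """Cut after K or R unless the next residue is P (assumed trypsin-like rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def fingerprint(sequence):
    """Sorted list of predicted peptide masses for one protein sequence."""
    return sorted(peptide_mass(p) for p in digest(sequence))


def count_matches(observed, predicted, tolerance=0.01):
    """Number of observed masses lying within `tolerance` Da of a predicted mass."""
    return sum(any(abs(o - p) <= tolerance for p in predicted) for o in observed)


def identify(observed, database, tolerance=0.01):
    """Name of the database entry whose predicted fingerprint matches best."""
    return max(
        database,
        key=lambda name: count_matches(observed, fingerprint(database[name]), tolerance),
    )


# Invented sequences standing in for a sequence database.
DATABASE = {
    "toy_protein_1": "MKTAYIAKQRQISFVK",
    "toy_protein_2": "GASPVKTCLINDR",
}

# Pretend the spectrometer measured exactly the masses of toy_protein_2.
observed_masses = fingerprint(DATABASE["toy_protein_2"])
best_hit = identify(observed_masses, DATABASE)
print(best_hit)
```

Real searches must also cope with missed cleavages, modified residues,
and measurement error, none of which this sketch models.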
Two-dimensional gel electrophoresis and mass spectrometry could be
described as "classical proteomics," the branch of proteomics concerned
with protein cataloguing. But as described previously, many challenges
in proteomics stem from functional characterization of proteins. A
method known as two-hybrid analysis is the principal experimental
tool used to probe protein functionality.
The general idea is that if two proteins interact with one another,
they usually participate in similar cellular functions. Two-hybrid
analysis gauges whether two proteins physically associate using a
clever technique. First, each protein is attached to a separate fragment
of a third protein. The third protein is a transcription factor; it
has the ability to switch on genes. In this case, the third protein
switches on a reporter gene. There are generally two domains in a
transcription factor, the DNA-binding domain and the activating domain.
The DNA-binding domain is fused to one protein, the "bait" protein.
The activating domain is fused to the "prey" protein. Neither hybrid
can activate transcription of the reporter gene by itself. However,
if the two proteins of interest interact, then the two fragments of
the transcription factor come into sufficiently close contact to switch
on the reporter gene.
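The logic of the assay reduces to a simple rule: the reporter gene is
switched on only when bait and prey bind each other. The Python sketch
below models just that rule. The protein names and the set of "true"
interactions are hypothetical, and the model ignores the false
positives and false negatives that real screens produce.

```python
def reporter_on(bait, prey, interacting_pairs):
    """True if the reporter gene would be transcribed.

    The DNA-binding domain (fused to the bait) and the activating domain
    (fused to the prey) are brought together only when bait and prey bind.
    """
    return frozenset((bait, prey)) in interacting_pairs


def screen(baits, preys, interacting_pairs):
    """Test every bait against every prey and report the positive pairs."""
    return [(b, p) for b in baits for p in preys if reporter_on(b, p, interacting_pairs)]


# Hypothetical ground truth: proteins X and Y bind each other.
true_interactions = {frozenset(("X", "Y"))}

hits = screen(baits=["X", "Z"], preys=["Y", "Z"], interacting_pairs=true_interactions)
print(hits)
```

Pairs are stored as unordered sets, so it does not matter in this model
which partner serves as bait and which as prey.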
The name "two-hybrid" refers to the fact that two hybrid proteins
are actually interacting. Discovering that an unidentified protein
interacts with a protein of known function using two-hybrid analysis
yields important information. This concept has been termed "guilt
by association" (Oliver, 2000).
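One simple way to turn "guilt by association" into a procedure is a
majority vote among a protein's interaction partners. The Python sketch
below is an assumption about how such an inference could be automated,
not the method described by Oliver (2000); the protein names, function
labels, and the `predict_function` helper are all hypothetical.

```python
from collections import Counter


def predict_function(protein, interactions, known_functions):
    """Guess a protein's function from its annotated interaction partners.

    Returns the most common function among the partners, or None if no
    partner has a known function.
    """
    partners = [b if a == protein else a for a, b in interactions if protein in (a, b)]
    votes = Counter(known_functions[p] for p in partners if p in known_functions)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical interaction list and annotations.
interactions = [("U", "A"), ("U", "B"), ("C", "U"), ("A", "B")]
known_functions = {"A": "DNA repair", "B": "DNA repair", "C": "transport"}

guess = predict_function("U", interactions, known_functions)
print(guess)
```

A prediction of this kind is a hypothesis for follow-up experiments
rather than a confirmed annotation.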
There are, of course, limitations to this technique. Two-hybrid analysis
reveals potential protein interactions, but not the biological context
in which they occur. Particular physiological conditions may yield
false positive or false negative results. Some interactions may
never be revealed by two-hybrid systems because the proteins involved
are actually located in separate cellular compartments. Nevertheless,
two-hybrid analysis is an important investigative tool for assaying
protein function.
The Role of Proteomics in Disease
J. Craig Venter, guru of the private genome-mapping efforts, contends
that all of today's medicine will seem antiquated once proteomics
research begins to bear fruit (Fischer, 2000). Venter waxes optimistic because
characterization of defective or missing proteins is the key to
understanding diseases. Thus far, over half a dozen genes have been
implicated in increased risk of Alzheimer's disease. But the only
unambiguous diagnosis for Alzheimer's disease results from the presence
of protein fragments in the brain.
Generally, the study of proteomics is relevant to molecular medicine
for three reasons. First, almost all successful drugs either target
or are themselves proteins. Second, proteins constitute the "final"
product of gene expression. Finally, the function/dysfunction of
a protein and the pathways that it participates in are often dependent
on post-translational modifications that are not directly encoded
in the genome. Thus, proteomics has been advanced as a core technology
for translating genomic advances into a more coherent and pharmacologically
useful understanding of proteins in disease.
The first (and perhaps most obvious) biomedical application of proteomics
is extending progress in medical diagnosis. Diagnostic benefits
were advertised as genomics' bread-and-butter; proteomics would
build upon genomics in a predictable manner. One technology envisioned
is a sort of clinical molecular scanner, a device that could examine
tissue samples and detect subtle deviations from baseline normal
states of health based on protein analysis. A protein analog to
the DNA chip is also foreseen; such a chip would cheaply diagnose
and precisely stage a range of diseases in a single patient (Weber,
2000).
Other medical breakthroughs from proteomics share these parallels
to genomics. The challenge with gene-based drug development was
refining understanding of a biological process with the aim of identifying
proteins pivotal to function. In genomics, a specific genetic lesion
was identified, the resultant changes in proteins were elucidated,
and a drug to counteract or correct aberrations was designed.
The difficult part of the method is determining the changes to proteins;
often the function of the protein is not well understood. From that point,
numerous compounds are tested against the target protein as potential
drug candidates, an expensive process. Thus, companies have an economic
incentive to use the knowledge gained from proteomics for target
validation. The ability to narrow down the proteins that might be
affected (and to isolate the actual effects) saves both time and money.
Conversely, a major challenge in drug development is to expand the
pool of protein targets from the roughly 500 on which virtually all
drugs available today act to an estimated 10,000 potential targets
(Banks et al., 2000). Pharmaceutical
companies generally shy away from allocating resources to find new
protein targets because of the large overhead costs involved. Proteomics,
with its ability to identify novel protein targets cheaply, would
provide opportunities for drug companies to move into new areas
of research.
Further, certain proteins are associated with drug toxicity; proteomics
research to this effect might serve as an early warning that a drug
candidate is associated with unacceptable side effects. By developing
profiles of proteins associated with side effects, proteomics can
help to identify side effects in drug candidates that might not
otherwise be identified until after expensive and lengthy clinical
trials.
Proteomics also has been hailed as the next step in basic science
contributing to diagnosis and therapy for neurological disorders
(e.g., Creutzfeldt-Jakob disease), infectious diseases such as tuberculosis,
heart failure, and cancer. Proteomics will no doubt present novel
challenges in research, but the purported clinical benefits seem
well worth the difficulty. The information yielded by proteomics
will not only push the limits of genomics, but also push the frontiers
of the current biomedical revolution.
Suggested Reading
Banks, Rosamonde, et al. "Proteomics: new perspectives, new biomedical
opportunities." The Lancet. 356 (2000): 1749-56.
Bradbury, Jane. "Proteomics: the next step after genomics?" The
Lancet. 356 (2000): 50.
Ezzell, Carol. "Beyond the human genome." Scientific American.
283 (2000): 64-69.
Fields, Stanley. "Proteomics in Genomeland." Science. 291 (2001):
1221-24.
Fischer, Joannie. "Gene map in hand, the hunt for proteins is on."
U.S. News & World Report. 129 (2000): 47.
Meredith, Dennis. "Beyond the Human Genome." Dialogue. March
2, 2001: 3.
Oliver, Stephen. "Guilt-by-association goes global." Nature.
403 (2000): 601-03.
Weber, David. "Proteomics: The next frontier." Health Forum Journal.
43 (2000): 20-22.
Journal of Young Investigators. 2001. Volume Five.
Copyright © 2001 by Dave Chokshi and JYI. All rights reserved.