Issue 4, January 2002
Microarray: A Technique Review
Amanda Redig
Biochemistry, University of Arizona
redig@jyi.org
Biological molecules can be divided into roughly two categories: nucleic acids
and proteins. As their full names imply, both deoxyribonucleic acid
(DNA) and ribonucleic acid (RNA) are nucleic acids that function
to store and encode the information used to build proteins. Chromosomes,
the molecular units of heredity, are composed of DNA organized into
genes, while RNA, a less stable nucleic acid, is used to direct
the process of protein synthesis. Under regulated conditions, specific
regions of DNA corresponding to particular genes are transcribed
into RNA that is then translated into proteins. As the second major
class of biological molecules, proteins are perhaps best known for
their enzymatic role in biological catalysis, but they are also
needed for structure and support, movement, and cellular communication.
Following the discovery of the double helix structure of DNA by
Watson and Crick in 1953, molecular biologists and biochemists have
been interested in exploring the means by which nucleic acids encode
information. The gradual accumulation of knowledge has revealed
this process to be a marvel of intricate complexity. DNA is chemically
quite simple, composed of variations of only four nucleotides. Each
nucleotide consists of a five-carbon sugar, a phosphate group,
and one of four purine or pyrimidine bases (Figure 1).
This repeating polymer is organized into functional units known
as genes, and the collection of genes that make up an organism is
referred to as its genome. With few exceptions, each cell of an
organism contains a complete copy of its genome. The differences
between individual cells in a multicellular organism are due to
the regulated interactions and differential expression of particular
genes. The protein products of gene expression interact with each
other, with existing proteins in the cell, and often with the DNA
itself to carefully control cellular conditions in a complicated
pathway of feedback loops.
In light of the integrated relationship between genes and proteins,
the best way to study genetic or protein function is in a global
context. Since neither genes nor proteins exist in isolation, basic
science investigations and clinical science applications must both
consider pathway interactions. Until recently, the degree to which
researchers could compare results in multiple contexts or design
experiments on a global scale was limited. The sheer scope of information
contained in a complete genome was too large to put to practical use.
However, recent events have begun to change this outlook. The multitude
of genome sequencing projects, highlighted by the preliminary completion
of the Human Genome Project, represents the missing link for comprehensive
studies of gene interactions. Vast databases of nucleic acid sequences
and protein sequences can be easily accessed through the Internet,
providing the research community with unparalleled opportunities
for exploration. At the same time, this unprecedented amount of
information has raised new problems. How can scientists design experiments
to effectively utilize such an enormous volume of data? The field
of genomics, the comprehensive study of genes, is attempting to
answer this question, in part through the application of a specific
technique, cDNA microarray.
Microarray
cDNA microarray is based upon the mutual and specific affinity of
complementary strands of DNA. The technique works and is applicable
within a laboratory setting because it miniaturizes the quantity
of information contained within a genome. Array size can range from
a small subset of 500 genes to a large pool of 30,000 genes. Once
the desired genes are chosen, individual clones for each must be
obtained. Universal primers are used for the polymerase chain reaction
(PCR) amplification of each gene either from a plasmid preparation
or the bacterial vector itself. Agarose gel electrophoresis and
random sequencing are used to confirm the validity of amplification.
The steps in this stage can be done in 96-well plates to promote
efficiency.
Once the purified samples have been prepared, they are individually
spotted, usually in duplicate, onto glass slides in a predetermined
array. Although they are generally modified to promote the chemistry
used in printing, these slides appear identical to the microscope
slides used in any basic biology lab. When genes are spotted in duplicate,
a printed slide will contain two spots for each gene present in the
array. Printing can be done in one of three ways: photolithography
(Figure 2 (a)), mechanical microspotting (Figure 2 (b)), or ink
jetting (Figure 2 (c)). Photolithography uses light to covalently
attach the DNA strands to the slide, mechanical spotting uses spotting
pins and capillary action to transport DNA, and ink jetting uses
electric current to dispense the appropriate amount of DNA. Ultraviolet
cross-linking after the slides have been spotted denatures the DNA
and ensures that it will remain fixed to the glass surface (Figure
2).
Microarrays are used to probe differences in gene expression. In order
to highlight these differences, the use of proper controls is vital.
mRNA must be extracted from a normal control as well as the experimental
samples and purified for use in the array experiment. This RNA can
be obtained from a variety of sources including cell culture, tissue
samples from animal models or clinical patients, and histologically archived
samples. Following mRNA extraction, reverse transcription PCR (RT-PCR)
is used to convert the RNA transcripts into cDNA. The complete pool
of cDNA is representative of transcriptional events in the tissue
source of the RNA. The genes that were being actively transcribed
in the sample will have mRNA copies that should have been first purified
and then copied into cDNA during the RT-PCR step. The reverse transcription
reactions for the control and experimental mRNA are identical in every
step except one, and it is this step that enables differential gene
expression to be determined. Nucleotides labeled with Cy3, a green
fluorescent dye, are incorporated into the control cDNA while nucleotides
labeled with Cy5, a red fluorescent dye, are incorporated into the
experimental cDNA. After preparation, both probes are mixed and allowed
to hybridize to the glass slide. Excess hybridization buffer is washed
off following an overnight incubation, and the slides are then ready
to be scanned.
If one of the single-stranded cDNA probes corresponds to a single-stranded
DNA gene printed on the slide, complementary interactions between
the two will affix the probe to the slide. Laser scanning activates
the fluorescent dyes incorporated into the probe, and areas on the
slide with hybridized probes will be visible on the scanned image
as red or green spots. Gene spots with no affixed probe appear black.
The red spots correspond to genes expressed in the experimental sample
while green spots correspond to genes expressed in the control sample.
If a gene is expressed under both conditions, both probes will hybridize
and the spot will appear yellow (Figure 3).
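The hybridization and colour logic described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the sequences are hypothetical, a probe is treated as binding only when it is the exact reverse complement of the printed strand, and the function names are invented for this example. Real hybridization tolerates partial matches and depends on temperature and buffer conditions.

```python
# Illustrative sketch only: exact base pairing decides whether a labelled
# probe sticks to a spot, and the bound dyes decide the colour on the scan.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that would base-pair with seq (both read 5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hybridizes(spot_seq: str, probe_seq: str) -> bool:
    """Simplified rule (assumption): bind only on an exact reverse complement."""
    return probe_seq.upper() == reverse_complement(spot_seq)


def spot_colour(cy3_bound: bool, cy5_bound: bool) -> str:
    """Colour of a spot given which labelled probes hybridized to it."""
    if cy3_bound and cy5_bound:
        return "yellow"  # expressed in both control and experimental samples
    if cy5_bound:
        return "red"     # expressed in the experimental sample only
    if cy3_bound:
        return "green"   # expressed in the control sample only
    return "black"       # no probe bound


spot = "ATGGCGTACC"                   # hypothetical printed sequence
control_probe = reverse_complement(spot)  # Cy3-labelled cDNA that matches
experimental_probe = "TTTTTTTTTT"         # Cy5-labelled cDNA that does not

print(spot_colour(hybridizes(spot, control_probe),
                  hybridizes(spot, experimental_probe)))
```

In this toy case only the control probe matches the printed strand, so the spot is reported as green.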
Sophisticated laser scanning equipment is used to import data into
computer files that can be quantified on the basis of light intensity.
Ratios comparing Cy5 and Cy3 intensities can be used to quantitatively
evaluate gene expression. Under differing conditions, individual genes
may be up-regulated or down-regulated, and the fluorescent signals
of the marker dyes reflect these changes. A Cy5:Cy3 ratio of one indicates
no change, a ratio of less than one indicates down-regulation (greater
intensity in Cy3, the control), and a ratio of greater than one indicates
up-regulation (greater intensity in Cy5, the experimental condition).
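As a minimal sketch of the ratio rule above, the following Python function classifies a single spot from its two intensities. The log2 transform and the `fold_cutoff` parameter are assumptions added for illustration: taking log2 makes a two-fold increase (+1) and a two-fold decrease (-1) symmetric, and a cutoff greater than one lets small, noisy deviations count as no change. With the default cutoff of 1.0 the function reduces to the rule as stated in the text.

```python
import math


def expression_call(cy5: float, cy3: float, fold_cutoff: float = 1.0) -> str:
    """Classify one spot from its Cy5 (experimental) and Cy3 (control) signal.

    fold_cutoff is a hypothetical tuning parameter: 1.0 reproduces the plain
    rule (ratio > 1 is up, ratio < 1 is down); 2.0 would require a change of
    more than two-fold in either direction before calling a gene regulated.
    """
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(cy5 / cy3)
    threshold = math.log2(fold_cutoff)
    if log_ratio > threshold:
        return "up-regulated"
    if log_ratio < -threshold:
        return "down-regulated"
    return "no change"


# Hypothetical background-corrected intensities: (Cy5, Cy3)
spots = {"geneA": (2400.0, 800.0), "geneB": (500.0, 1500.0), "geneC": (900.0, 900.0)}
for gene, (cy5, cy3) in spots.items():
    print(gene, round(cy5 / cy3, 2), expression_call(cy5, cy3))
```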
Perhaps the most difficult component of a microarray experiment is
the evaluation of the data. The array format may make it technically
possible to investigate genome interactions, but it does not simplify
the complexities; a 10,000-gene array generates 10,000 data points.
To make things even more complicated, results must be validated through
replication. A typical microarray experiment may utilize ten, twenty,
or thirty slides and produce vast quantities of data, all of which
must be analyzed and pieced together to generate a coherent picture
of the system under investigation. Data can be analyzed in several
ways. Software packages developed by the biotechnology industry can
be used to compare experiments and generate cluster diagrams based
upon statistical evaluation. Such figures are useful in identifying
general patterns in the data that can be used to direct further experiments.
Figure 4 is an example of a cluster diagram, also known as a dendrogram,
showing the relationships between genes on the array and the patterns
of change on seven experimental slides.
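To give a sense of how a cluster diagram is built, here is a small sketch of agglomerative (bottom-up) clustering in plain Python. It is not the algorithm of any particular commercial package: Euclidean distance and average linkage are assumptions chosen for simplicity, and the gene names and log2 ratios are invented. The order in which clusters are merged is what a dendrogram draws as its branching pattern.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the genes of two clusters."""
    dists = [euclidean(profiles[g1], profiles[g2]) for g1 in c1 for g2 in c2]
    return sum(dists) / len(dists)


def cluster(profiles):
    """Repeatedly merge the two closest clusters; return merges in order."""
    clusters = [(gene,) for gene in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], profiles))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b))
    return merges


# Hypothetical log2(Cy5/Cy3) values for four genes across three slides.
profiles = {
    "geneA": [1.0, 1.2, 0.9],
    "geneB": [1.1, 1.0, 1.0],
    "geneC": [-1.0, -0.8, -1.1],
    "geneD": [-0.9, -1.1, -1.2],
}
for left, right in cluster(profiles):
    print(left, "+", right)
```

In this toy data set the two up-regulated genes pair first, the two down-regulated genes pair next, and the two pairs join last, which is the kind of grouping a researcher would look for before designing follow-up experiments.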
Analysis of the raw numerical data is also useful to identify specific
genes that are showing significant change in expression levels. The
boundaries of significance are determined by the various mathematical
tests performed and are a function of the questions being asked in
the experiment. Useful data analysis hinges upon properly understanding
the focus of the experiment. In order to successfully use the tremendous
volume of data generated from a microarray slide, it is important
to address the analysis to the specific aims of the experiment.
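One reason the boundaries of significance need care is simple arithmetic: when thousands of genes are tested at once, a conventional cutoff produces many false positives by chance alone. The sketch below illustrates this with the Bonferroni correction, which is only one of several possible approaches and is used here as an assumption for illustration, not as the method prescribed by this review. The p-values and gene names are hypothetical.

```python
def expected_false_positives(n_genes: int, alpha: float) -> float:
    """Genes expected to pass by chance if no gene truly changes."""
    return n_genes * alpha


def bonferroni_threshold(n_genes: int, alpha: float) -> float:
    """Per-gene cutoff that holds the overall false-positive rate near alpha."""
    return alpha / n_genes


def significant_genes(p_values: dict, alpha: float = 0.05) -> list:
    """Names of genes whose p-value beats the Bonferroni-corrected cutoff."""
    cutoff = bonferroni_threshold(len(p_values), alpha)
    return sorted(gene for gene, p in p_values.items() if p < cutoff)


# On a 10,000-gene array, an uncorrected 0.05 cutoff flags about 500 genes
# by chance alone; the corrected per-gene cutoff is far stricter.
print(expected_false_positives(10_000, 0.05))
print(bonferroni_threshold(10_000, 0.05))

# Hypothetical p-values for a four-gene toy array (cutoff = 0.05 / 4).
toy = {"geneA": 0.001, "geneB": 0.04, "geneC": 0.2, "geneD": 0.012}
print(significant_genes(toy))
```

The trade-off is that a stricter cutoff reduces false positives at the cost of more false negatives, which is one reason the choice of test should follow from the question the experiment is asking.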
The scope of the microarray technique makes it applicable across a
wide range of experimental situations. At its most fundamental level,
development and disease occur because of changes in gene expression.
Microarray allows experimental comparisons between normal and abnormal
tissue with the express purpose of determining why one is indeed normal
and the other is not. Developmental biologists can use the technique
to monitor gene changes during development. Gene expression will certainly
be modulated during the process of embryonic development, and microarray
offers a way to monitor gene activity. Cancers can also be classified
on the basis of their microarray expression profile. Some cancers
are chemotherapy-resistant while others are not, and the differences
between the two are probably due to differences at the gene expression
level. Profiling a tumor sample may help clinicians and patients make
a more informed choice in the design of cancer therapy. In addition,
this aspect of the technique could be applied in a wider sense to
the pharmaceutical industry. Drug expression profiling could be used
to classify potential therapeutic agents based upon their molecular
mechanism of action. Such profiling would expedite the molecular targeting
and drug development process.
Microarray is not without its problems, however. At a technical level,
the sheer amount of information expressed on a microarray chip opens
the possibility of incorrect labeling. If a mistake is made at the
cloning or PCR amplification stage in the printing process, the error
can be transmitted through the hybridization step and a gene may be
incorrectly identified. In addition, the hybridization technique itself
is not simple. A maximum signal with a minimum of background fluorescence
depends on a range of variables: time, temperature, light exposure,
cover-slip position, and a host of other small details must be properly
coordinated to ensure usable results. At the data analysis level,
statistical complexities can create a confusing maze that does not
lead to valid results. Care must be taken to properly apply established
statistical principles as both false negatives and false positives
can be propagated in microarray data. Microarray is fundamentally
a technique to identify new areas of research interest. Like anything
else, it cannot be taken out of context and used to establish conclusions
that are not supported in other settings. Additional means of analyzing
gene expression (Northern blotting or RNase protection assays) must
be used to corroborate microarray conclusions.
The last five years have seen unparalleled opportunities for discovery
in the world of molecular biology. The mass sequencing of whole-organism
genomes, previously thought to be so complex as to be impossible,
is truly a monumental achievement. Technology has played a large role
in making the visions of scientists a practical reality, and technology
will continue to be involved in processing and applying the seemingly
endless stream of data that has resulted from those visions.
As a technique, microarray is a simple concept, but in an applied
setting, it is a vital component of experimental analysis. The last
five years have seen microarray develop from a clever idea into something
functional as an experimental technique. The possibilities for the
next five years are as limitless as the information contained in a
genome.
Suggested Reading
Curran-Everett, Douglas. "Multiple comparisons: philosophies and illustrations."
American Journal of Physiology: Regulatory, Integrative and Comparative
Physiology. 279 (2000): R1-R8.
Gard J. et al. "Microarray dendrogram." Unpublished data. Powis Laboratory,
Arizona Cancer Center, 2001.
Hegde P, et al. "A concise guide to cDNA microarray analysis."
BioTechniques. 29 (2000): 548-562.
Kudoh K, et al. "Monitoring the expression profiles of doxorubicin-induced
and doxorubicin-resistant cancer cells by cDNA microarray." Cancer
Research 60 (2000): 4161-4166.
UK Mammalian Genetics Unit. The Medical Research Council. 27 Dec.
2001 http://www.mgu.har.mrc.ac.uk
Schena M, et al. "Microarrays: Biotechnology's discovery platform
for functional genomics." Trends in Biotechnology. 16
(1998): 301-306.
Takahashi M, et al. "Gene expression profiling of clear cell
renal cell carcinoma: Gene identification and prognostic classification."
Proceedings of the National Academy of Sciences. 98 (2001):
9754-9759.
Microarray Group. University of Tokyo. 27 Dec. 2001 http://www.ims.u-tokyo.ac.jp/nakamura/eg_micro.html
Journal of Young Investigators. 2002. Volume Five.
Copyright © 2002 by Amanda Redig and JYI. All rights reserved.