Author: Redig Mandy
Date: September 2005
Biological molecules can be divided into roughly two categories: nucleic acids and proteins. As their full names imply, both deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) are nucleic acids that function to store and encode the information used to build proteins. Chromosomes, the molecular units of heredity, are composed of DNA organized into genes, while RNA, a less stable nucleic acid, is used to direct the process of protein synthesis. Under regulated conditions, specific regions of DNA corresponding to particular genes are transcribed into RNA that is then translated into proteins. As the second major class of biological molecules, proteins are perhaps best known for their enzymatic role in biological catalysis, but they are also needed for structure and support, movement, and cellular communication.
Following the discovery of the double helix structure of DNA by Watson and Crick in 1953, molecular biologists and biochemists have been interested in exploring the means by which nucleic acids encode information. The gradual accumulation of knowledge has revealed this process to be a marvel of intricate complexity. DNA is chemically quite simple, composed of variations of only four nucleotides. Each nucleotide consists of a five-carbon sugar, a phosphate group, and one of four purine or pyrimidine bases (Figure 1).
This repeating polymer is organized into functional units known as genes, and the collection of genes that make up an organism is referred to as its genome. With few exceptions, each cell of an organism contains a complete copy of its genome. The differences between individual cells in a multicellular organism are due to the regulated interactions and differential expression of particular genes. The protein products of gene expression interact with each other, with existing proteins in the cell, and often with the DNA itself to carefully control cellular conditions in a complicated pathway of feedback loops.
In light of the integrated relationship between genes and proteins, the best way to study genetic or protein function is in a global context. Since neither genes nor proteins exist in isolation, basic science investigations and clinical science applications must both consider pathway interactions. Until recently, the degree to which researchers could compare results in multiple contexts or design experiments on a global scale was limited. The sheer scope of information contained in a complete genome was beyond the realm of applicability. However, recent events have begun to change this outlook. The multitude of genome sequencing projects, highlighted by the preliminary completion of the Human Genome Project, represents the missing link for comprehensive studies of gene interactions. Vast databases of nucleic acid sequences and protein sequences can be easily accessed through the Internet, providing the research community with unparalleled opportunities for exploration. At the same time, this unprecedented amount of information has raised new problems. How can scientists design experiments to effectively utilize such an enormous volume of data? The field of genomics, the comprehensive study of genes, is attempting to answer this question, in part through the application of a specific technique, the cDNA microarray.
cDNA microarray is based upon the mutual and specific affinity of complementary strands of DNA. The technique works and is applicable within a laboratory setting because it miniaturizes the quantity of information contained within a genome. Array size can range from a small subset of 500 genes to a large pool of 30,000 genes. Once the desired genes are chosen, individual clones for each must be obtained. Universal primers are used for the polymerase chain reaction (PCR) amplification of each gene either from a plasmid preparation or the bacterial vector itself. Agarose gel electrophoresis and random sequencing are used to confirm the validity of amplification. The steps in this stage can be done in 96-well plates to promote efficiency.
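The complementary-strand affinity that the technique depends on can be illustrated with a minimal Python sketch. The function names and the short example sequence are hypothetical, chosen only for illustration. The exact-match rule is a simplifying assumption: real hybridization depends on temperature, buffer conditions, and tolerance of partial mismatches.

```python
# Watson-Crick base-pairing rules for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the strand that pairs with seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def can_hybridize(probe, target):
    """Simplifying assumption: only a perfectly complementary pair binds."""
    return reverse_complement(probe) == target.upper()

spotted_gene = "ATGGCCTTA"                 # hypothetical sequence printed on the slide
probe = reverse_complement(spotted_gene)   # the strand expected to bind it

print(probe)                               # TAAGGCCAT
print(can_hybridize(probe, spotted_gene))  # True
print(can_hybridize("AAAA", "AAAA"))       # False: identical strands are not complementary
```

A probe binds its spot only because each base has a single pairing partner, which is why one slide can interrogate thousands of different genes at once.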
Once the purified samples have been prepared, they are individually spotted, usually in duplicate, onto glass slides in a predetermined array. Although they are generally modified to promote the chemistry used in printing, these slides appear identical to the microscope slides used in any basic biology lab. When spotting is done in duplicate, a printed slide will contain two spots for each gene present in the array. Printing can be done in one of three ways: photolithography (Figure 2 (a)), mechanical microspotting (Figure 2 (b)), or ink jetting (Figure 2 (c)). Photolithography uses light to covalently attach the DNA strands to the slide, mechanical spotting uses spotting pins and capillary action to transport DNA, and ink jetting uses electric current to dispense the appropriate amount of DNA. Ultraviolet cross-linking after the slides have been spotted denatures the DNA and ensures that it will remain fixed to the glass surface (Figure 2).
Microarrays are used to probe differences in gene expression. In order to highlight these differences, the use of proper controls is vital. mRNA must be extracted from a normal control as well as the experimental samples and purified for use in the array experiment. This RNA can be obtained from a variety of sources including cell culture, tissue samples from animal models or clinical patients, and histologically-archived samples. Following mRNA extraction, reverse transcription PCR (RT-PCR) is used to convert the RNA transcripts into cDNA. The complete pool of cDNA is representative of transcriptional events in the tissue source of the RNA. The genes that were being actively transcribed in the sample will have mRNA copies that should have been first purified and then copied into cDNA during the RT-PCR step. The reverse transcription procedures for the control and experimental mRNA are identical in every step except one, and it is this step that enables differential gene expression to be determined. Nucleotides labeled with Cy3, a green fluorescent dye, are incorporated into the control cDNA while nucleotides labeled with Cy5, a red fluorescent dye, are incorporated into the experimental cDNA. After preparation, both probes are mixed and allowed to hybridize to the glass slide. Excess hybridization buffer is washed off following an overnight incubation, and the slides are then ready to be scanned.
If one of the single-stranded cDNA probes corresponds to a single-stranded DNA gene printed on the slide, complementary interactions between the two will affix the probe to the slide. Laser scanning activates the fluorescent dyes incorporated into the probe, and areas on the slide with hybridized probes will be visible on the scanned image as red or green spots. Gene spots with no affixed probe appear black. The red spots correspond to genes expressed in the experimental sample while green spots correspond to genes expressed in the control sample. If a gene is expressed under both conditions, both probes will hybridize and the spot will appear yellow (Figure 3).
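The color rules described above can be written as a small decision function. This is a sketch only: the function name and the detection threshold of 100 intensity units are hypothetical, and real scanners and analysis software apply their own background correction before any such call is made.

```python
def spot_color(cy3, cy5, threshold=100.0):
    """Name the color a spot would show in the scanned image.

    cy3: intensity from the control probe (green dye).
    cy5: intensity from the experimental probe (red dye).
    threshold: hypothetical cut-off above which a probe counts as bound.
    """
    control_bound = cy3 >= threshold
    experimental_bound = cy5 >= threshold
    if control_bound and experimental_bound:
        return "yellow"   # expressed under both conditions
    if experimental_bound:
        return "red"      # expressed only in the experimental sample
    if control_bound:
        return "green"    # expressed only in the control sample
    return "black"        # no probe hybridized to this spot

for cy3, cy5 in [(5000, 4800), (20, 3000), (2500, 10), (15, 30)]:
    print(cy3, cy5, spot_color(cy3, cy5))
```

In practice the colors form a continuum rather than four discrete classes, which is why intensity ratios are used for quantitative work.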
Sophisticated laser scanning equipment is used to import data into computer files that can be quantified on the basis of light intensity. Ratios comparing Cy5 and Cy3 intensities can be used to quantitatively evaluate gene expression. Under differing conditions, individual genes may be up-regulated or down-regulated, and the fluorescent signals of the marker dyes reflect these changes. A Cy5:Cy3 ratio of one indicates no change, a ratio of less than one indicates down-regulation (greater intensity in Cy3, the control), and a ratio of greater than one indicates up-regulation (greater intensity in Cy5, the experimental condition).
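The ratio rule can be sketched in a few lines of Python. The function name and the example intensities are hypothetical. The log2 transform is an addition not described in the text; it is a common convention because it makes up- and down-regulation symmetric around zero.

```python
import math

def classify_ratio(cy5, cy3):
    """Return (ratio, log2 ratio, call) for one spot.

    cy5: experimental intensity; cy3: control intensity.
    Both are assumed to be positive, background-corrected values.
    """
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    ratio = cy5 / cy3
    if ratio > 1:
        call = "up-regulated"
    elif ratio < 1:
        call = "down-regulated"
    else:
        call = "no change"
    return ratio, math.log2(ratio), call

print(classify_ratio(2000, 500))   # (4.0, 2.0, 'up-regulated')
print(classify_ratio(500, 2000))   # (0.25, -2.0, 'down-regulated')
print(classify_ratio(800, 800))    # (1.0, 0.0, 'no change')
```

Measured ratios are almost never exactly one, so an analysis would normally treat a band around one as "no change"; the width of that band is an experimental decision and is not fixed here.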
Perhaps the most difficult component of a microarray experiment is the evaluation of the data. The array format may make it technically possible to investigate genome interactions, but it does not simplify the complexities; a 10,000-gene array generates 10,000 data points. To make things even more complicated, results must be validated through replication. A typical microarray experiment may utilize ten, twenty, or thirty slides and produce vast quantities of data, all of which must be analyzed and pieced together to generate a coherent picture of the system under investigation. Data can be analyzed in several ways. Software packages developed by the biotechnology industry can be used to compare experiments and generate cluster diagrams based upon statistical evaluation. Such figures are useful in identifying general patterns in the data that can be used to direct further experiments. Figure 4 is an example of a cluster diagram, also known as a dendrogram, showing the relationships between genes on the array and the patterns of change on seven experimental slides.
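To make the idea behind a cluster diagram concrete, the following sketch groups genes by the similarity of their expression profiles using agglomerative clustering with average linkage. It is not the algorithm of any particular commercial package. The gene names, the three-slide log-ratio profiles, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    distances = [euclidean(profiles[g], profiles[h])
                 for g in cluster_a for h in cluster_b]
    return sum(distances) / len(distances)

def cluster_genes(profiles):
    """Repeatedly merge the two closest clusters; return merges in order."""
    clusters = [(gene,) for gene in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical log2 ratios for four genes measured on three slides.
profiles = {
    "geneA": [2.0, 2.1, 1.9],
    "geneB": [1.9, 2.0, 2.0],
    "geneC": [-1.5, -1.4, -1.6],
    "geneD": [-1.7, -1.5, -1.5],
}

for left, right, distance in cluster_genes(profiles):
    print(left, "+", right, "at distance", round(distance, 3))
```

Each merge corresponds to one branch point of a dendrogram: the two up-regulated genes join first, then the two down-regulated genes, and the two groups join last at a much larger distance.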
Analysis of the raw numerical data is also useful to identify specific genes that are showing significant change in expression levels. The boundaries of significance are determined by the various mathematical tests performed and are a function of the questions being asked in the experiment. Useful data analysis hinges upon properly understanding the focus of the experiment. In order to successfully use the tremendous volume of data generated from a microarray slide, it is important to address the analysis to the specific aims of the experiment.
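One reason the boundaries of significance need care is the number of simultaneous tests. On a 10,000-gene array, a per-gene threshold of 0.05 would be expected to flag about 500 genes by chance alone even if nothing had changed. The sketch below shows two standard corrections, Bonferroni and Benjamini-Hochberg. They are offered as illustrations and are not methods named in the text; the p-values are invented for the example.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests with p at or below alpha divided by the number of tests."""
    cutoff = alpha / len(p_values)
    return [p <= cutoff for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests while controlling the false discovery rate at alpha."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p <= alpha * k / n.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / n:
            largest_rank = rank
    # Every test ranked at or below k is called significant.
    significant = [False] * n
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            significant[index] = True
    return significant

# Hypothetical per-gene p-values from five comparisons.
p_values = [0.001, 0.012, 0.025, 0.041, 0.60]

print(bonferroni(p_values))          # [True, False, False, False, False]
print(benjamini_hochberg(p_values))  # [True, True, True, False, False]
```

The stricter Bonferroni rule guards against false positives at the cost of more false negatives, while the Benjamini-Hochberg rule accepts a controlled fraction of false positives. Which trade-off is appropriate depends on the aims of the experiment.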
The scope of the microarray technique makes it applicable across a wide range of experimental situations. At its most fundamental level, development and disease occur because of changes in gene expression. Microarray allows experimental comparisons between normal and abnormal tissue with the express purpose of determining why one is indeed normal and the other is not. Developmental biologists can use the technique to monitor gene changes during development. Gene expression will certainly be modulated during the process of embryonic development, and microarray offers a way to monitor gene activity. Cancers can also be classified on the basis of their microarray expression profile. Some cancers are chemotherapy-resistant while others are not, and the differences between the two are probably due to differences at the gene expression level. Profiling a tumor sample may help clinicians and patients make a more informed choice in the design of cancer therapy. In addition, this aspect of the technique could be applied in a wider sense to the pharmaceutical industry. Drug expression profiling could be used to classify potential therapeutic agents based upon their molecular mechanism of action. Such profiling would expedite the molecular targeting and drug development process.
Microarray is not without its problems, however. At a technical level, the sheer amount of information expressed on a microarray chip opens the possibility of incorrect labeling. If a mistake is made at the cloning or PCR amplification stage in the printing process, the error can be transmitted through the hybridization step and a gene may be incorrectly identified. In addition, the hybridization technique itself is not simple. A maximum signal with a minimum of background fluorescence depends on a range of variables: time, temperature, light exposure, cover-slip position, and a host of other small details must be properly coordinated to ensure usable results. At the data analysis level, statistical complexities can create a confusing maze that does not lead to valid results. Care must be taken to properly apply established statistical principles, as both false negatives and false positives can be propagated in microarray data. Microarray is fundamentally a technique to identify new areas of research interest. Like anything else, it cannot be taken out of context and used to establish conclusions that are not supported in other settings. Additional means of analyzing gene expression (such as Northern blotting or RNase protection assays) must be used to corroborate microarray conclusions.
The last five years have seen unparalleled opportunities for discovery in the world of molecular biology. The mass sequencing of whole-organism genomes, previously thought to be so complex as to be impossible, is truly a monumental achievement. Technology has played a large role in making the visions of scientists a practical reality, and technology will continue to be involved in processing and applying the seemingly endless stream of data that has resulted from those visions. As a technique, microarray is a simple concept, but in an applied setting, it is a vital component of experimental analysis. The last five years have seen microarray develop from a clever idea into something functional as an experimental technique. The possibilities for the next five years are as limitless as the information contained in a genome.
Curran-Everett D. "Multiple comparisons: philosophies and illustrations." American Journal of Physiology - Regulatory, Integrative and Comparative Physiology. 279 (2000): R1-R8.
Gard J, et al. "Microarray dendrogram." Unpublished data. Powis Laboratory, Arizona Cancer Center, 2001.
Hegde P, et al. "A concise guide to cDNA microarray analysis." BioTechniques. 29 (2000): 548-562.
Kudoh K, et al. "Monitoring the expression profiles of doxorubicin-induced and doxorubicin-resistant cancer cells by cDNA microarray." Cancer Research 60 (2000): 4161-4166.
UK Mammalian Genetics Unit. The Medical Research Council. 27 Dec. 2001 http://www.mgu.har.mrc.ac.uk
Schena M, et al. "Microarrays: Biotechnology's discovery platform for functional genomics." Trends in Biotechnology. 16 (1998): 301-306.
Takahashi M, et al. "Gene expression profiling of clear cell renal cell carcinoma: Gene identification and prognostic classification." Proceedings of the National Academy of Sciences. 98 (2001): 9754-9759.
Microarray Group. University of Tokyo. 27 Dec. 2001 http://www.ims.u-tokyo.ac.jp/nakamura/eg_micro.html