Morphological Changes and Patterns of Gene Expression in Drosophila Embryological Development: A Novel Computational Approach for Discovering Function of Genes of Unknown Function
The molecular basis for cell migration during morphogenesis has been a topic of intense interest and has yet to be fully understood. It is shown here how it is possible to construct a matrix to reflect different types of morphological changes associated with the "mass movements" which occur in Drosophila embryological development. It is also shown how this matrix can be correlated to a data from a time course microarray, to determine genes most correlated to their corresponding morphological change. Application of this method represents a novel methodology for discovering function for genes of unknown function. Also it is shown here that it is likely that many more genes are in involved in the morphological change defined here as germ band extension, than other changes in morphology examined. Findings reported here appear to be in agreement with existing evidence, specifically the importance of pair rule genes such as even-skipped in germ band extension. Further development of this technology would allow for inference of new possible functions of unknown genes in silico, to be confirmed by in vivo testing through biological and chemical approaches.
The ongoing Berkley Drosophila Genome Project (BDGP) has provided a wealth of information in the form of time course microarray data that gives expression of each gene each hour over the course of the first 12 hours of development. Although videos of Drosophila embryogenesis have been made, review of the available literature reveals that there has been no attempt to connect patterns of gene expression with morphological changes derived from video. An attempt has been made here to derive matrices representative of morphological change for germ band area to correlate with patterns of gene expression. The primary result is the attainment of genes thought to be the most correlated with the above changes in morphology, as well as the finding that many more genes are highly correlated with germ band extension than the other phenomena examined. Although highly correlated genes cannot be said to be the cause of the morphological change associated with them, it does indicate what genes are expressed in tandem with a particular phenomenon. In regards to one morphological change, germ band extension, an attempt was made to find genes most likely to be responsible for the event through sorting of correlated genes. It is believed that the genes identified through one of the methods defined here are likely to be very important in germ band extension. Since some genes are of unknown function, it is possible that these genes are related to those known to cause germ band extension. The clustering methods used here might also add the knowledge of pathways involved in germ band extension, which are critical to understanding "mass movements" involved in embryological development.
In Drosophila major morphological events which occur shortly after cellularization of the blastoderm embryo include the processes of gastrulation, germ band extension and retraction, dorsal closure, and head involution (Campos-Ortega and Hartenstein 1997). Here the developmental process of germ band extension is examined, though it must be noted that this process as discussed in the methods section has been modified from definitions defined in the literature though the same general phenomenon is being explored. As it has been described in the literature, germ band extension involves a two-and-a-half fold increase in germ band length while simultaneously narrowing in width. It can be seen that the germ band undergoes a migration from the anterior to the posterior end of the embryo along the dorsal surface (Camphos-Ortega and Hartenstein 1985). The narrowing and lengthening process involved in germ band extension in Drosophila has been found not only to be present in Drosophila but occur ubiquitously in metazoan development. Furthermore, it has been thought to represent greater tissue movement than any other developmental process (Keller et al. 2000). Examples of the narrowing lengthening process, which has been termed convergent extension, has been documented in elongation of dorsal axial embryonic tissues in fishes (Concha and Adams 1998), amphibians (Darken et al. 2002), birds (Schoenwolf and Yuan 1995), and mammals (Sausedo and Schoenwolf 1994). Therefore understanding germ band extension in Drosophila is imperative to understanding "mass movements" in development of multicellular organisms.
Germ band extension has been explained by (Irvine and Wieschaus 1994) in terms of dorsal contraction, which serves to initiate, orient, and increase the rate of germ band extension, and active cell intercalation, whereby cells merge between their dorsal and ventral neighbors lengthening the germ band. In this model it is thought that germ band extension is the result of an exterior force, dorsal contraction acting in concert with an internal force, active intercalation. It is proposed that active intercalation is at least partially the result of the proper functioning of certain pair-rule genes, which establish stripes of cells along the anterior posterior axis that differ in adhesiveness. It has been hypothesized that cells in bands of varying adhesiveness migrate to form a more thermodynamically stable configuration (Irvine and Wieschaus 1994). This model was proposed due to a lack of germ band extension in embryos with mutant pair-rule genes, mutant genes that regulate pair-rule genes, and ubiquitous eve expression.
If pair-rule genes, which encode transcription factors, were responsible for germ band extension, than it would appear logical that pair-rule gene expression would be highly correlated with germ band extension, i.e. that elevated expression levels during germ band extension would occur when most germ band extension occurred (hour 4). It is found here that expression levels of the 8 of the 9 pair-rule genes (Drysdale et al. 2005) fushi tarazu, even-skipped, odd paired, paired, runt, odd skipped, hairy, and sloppy paired had positive correlation, and showed elevated levels of gene expression in hour 4 when almost all germ band extension occurs (Tomancak et al. 2002). Four of these genes were highly correlated (correlation above .8). Tenascin major was the only pair-rule gene that had negative correlation and did not show elevated expression in hour 4 (percent expression above 8.333) (Table 1). This appears to indicate either certain pair-rule genes are correlated to germ band extension possibly by complete coincidence, or that expression of certain pair-rule genes is somehow very important in germ band extension.
Perhaps it is the case that Irvine and Wieschaus are correct in their assertion that pair-rule genes enhance cell intercalation, but do so not primarily by the adhesion model, if at all, but rather by some other mechanism(s). Since pair-rule genes act as transcription factors, it is possible that their expression activates expression of genes that are responsible for both an external force(s) such as dorsal contraction mentioned previously, and an unknown, or less than fully understood, though necessary, internal force(s) (Irvine and Wieschaus 1994). Here a method is proposed for finding genes and pathways responsible for germ band extension, whether they are activated by anterior posterior patterning, brought about by pair-rule genes as in Drosophila (Irvine and Wieschaus 1994; Zallen and Wieschaus 2004; Ninomiya et al. 2004), or by an additional mechanism(s).
It is likely that even if pair-rule genes such as even-skipped (Zallen and Wieschaus 2004) are the activators of germ band extension that these activate multiple processes involved in alterations in cellcell adhesion, cell interactions with the extracellular matrix, and changes in the organization of the dynamic cytoskeleton and the function of chemomechanical force producers (so-called motor proteins) that produce force for cell shape change and locomotion (Kiehart et al. 2000). Genes thought to be highly associated with germ band extension are given here, as are clustering patterns of correlated genes. It is proposed that a similar method can be used to find genes likely responsible for any phenomena where gene expression data is available.
Methods and MaterialsDetermination of Germ band Area
Video (Fink and Wieschaus 1991) was used with stoppage occurring approximately every 10 minutes real-time.Time Synchronization
Equation 1 the transformation equation (Figure 1) was used to convert actual times recorded from the video to the time measured by the BDGP. The equation was generated by pairing three time points. These time points were selected based on the similarity to a clearly defined stage (Roberts 1998). Three stages that met the criteria were stage 6 (Figure 2a) the start of germ band extension, stage 12e (Figure 2b) the start of germ band retraction, and stage 13l (Figure 2c) the end of germ band retraction. Since the time was not given by BDGP at which each these stages occurred but rather a scale was given (Figure 3), it was necessary to use GIMP 2.2.3 to convert the scaled data into numerical data. This was done through use of the GIMP MEASUREMENT function. These 3 points from the BDGP formed the Y coordinates while the 3 time points from the video formed the X coordinates a linear TRENDLINE was added with an R2 value of .9976 (Figure 1). All video values were converted to BDGP values through the transformation equation (Tomancak et al. 2002).Development of Matrices of Morphological Change
Using BDGP converted times paired with area values (Figure 4) an attempt was made to link the change in germ band area with morphological changes. Phenomena to be measured were germ band extension, expansion, and retraction. Extension was considered to be a period of consistent increase in area. Expansion was considered to be the period of less rapid area increase with much less consistency in area gain than extension. Retraction was defined to be a consistent period decreasing area. The term "consistency" is used because germ band growth and retraction at least in 2 dimensions is not unidirectional, meaning it does not involve an increase in area at every time point before it decreases at every time point. Fluctuations in growth and retraction occur that cannot be accounted for by error in the measurements, particularly during germ band expansion. Therefore an attempt was made to examine directional area change due to the conclusion that changes of the opposite sign to the overall change of the phenomena could skew the data, since the directional component was of the most interest in this case.
Four matrices were developed to model the pattern of morphological change for four events: directional extension, directional expansion, directional retraction and total directional area change (Table 2). Each matrix consists of 12 values each value represents the percent of the event that occurred within that hour. These values were determined by finding the standard deviation of the directional change that occurred for each hour. When selecting the first value of the time period the last value from the previous value should be used, unless of course it is the first time value where it should be used. The last time value for the hour should be the last value that occurs in that hour. Afterwards, each standard deviation value for every hour is divided by the sum of all the standard deviation values, thus giving the percent of change occurring in each period.Correlation of Changes in Morphology to Patterns of Gene Expression
Microsoft Excel was used to correlate the 12 values that make up each matrix to 12 values of gene expression (one for every hour) (see methods Tomancak et al. 2002). Since the BDGP involved three different experiments with slightly different results these values were averaged together, so that the expression level of each gene at each hour was in fact the average expression over the three experiments. Since Excel is limited to doing approximately 250 correlations at the same time several iterations had to be done to correlate all 12,470 genes (as of April 14, 2005) that have been subject to time course microarray by the BDGP.Determining What Genes are Involved in Germ band Extension
Although the Excel correlation did well to find genes with similar patterns to the morphological change, it was thought that the correlation alone might not be enough to indicate that these genes were the cause of each event. One of the events: directional germ band extension was examined to see if it might be possible find genes involved in the event through sorting. Since germ band length increases very rapidly (a two-and-a-half fold increase, with most of the extension occurring in the first 45 minutes (Campos-Ortega and Harkenstein1985), it was thought that genes involved in the activity (if they existed) would have a highly peaked expression pattern. A peaked pattern would also indicate a lesser likelihood for the correlation to be random. In an attempt to find genes involved primarily in germ band extension, genes were ranked by correlation removing all genes with less than a 0.6 correlation. These were then ranked by "weighted peaks" removing an arbitrary number with low values (in this case .003 or lower can very depending on specificity of search). Genes were then ranked again by correlation; below is an abbreviated version.
3. (MAX(1:12)-MIN(1:12)) / (SUM(1:12)
After genes were found their function of certain genes if determined was found in FLYBASE (Drysdale et al. 2005).K Means
K means clustering was done using Cluster 3.0 (Clustering Library version 1.29) (de Hoon et al. 2002) on all the genes with at least a 0.6 correlation to the directional extension matrix. The ORGANIZE GENES function was selected. The NUMBER OF CLUSTERS (K) was set at 10, while 100 runs were selected under NUMBER OF RUNS. K-MEANS was selected under METHOD and CORRELATION (UNCENTERED) was selected under SIMILARITY METRIC. TreeView 1.60 (Eisen 1998) was used to visualize the clustering.Self Organizing Map
A self-organizing map was done using the same data and software. The ORGANIZE GENES function was selected. XDIM and YDIM were set at 4. 100000 were selected under NUMBER OF ITERATIONS. INITIAL TAU was set at .02. SIMILARITY METRIC was set on CORRELATION (UNCENTERED).Principal Component Analysis (PCA)
PCA was also done using the same data and software.
Through measuring changes in area of the germ band with time (Figure 4) it was possible to construct matrices representing different morphological events (Table 2). Each matrix represents the percentage of the phenomenon that occurred in that particular hour. Figure 5A-D shows these matrices graphically. One of these matrices: the germ band extension matrix (Figure 5A), acted as a template for sorting genes that may associated with germ band extension. Although correlation matrices for directional expansion, retraction, and total area change succeeded in identifying correlated genes, far fewer highly correlated genes (0.8 correlation or higher) were found. Figure 6 shows a chart of the number of genes highly correlated with the correlation matrices for the events. Genes correlated with germ band extension are shown graphically in Figure 7, where correlation is plotted on the x axis and "weighted peaks" are shown on the y axis. This allows for visualization of both correlation and peaked expression at the same time (see point6orhigher.xls www.duke.edu/~ajh20 then run Cluster 3.0 http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster to visualize use TreeView 2.0 http://rana.lbl.gov/EisenSoftware.htm).
Discussion and Conclusions"Lag-Time" Accounted for by the Data
Since BDGP data involves total RNA expression over each hour (see methods Tomancak et al. 2002) it is likely that the data itself takes into account lag time, making the sorting method proposed most valid method. If a great deal of expression occurred at the beginning of an hour before most of the event took place in that hour, the data would reflect this expression had occurred. Whereas if there were smaller time points, such as minutes instead of hours, than a great deal of expression right before the event occurred would not be reflected in the occurrence of the event. Because the time unit is in hours in the BDGP data, the likelihood the data can account for the lag time is also much greater.More Genes involved Primarily in Directional Germ band Extension
If an event has more highly correlated genes than another event it means that there are likely more genes involved primarily with that event. It would not necessarily mean that more genes are necessary for the process to occur. Perhaps it is reasonable that directional germ band extension would have more genes involved primarily with it seeing how qualitatively it is such a dramatic event and quantitatively it has a much greater rate of change of germ band area with respect to time than directional expansion, and retraction (Figure 4). It might also be reasonable that directional total area change would have fewer genes involved primarily for it because of the unlikelihood of many genes functioning primarily for germ band growth and retractionNotable Genes
Many of the genes correlated with germ band extension appear to be transcription factors such as eve (0.92 correlation 9th most correlated out of 12,470 genes) and runt (0.74, 215th most correlated). Since these two pair-rule genes have been shown to highly important in germ band extension, Zallen and Wieschaus (2004) suggest that correlation without much sorting is all that is necessary to find genes important in germ band extension. The most highly correlated gene with directional germ band extension CG6117 (0.99 correlation) was found to be a gene for cAMP-dependent protein kinase thought to be involved in ATP binding, the intracellular signaling cascade, and muscle contraction among others (Drysdale 2005). Many of the correlated genes of known function particularly those with highly peaked expression patterns appeared to be involved with highly energetic functions such as CG17957 involved in actin binding (0.69 correlation, 2.56 unweighted peak score) and CG7428 involved in microtubule based movement (0.65 and 2.52).Finding Related Genes and Determining Pathways
Although sorting for genes by anything besides their correlation, such as genes with highly peaked expression patterns, is as of yet a highly speculative process, values generated from steps in the sorting process used here may still be valuable alongside their correlation values in finding related genes. It is believed that genes closer in proximity in correlation, correlation and peaked expression pattern, or in terms of k-means, self organizing maps, or principal component analysis clustering, may also be connected in function or process. Interpretation of these patterns may result not only in the genes likely to be responsible for germ band extension, but also give better understanding of how these gene products interact. Furthermore, the general observation that germ band growth and retraction do not occur in a directional manner appears similar to the fluctuations in dorsal closure Kiehart et al. witnessed (the event after germ band retraction), possibly indicating the events may involve similar processes.Findings
It has been shown here that changes in Drosophila morphology can be correlated with time course gene expression data, in a way that identifies genes that are likely to be involved in defined morphological events. As genes important for germ band extension and other phenomena were found (results not shown), it was found, in agreement with Wieschaus and others, that certain pair-rule genes are highly important in germ extension. Also, many more genes are involved in germ band extension than the other phenomena examined. It has been shown here that information gleaned from sorting correlated genes can be further used through clustering methods to search for pathways and systems involved associated with the phenomena of interest.Future Directions
The most obvious next step would be to examine the effect of knocking out genes most highly correlated with the changes in morphology described here. While information about the effects of several genetic knockouts are already available in the literature and Flybase (Drysdale et al. 2005), the BGDP gene disruption project (Bellen et al. 2004), may provide an ideal source of flies necessary for a more detailed study. If it was found that genes listed here did halt or at least slow the phenomena, it might be useful to detail at what point the effect occurred. A "map" such as this could be used along with the existing expression data to give us an understanding of the specific mechanisms and pathways involved. The technical difficulty involved in such a project could be minimized by employment of the methods described here. By shifting from the use of stages to describe spatio-temporal events, to the use of germ band area functions (Figure 4), it might be possible to give a more accurate picture of the effect of the knockout embryo relative to a normal embryo. Doing so might not only elucidate the process involved in the mass movements involved in metazoan development, but might also help to identify the function of genes of unknown function.
Thanks to Professors H. Fredrick Nijhout, Philip Benfey, Daniel Kiehart, and Greg Wray for their thoughtful comments and/or review of this manuscript
Bellen, HJ et al. (2004) The BDGP gene disruption project: single transposon insertions associated with 40% of Drosophila genes. Genetics 167:761-781
Campos-Ortega, JA and V Hartenstein (1985) The Embryonic Development of Drosophila melanogaster. New York, Springer-Verlag.
Campos-Ortega, JA and V Hartenstein (1997) The Embryonic Development of Drosophila melanogaster. Springer-Verlag, Berlin.
Concha, M. L. & Adams, R. J. 1998 Oriented cell divisions and cellular morphogenesis in the zebrafish gastrula and neurula: a time-lapse analysis. Development 125, 983-994.
Darken, R.S., Scola, A.M., Rakeman, A.S., Das, G., Mlodzik, M. and Wilson, P.A., 2002. The planar polarity gene strabismus regulates convergent extension movements in Xenopus. EMBO J. 21, 976985.
de Hoon, M. J. L., Imoto, S., Nolan, J., and Miyano, S. 2004: Open Source Clustering Software. Bioinformatics, 20 (9): 1453,1454.
Drysdale, R.A., Crosby, M.A., The FlyBase Consortium 2005. FlyBase: genes and gene models. Nucleic Acids Research 33:D390-D395.
Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. 1998. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95, 1486314868.
Fink, R., Wieschaus E., 1991 Fruit Fly Embryogenesis (5:56) A Dozen Eggs Time-Lapse Microscopy of Normal Development. Produced and Distributed by Sinauer Associates, Inc., Sunderland, MA for the Society for Developmental Biology .
Irvine, K.D., Wieschaus, E. 1994. Cell intercalation during Drosophila germband extension and its regulation by pair-rule segmentation genes. Development 120: 827-841.
Keller, R., Davidson, L., Edlund, A., Elul, T., Ezin, M., Shook, D. and Skoglund, P., 2000. Mechanisms of convergence and extension by cell intercalation. Philos. Trans. R. Soc. Lond. B Biol. Sci. 355, 897922.
Kiehart, D.P., Galbraith, C.G., Edwards, K.A., Rickoll, W.L. and Montague, R.A., 2000. Multiple forces contribute to cell sheet morphogenesis for dorsal closure in Drosophila. J. Cell Biol. 149, 471490.
Ninomiya, H, Ellinson, RP, Winklbauer, R. 2004. Antero-posterior tissue polarity links mesoderm convergent extension to axial patterning. Nature 430: 364-367.
Roberts, E.B., Drosophila A Practical Approach 1998. New York, Oxford University Press.
Sausedo, R. A. and Schoenwolf, G. C. 1994. Quantitative analysis of cell behaviors underlying notochord formation and extension in mouse embryos. Anat. Rec 239, 103-112.
Schoenwolf, G. C. and Yuan, S 1995. Experimental analyses of the rearrangement of ectodermal cells during gastrulation and neurulation in avian embryos. Cell Tiss. Res 280, 243-251.
Tomancak P, Beaton A, Weiszmann R, Kwan E, Shu S, Lewis SE, Richards S, Ashburner M, Hartenstein V, Celniker SE, Rubin GB: Systematic determination of patterns of gene expression during Drosophila embryogenesis. Genome Biol 2002,3:research0088.1-0088.14.
Zallen, J. A. and Wieschaus, E. 2004. Patterned gene expression directs bipolar planar polarity in Drosophila. Dev. Cell 6, 343355.Website References Berkeley Drosophila Genome Project homepage Flybase homepage Open Source Clustering Software (Cluster 3.0) Eisen Software (TreeView 2.0)
Supplementary MethodsDetermination of Germ band Area
The PRINT SCREEN function was used and the picture was opened and saved in Microsoft Photo Editor 126.96.36.199 as a JPEG file. The file was then reopened in the GNU image manipulation program GIMP 2.2.3. After the program was opened the SELECT BY COLOR option was selected with the ANTIALIZING function selected under MODE and the SELECT TRANSPARENT AREAS function was selected under FINDING SIMILAR COLORS. The THRESHOLD was set at 100. Black (COLOR PICKER 0% RGB) was selected and all colors within the 100 THRESHOLD were cut. The PATH function was selected and placed on the tip of the posterior end of the embryo. A second PATH was selected on the anterior end of the embryo so that the line connecting PATH 1 and 2 forms a line without breaks. The STROKE PATH function was selected using the ERASE function SIZE 4. This effectively cuts the embryo in half so that the germline above the erased region can be measured. The SELECT SHAPES FROM IMAGE function was used to outline the germ band area. This area is defined as the continuous non-white area (COLOR PICKER less than 100% RGB) from the erased midline to the most anterior portion of the germ band. What constitutes the most anterior end of the germ band was defined as the most anterior area of the originally moving and retracting area. Once the most anterior region of the germ band became non-continuous with the other non exterior tissue the leading end of the germ band was defined as the leading end of the continuous region. If the germ band was in contact with the embryo's exterior the exterior area was not measured. If a non-white area inside the continuous region is greater than 100 pixels that area would be measured and subtracted from the total. The outlined area was then measured by selecting HISTOGRAM under DIALOGS the pixel number was recorded in Excel. These methods for selecting the germ band area are essential in assuring that consistent measurements of germ band area can be made.