Journal of Young Investigators
    Undergraduate, Peer-Reviewed Science Journal
Volume Six
    FEATURE ARTICLE
Issue 4, October 2002

Bioinformatics: Life Science Research In Silico

Cristina Tang
Biochemistry, Simon Fraser University
tang@jyi.org


As late as the early 1990s, biology and related fields required very little experience with computers. Now, however, the vast amount of data on DNA sequences and proteins generated by the Human Genome Project and by labs around the world has made it clear that biologists will have to rely more on computers to organize, store and efficiently use these data for analysis. Thus, from the fusion of computer science and biology, a relatively new field was born: bioinformatics, a field that holds promise for speeding up drug discovery but also faces problems as it is integrated into our society.
 
Bioinformatics has emerged in response to major advances in molecular biology technologies, and has been made possible by the exponential growth of computer technology. This interdisciplinary field bridging biology, math, and computer science has three major components: the organization of data generated from experiments into databases, the development of new algorithms and software, and the use of software for the interpretation and analysis of data. Each of these components contributes to the optimal use of the information generated from experiments.

Example of a Database

FlyBase illustrates the usefulness of a well-built database of biological information in research. This database contains all the currently available knowledge on the fruit fly Drosophila melanogaster, a model experimental organism. The information found on FlyBase includes gene and protein sequences found in fruit flies, protein expression patterns and functions, literature references, and a list of researchers working with this organism. This database, which is updated regularly and accessible online, allows researchers to quickly obtain the information they need.

Database Search

Well-built databases are very useful for storing, organizing, and accessing information. However, in order to make sense of the data, researchers must be able to analyze and interpret them. This is where the bioinformatics “tools” -- algorithms developed for data analysis -- come in.

When a researcher sequences a piece of DNA, for example, he or she simply gets a long string of the letters A, G, C, and T, each representing one of its constituent bases. By comparing this sequence to all of the available sequences in a given database, researchers can determine whether the sequence codes for any gene, or whether it is similar to any known sequence from another organism. However, because of the large number of DNA sequences present in any database these days, this task may take months, or even years, if it is done manually.

Computers, on the other hand, can perform this task efficiently. One frequently used tool to search for similar sequences in a database is called BLAST (Basic Local Alignment Search Tool). As described in the BLAST Course found on the BLAST Web site, this program compares a user-specified DNA sequence to sequences in a database and outputs the results, starting with the sequences that, according to its algorithm, best match the input sequence.
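The idea of scoring a query against every database entry and listing the best matches first can be sketched in a few lines of Python. This is only an illustration, not BLAST's own method: BLAST relies on a faster heuristic, whereas the sketch below uses the exhaustive Smith-Waterman local alignment score. The scoring values (match, mismatch, gap) and the toy database names and sequences are assumptions made up for this example.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between two sequences.

    The scoring values are illustrative assumptions, not BLAST defaults.
    """
    previous = [0] * (len(subject) + 1)  # scores for the previous query row
    best = 0
    for q in query:
        current = [0]  # column 0 is always 0 in a local alignment
        for j, s in enumerate(subject, start=1):
            diagonal = previous[j - 1] + (match if q == s else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def rank_database(query, database):
    """Return (name, score) pairs, best-scoring sequence first."""
    scores = [(name, local_alignment_score(query, seq))
              for name, seq in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy database; real databases hold millions of sequences.
toy_database = {
    "seq_exact": "TTGACGTACGTAGG",
    "seq_one_mismatch": "TTGACGTTCGTAGG",
    "seq_unrelated": "CCCCCCCCCCCCCC",
}

ranking = rank_database("ACGTACGT", toy_database)
for name, score in ranking:
    print(name, score)
```

The entry containing the query exactly is listed first, followed by the entry that differs by one base, with the unrelated sequence last. Comparing every pair of sequences this way becomes slow as a database grows, which is one reason heuristic tools such as BLAST are used in practice.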

Armed with this and many other similar data analysis software tools, biologists have been able to make many discoveries, such as the identification of gene coding regions on DNA sequences, as well as possible links of genes to certain diseases.

Historical Overview of Bioinformatics

It is not known for certain who coined the term “bioinformatics” or when, but according to Mark S. Boguski in “Bioinformatics – a new era” (part of a supplement to the November 1998 issue of Immunology Today), it is generally agreed that the term first appeared in the scientific literature in 1991. However, the field itself has been around for more than two decades under the names molecular evolution or computational biology.

One of the early efforts to build databases and create analysis algorithms was Prophet, a UNIX-based workstation software package that allowed researchers to store, analyze and present data as tables, graphs and statistical analyses, and to perform mathematical modeling.

In 1982, a free database called GenBank was set up to store DNA sequence data. This database, at the National Center for Biotechnology Information (NCBI), currently holds about 17 billion bases from more than 100,000 organisms.

In the late 1980s, IntelliGenetics of Mountain View, CA, developed bioinformatics software called PC/GENE. This program was capable of translating gene sequences to proteins, predicting protein secondary structures and comparing information contained in different databases.
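To give a sense of what translating a gene sequence to a protein involves, here is a minimal Python sketch. It is not PC/GENE's implementation, which the article does not describe. It assumes the standard genetic code, reads the DNA three bases (one codon) at a time starting from the first base, and stops at the first stop codon. The function name `translate` and the use of "X" for unrecognized codons are choices made for this example.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reading starts at the first base and stops at the first stop codon.
    Codons containing unexpected characters are reported as "X".
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up example sequence: ATG GCC ATT GTA ATG GGC CGC TGA
print(translate("ATGGCCATTGTAATGGGCCGCTGA"))  # prints MAIVMGR
```

A fuller tool would also try the other reading frames and the complementary strand, since a raw sequence does not say where a gene begins.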

The early 1990s witnessed an explosion of databases and bioinformatics software developed with funding from government agencies and large private corporations. In 1991, for example, Amos Bairoch introduced the first version of SWISS-PROT, a protein sequence database. Currently, SWISS-PROT is a curated protein database under the ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB). It contains over 100,000 protein sequences from close to 7,500 different species of organisms; the three most represented species are humans, mice and yeast.

Currently, bioinformatics receives massive support from the U.S. National Institutes of Health (NIH). NIH supports the NCBI, whose goal is to develop new information technologies that will help life scientists in the study of molecular and genetic processes.

At the NCBI website, researchers can access many databases such as GenBank (a DNA sequence database), PubMed (a literature database) and GenPept (a protein database). They can also find tools for data searching and analysis. Entrez, for example, is a system that allows for retrieval of information from the databases found at NCBI. Other tools allow researchers to analyze protein sequences, compare several DNA or protein sequences (e.g. BLAST) or view the 3-D structure of proteins.

TABLE I - Major advances in molecular biology

TABLE II - Timeline of development of information databases and bioinformatic companies

Bioinformatics and Drug Discovery

Major pharmaceutical companies are also keeping up with the new field of bioinformatics. Some have opened specialized bioinformatics research units to help in their drug development process. As Mark Swindells and Richard Fagan pointed out in their June 2001 Chemical Innovation article “Target Discovery Using Bioinformatics,” “success in the race to discover targets in the post-genomic world will go not simply to the companies with the greatest repertoire of privately held sequences, but to the companies with the greatest ability to mine the value locked in the burgeoning data archives.”

Bioinformatics can, for example, be used to determine whether a drug target in a pathogenic bacterium is also present in humans. This will help researchers predict the drug's potential side effects. It might also allow them to decide early in the drug development process whether to abandon or continue with research on that drug, thus saving precious time and money. Moreover, bioinformatics can aid in the prediction of protein structures and functions (based on homology to known proteins) to determine the potential of a protein to be a drug target.

To keep up with the demand of large pharmaceutical companies, smaller companies such as Accelrys (San Diego, CA), LION Bioscience AG (Heidelberg, Germany), and Incyte Genomics (Palo Alto, CA) are offering access to information services and data analysis software. Meanwhile, other companies have opted to form strategic alliances with the large pharmaceutical companies; for example, Rosetta Inpharmatics (Kirkland, WA), a developer of bioinformatics software, has now become a subsidiary of Merck & Co., Inc.

Current Problems in Bioinformatics

The generation of biological information has increased with unprecedented speed in the past two decades. This has been matched by the rapid development of supercomputer power and has been backed with large monetary investments. But one thing has not kept up with this growth.

Presently, there is a lack of trained personnel in this interdisciplinary field. This could be one of the major limitations to the future expansion of bioinformatics. There are still very few programs in bioinformatics at major universities around the world. Even those schools offering it face the difficulty of finding and keeping individuals who possess the required expertise to teach, since biotech and pharmaceutical companies can offer more attractive salaries and benefits.

According to some companies, such as Rosetta Inpharmatics and Abgenix, an ideal bioinformatician should have a solid background in biology and be very comfortable with UNIX and programming languages such as C and Perl. However, the general shortage of bioinformaticians has forced companies to hire computer scientists or mathematicians and teach them about biology, or hire biologists who have some self-taught computer skills.




The problem with this, however, as some critics point out, is that biologists often don't have very strong statistical and/or programming training. The computer scientists, on the other hand, often don't understand what the really meaningful biological questions are when creating algorithms.

Fortunately, this problem has been taken seriously and several universities in the United States are incorporating bioinformatics courses in their undergraduate curricula or establishing institutes of bioinformatics.

The University of California at Davis, for example, is putting $95 million into a new bioinformatics program. The Virginia Polytechnic Institute will invest $100 million in its new Virginia Bioinformatics Institute, which will occupy three buildings. Other universities such as the University of Florida at Gainesville, the University of the Sciences in Philadelphia and George Mason University in Virginia have also formed bioinformatics departments on their campuses. It is hoped that these programs will provide life sciences students with a solid background in experimental biology as well as a good understanding of the computer tools available for the investigation of biological questions.

Conclusions

DNA sequences generated by the Human Genome Project, protein structures, gene expression patterns and other data carry with them an enormous amount of valuable information. Uncovering all of this information, however, will still take many more years of research. Bioinformatics, as we have seen, is set to become an indispensable tool. This is not to say that wet-bench work will no longer be required; on the contrary, bioinformatics will give researchers the tools necessary to guide their bench work, letting them handle computers as easily as they do the microscope.


Suggested Reading

Baxevanis, A.D. and Ouellette, B.F., eds. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 2nd ed. New York: Wiley-Interscience, 2001.

"Bioinformatics." Nature Biotechnology. 2000, 18(supplement): IT31-IT34. Reprint of Persidis, A. "Bioinformatics." Nature Biotechnology. 1999, 17: 828-830.

Boguski, M. "Bioinformatics: a new era." Immunology Today. 1998, 19(supplement).

Butler, D. "Are you ready for the revolution?" Nature. 2001, 409: 758-760.

Fagan, R. and Swindells, M. "Target discovery using bioinformatics." Chemical Innovation. 2001, 31(6): 24-28.

Jonietz, E. "Boom amidst bust. Computing plus biology equals big money." Technology Review. 2001, 104(7): 29.

Ricadela, A. "IT at the edge of science." Information Week. 2001, 850: 30-37.

Roos, D. "Bioinformatics: Trying to swim in a sea of data." Science. 2001, 291(5507): 1260-66.

Stone, B., et al. "How powerful new computers are helping researchers revolutionize drug development." Newsweek. 2001, 137(18): 54.

Web sites related to this topic

Degree Programs in Bioinformatics and Computational Biology (Australia, Europe and North America)
Bioinformatics milestones
Bio Source Link

Journal of Young Investigators. 2002. Volume Six.
Copyright © 2002 by Cristina Tang and JYI. All rights reserved.
 
