Issue 4, October 2002
Bioinformatics: Life Science Research In Silico
Cristina Tang
Biochemistry, Simon Fraser University
tang@jyi.org
As late as the early 1990s, biology and related fields required very
little experience with computers. Now, however, the vast amount of
data on DNA sequences and proteins generated by the Human Genome
Project and by labs around the world has made it clear that biologists
will have to rely more on computers to organize, store, and efficiently
use these data for analysis. From this fusion of computer science and
biology, a relatively new field was born: bioinformatics, a field that
holds promise for speeding up drug discovery but also faces problems
as it is integrated into our society.
Bioinformatics
has emerged in response to major advances in molecular biology technologies,
and has been made possible by the exponential growth of computer
technology. This interdisciplinary field bridging biology, math,
and computer science has three major components: the organization
of data generated from experiments into databases, the development
of new algorithms and software, and the use of software for the
interpretation and analysis of data. Each of these components contributes
to the optimal use of the information generated from experiments.
Example of a Database
FlyBase illustrates the usefulness of a well-built database of biological
information in research. This database contains all of the currently
available knowledge on the fruit fly Drosophila melanogaster, a model
experimental organism. The information found on FlyBase includes gene
and protein sequences from fruit flies, protein expression patterns and
functions, literature references, and a list of researchers working
with this organism. Because the database is updated regularly and is
accessible online, researchers can quickly obtain the information they need.
Database Search
Well-built databases are very useful for storing, organizing, and accessing
information. However, to make sense of the data, researchers must be
able to analyze and interpret them. This is where bioinformatics tools,
the algorithms developed for data analysis, come in.
When a researcher sequences a piece of DNA, for example, the result is
simply a long string of the letters A, G, C, and T, each representing
one of the four constituent bases. By comparing this sequence with all
of the sequences in a given database, researchers can determine whether
it codes for a gene, or whether it is similar to a known sequence from
another organism. However, given the large number of DNA sequences in
today's databases, this task could take months, or even years, if done
manually.
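To make the idea of a database comparison concrete, here is a minimal Python sketch of the exhaustive approach: score the query against every entry with the Smith-Waterman local alignment algorithm and rank the entries best first. The scoring values (match +2, mismatch -1, gap -2), the function names, and the toy sequences are illustrative assumptions, not taken from any particular tool. The work grows with the query length times the total length of the database, which is why faster methods matter for large databases.

```python
# Exhaustive database search by local alignment (illustrative sketch).
# Scoring values and sequences are hypothetical choices, not from any real tool.


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gap
    penalty, score only; the alignment itself is not reconstructed)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap
            # Clamping at zero lets an alignment start anywhere (local alignment).
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query: str, database: dict) -> list:
    """Score the query against every entry and return (name, score) pairs,
    best match first."""
    hits = [(name, smith_waterman_score(query, seq))
            for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    toy_db = {  # invented sequences, not real database records
        "seq_A": "TTGACGTAGGTTAGC",
        "seq_B": "GGGGCCCCAAAATTT",
        "seq_C": "ACGTACGTTA",
    }
    for name, score in search_database("ACGTACGTTA", toy_db):
        print(name, score)
```

Here seq_C, an exact copy of the query, ranks first; seq_A, which differs from the query at one position, comes next.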
Computers, on the other hand, can perform this task efficiently. One
frequently used tool for finding similar sequences in a database is
BLAST (Basic Local Alignment Search Tool). As described in the BLAST
Course found on the BLAST Web site, this program compares a
user-specified DNA sequence with the sequences in a database and outputs
the results ranked by similarity, with the sequences that best match the
input, according to its scoring algorithm, listed first.
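Much of BLAST's speed comes from a heuristic: instead of fully aligning the query against every database entry, it first looks for short shared "words" (seeds) and only examines regions around those hits. The Python sketch below shows a simplified version of that seeding idea only. It indexes a database by fixed-length words and ranks entries by how many distinct query words they contain. It is not the real BLAST algorithm (which also extends hits into scored alignments and reports statistical significance), and the word size k = 4, the function names, and the sequences are hypothetical.

```python
# Simplified illustration of word-based ("seed") searching, one idea behind
# BLAST's speed. NOT the real BLAST algorithm: no hit extension, no scoring
# matrix, no significance statistics. Word size and sequences are hypothetical.
from collections import defaultdict


def words(seq: str, k: int) -> set:
    """Distinct length-k substrings ("words") of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(database: dict, k: int) -> dict:
    """Map each word to the names of the database entries that contain it.
    Built once, then reused for every query."""
    index = defaultdict(set)
    for name, seq in database.items():
        for word in words(seq, k):
            index[word].add(name)
    return index


def rank_by_shared_words(query: str, index: dict, k: int) -> list:
    """Count how many query words each entry shares, best first. Entries that
    share no word are never examined, which is where the speed-up comes from."""
    counts = defaultdict(int)
    for word in words(query, k):
        for name in index.get(word, ()):
            counts[name] += 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_db = {  # invented sequences, not real database records
        "seq_A": "ACGTACGTTAGC",
        "seq_B": "GGGGCCCCAAAA",
        "seq_C": "TTACGTACGG",
    }
    K = 4  # hypothetical word size
    idx = build_index(toy_db, K)
    print(rank_by_shared_words("ACGTACGTTA", idx, K))
    # [('seq_A', 6), ('seq_C', 4)]; seq_B shares no 4-letter word
```

The design choice is to pay the indexing cost once so that each new query touches only the entries that share at least one word with it.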
Armed with this and many similar data analysis tools, biologists have
been able to make many discoveries, such as identifying gene-coding
regions in DNA sequences and possible links between genes and certain
diseases.
Historical Overview of Bioinformatics
It is not known for certain when, or by whom, the term bioinformatics
was coined, but according to Mark S. Boguski in "Bioinformatics: a new
era" (part of a supplement to the November 1998 issue of Immunology
Today), it is generally agreed that the term first appeared in the
scientific literature in 1991. The field itself, however, has existed
for more than two decades under names such as molecular evolution or
computational biology.
One of the early efforts to build databases and create analysis
algorithms was Prophet, a UNIX-based workstation software package that
allowed researchers to store, analyze, and present data as tables,
graphs, and statistical analyses, and to perform mathematical modeling.
In 1982, a free database called GenBank was set up to
store DNA sequence data. This database, at the National
Center for Biotechnology Information (NCBI), currently
holds about 17 billion bases from more than 100,000
organisms.
In the late 1980s, Intelli-Genetics of Mountain View, CA, developed
bioinformatics software called PC/GENE. This program was capable of
translating gene sequences into protein sequences, predicting protein
secondary structures, and comparing information contained in different
databases.
The early 1990s witnessed an explosion of databases
and bioinformatics software developed with funding from
government agencies and large private corporations.
In 1991, for example, Amos Bairoch introduced the first
version of SWISS-PROT, a protein sequence database.
Currently, SWISS-PROT is a curated protein database hosted on the ExPASy
(Expert Protein Analysis System) proteomics server of the Swiss
Institute of Bioinformatics (SIB). It contains over 100,000 protein
sequences from close to 7,500 different species, the three most
represented being humans, mice, and yeast.
Currently, bioinformatics receives massive support from the U.S.
National Institutes of Health (NIH). NIH supports the NCBI, whose goal
is to develop new information technologies that will help life
scientists study molecular and genetic processes.
At the NCBI website, researchers can access many databases, such as
GenBank (a DNA sequence database), PubMed (a literature database), and
GenPept (a protein database). They can also find tools for data
searching and analysis. Entrez, for example, is a system for retrieving
information from the databases at NCBI. Other tools allow researchers
to analyze protein sequences, to compare several DNA or protein
sequences (e.g., BLAST), or to view the 3-D structure of proteins.
Table I: Major advances in molecular biology
Table II: Timeline of development of information databases and bioinformatics companies
Bioinformatics and Drug Discovery
Major pharmaceutical companies are also keeping up with the new field
of bioinformatics, and some have opened specialized bioinformatics
research units to support their drug development process. As Mark
Swindells and Richard Fagan pointed out in their June 2001 Chemical
Innovation article "Target Discovery Using Bioinformatics," "success
in the race to discover targets in the post-genomic world will go
not simply to the companies with the greatest repertoire of privately
held sequences, but to the companies with the greatest ability to
mine the value locked in the burgeoning data archives."
Bioinformatics can, for example, be used to determine whether a drug
target in a pathogenic bacterium is also present in humans. This can
help researchers predict the drug's potential side effects, and it may
also allow them to decide early in the drug development process whether
to abandon or continue research on that drug, saving precious time and
money. Moreover, bioinformatics can help predict protein structures and
functions (based on homology to known proteins) and thus assess a
protein's potential as a drug target.
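One simple way to frame the first question in code is sketched below in Python: slide a bacterial target protein along each human protein without gaps, record the best fraction of target residues that match, and flag any human protein above a cutoff. This ungapped identity measure is a deliberately crude stand-in; real analyses would normally use a gapped search tool such as BLAST together with statistical significance. The 40% threshold, the function names, and the protein fragments are hypothetical.

```python
# Crude homology screen (illustrative sketch): does a bacterial target protein
# have a close match among human proteins? Uses best ungapped identity, a
# simplification of real gapped searches. Threshold and sequences are hypothetical.


def best_ungapped_identity(target: str, other: str) -> float:
    """Fraction of target residues matched when target is slid along other
    without gaps, at the best-scoring offset."""
    best = 0
    for offset in range(-(len(target) - 1), len(other)):
        matches = sum(
            1
            for i, residue in enumerate(target)
            if 0 <= i + offset < len(other) and other[i + offset] == residue
        )
        best = max(best, matches)
    return best / len(target)


def human_homologs(target: str, human_proteins: dict,
                   threshold: float = 0.4) -> list:
    """Names of human proteins whose identity to the target meets the
    hypothetical threshold. A non-empty result would flag a possible
    side-effect risk for a drug aimed at the target."""
    return [name for name, seq in human_proteins.items()
            if best_ungapped_identity(target, seq) >= threshold]


if __name__ == "__main__":
    bacterial_target = "MKVLAAGIVG"  # invented protein fragment
    human_set = {  # invented sequences, not real human proteins
        "human_1": "PPMKVLAAGLVGQQ",
        "human_2": "WWWWWWWWWW",
    }
    print(human_homologs(bacterial_target, human_set))
```

In this toy case, human_1 matches 9 of the 10 target residues and would be flagged, while human_2 shares none.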
To keep up with the demand from large pharmaceutical companies, smaller
companies such as Accelrys (San Diego, CA), LION Bioscience AG
(Heidelberg, Germany), and Incyte Genomics (Palo Alto, CA) are offering
access to information services and data analysis software. Meanwhile,
other companies have opted to form strategic alliances with large
pharmaceutical companies; for example, Rosetta Inpharmatics (Kirkland,
WA), a developer of bioinformatics software, has become a subsidiary of
Merck & Co., Inc.
Current Problems in Bioinformatics
Biological information has been generated at an unprecedented rate over
the past two decades. This growth has been matched by rapid increases
in supercomputer power and backed by large monetary investments. One
thing, however, has not kept pace: there is currently a lack of trained
personnel in this interdisciplinary field, which could become one of
the major limitations to the future expansion of bioinformatics. Very
few major universities around the world offer programs in
bioinformatics, and even the schools that do have difficulty finding and
keeping individuals with the expertise to teach, since biotech and
pharmaceutical companies can offer more attractive salaries and benefits.
According to some companies, such as Rosetta Inpharmatics and Abgenix,
an ideal bioinformatician should have a solid background in biology and
be very comfortable with UNIX and programming languages such as C and
Perl. However, the general shortage of bioinformaticians has forced
companies to hire computer scientists or mathematicians and teach them
biology, or to hire biologists with some self-taught computer skills.
The problem with this approach, as some critics point out, is that
biologists often lack strong training in statistics or programming,
while computer scientists often do not understand which biological
questions are truly meaningful when they design algorithms.
Fortunately,
this problem has been taken seriously and several universities in
the United States are incorporating bioinformatics courses in their
undergraduate curricula or establishing institutes of bioinformatics.
The University of California at Davis, for example, is putting $95
million into a new bioinformatics program, and the Virginia Polytechnic
Institute will invest $100 million in its new Virginia Bioinformatics
Institute, which will occupy three buildings. Other universities
such as the University of Florida at Gainesville, the University
of Sciences in Philadelphia and George Mason University in Virginia
have also formed bioinformatics departments on their campuses. It
is hoped that these programs will provide life sciences students
with a solid background in experimental biology as well as a good
understanding of the computer tools available for the investigation
of biological questions.
Conclusions
DNA sequences generated by the Human Genome Project, protein structures,
gene expression patterns, and other data carry an enormous amount of
valuable information. Uncovering all of it, however, will take many
more years of research. Bioinformatics, as we have seen, is set to
become an indispensable tool. This is not to say that wet-bench work is
no longer required; on the contrary, bioinformatics will give researchers
the tools they need to guide their bench work and to handle computers as
easily as they handle the microscope.
Suggested Reading
Baxevanis, A.D. and Ouellette, B.F., eds. Bioinformatics: A Practical Guide
to the Analysis of Genes and Proteins, 2nd ed. New York: Wiley-Interscience, 2001.
"Bioinformatics." Nature Biotechnology. 2000, 18(supplement): IT31-IT34.
Reprint of Persidis, A. "Bioinformatics." Nature Biotechnology. 1999, 17: 828-830.
Boguski, M.S. "Bioinformatics: a new era." Immunology Today. 1998, 19(supplement).
Butler, D. "Are you ready for the revolution?" Nature. 2001, 409: 758-760.
Fagan, R. and Swindells, M. "Target discovery using bioinformatics."
Chemical Innovation. 2001, 31(6): 24-28.
Jonietz, E. "Boom amidst bust. Computing plus biology equals big money."
Technology Review. 2001, 104(7): 29.
Ricadela, A. "IT at the edge of science." Information Week. 2001, 850: 30-37.
Roos, D. "Bioinformatics: Trying to swim in a sea of data." Science.
2001, 291(5507): 1260-1266.
Stone, B., et al. "How powerful new computers are helping researchers
revolutionize drug development." Newsweek. 2001, 137(18): 54.
Web sites related to this topic
Degree Programs in Bioinformatics and Computational Biology (Australia, Europe and North America)
Bioinformatics milestones
Bio Source Link
Journal of Young Investigators. 2002. Volume Six.
Copyright © 2002 by Cristina Tang and JYI. All rights reserved.