Author: Tang Cristina
Institution: Biochemistry
Date: October 2002

As late as the early 1990s, biology and related fields required very little experience with computers. Now, however, the vast amount of DNA sequence and protein data generated by the Human Genome Project and by labs around the world has made it clear that biologists will have to rely more on computers to organize, store, and efficiently analyze these data. Thus, from the fusion of computer science and biology, a relatively new field was born: bioinformatics, a field that holds promise for speeding up drug discovery but also faces problems as it is integrated into our society.

Bioinformatics has emerged in response to major advances in molecular biology technologies, and has been made possible by the exponential growth of computer technology. This interdisciplinary field bridging biology, math, and computer science has three major components: the organization of data generated from experiments into databases, the development of new algorithms and software, and the use of software for the interpretation and analysis of data. Each of these components contributes to the optimal use of the information generated from experiments.

Example of a database

FlyBase illustrates the usefulness of a well-built database of biological information in research. This database collects the currently available knowledge on the fruit fly Drosophila melanogaster, a model experimental organism. The information found on FlyBase includes fruit fly gene and protein sequences, protein expression patterns and functions, literature references, and a list of researchers working with this organism. This database, which is updated regularly and accessible online, allows researchers to quickly obtain the information they need.

Database Search

Well-built databases are very useful for storing, organizing, and accessing information. However, in order to make sense of the data, researchers must be able to analyze and interpret them. This is where the bioinformatics "tools" -- algorithms developed for data analysis -- come in.

When researchers sequence a piece of DNA, for example, they simply get a long string of the letters A, G, C, and T, each representing one of its constituent bases. By comparing this sequence to all of the available sequences in a given database, they can determine whether the sequence codes for a gene, or whether it is similar to a known sequence from another organism. However, because of the large number of DNA sequences present in any database these days, this task may take months, or even years, if it is done manually.

Computers, on the other hand, can perform this task efficiently. One frequently used tool for finding similar sequences in a database is BLAST (Basic Local Alignment Search Tool). As described in the BLAST Course found on the BLAST Web site, this program compares a user-specified DNA sequence to the sequences in a database and outputs the results ranked by its scoring algorithm, starting with the sequences that best match the input sequence.
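To make the idea of a ranked database search concrete, here is a minimal sketch in Python. It is not BLAST itself: BLAST uses fast heuristics, whereas this sketch scores every database entry with the classic Smith-Waterman local alignment method and then sorts the hits. The scoring values, the function names, and the toy database are all assumptions chosen for illustration.

```python
def local_alignment_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between two sequences.

    The match, mismatch, and gap values are illustrative assumptions.
    """
    prev = [0] * (len(target) + 1)  # scores for the previous query row
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search(query, database):
    """Return (name, score) pairs sorted from best to worst match."""
    hits = [(name, local_alignment_score(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy database; the names and sequences are made up for illustration.
database = {
    "seq_a": "TTGACCGATTACAGGCTA",
    "seq_b": "CCCCCCCCCCCC",
    "seq_c": "AAGATTTCATT",
}

for name, score in search("GATTACA", database):
    print(name, score)
```

Scoring every entry exhaustively like this is far too slow for real databases, which is one reason heuristic tools such as BLAST are used in practice.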

Armed with this and many other similar data analysis software tools, biologists have been able to make many discoveries, such as the identification of gene coding regions on DNA sequences, as well as possible links of genes to certain diseases.
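As a rough illustration of how software can flag candidate coding regions, the Python sketch below scans a DNA string for open reading frames: stretches that begin with the start codon ATG and run, in the same reading frame, to a stop codon. It is a deliberately simplified assumption of how such a tool might work. It looks only at the forward strand and ignores introns, and real gene-finding programs rely on much richer statistical models. The function name and the minimum-length parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Yield (start, end, frame) for ATG...stop stretches on the forward strand.

    `end` is exclusive, so dna[start:end] includes the stop codon.
    `min_codons` (an illustrative assumption) counts the start and stop codons.
    """
    dna = dna.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate region
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3, frame
                start = None  # look for the next start codon in this frame


# Made-up example sequence.
sequence = "CCATGAAATTTGGGTAACC"
for start, end, frame in find_orfs(sequence):
    print(frame, start, end, sequence[start:end])
```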

Bioinformatics and Drug Discovery

Major pharmaceutical companies are also keeping up with the new field of bioinformatics. Some have opened specialized bioinformatics research units to help in their drug development process. As Mark Swindells and Richard Fagan pointed out in their June 2001 Chemical Innovation article "Target Discovery Using Bioinformatics," "success in the race to discover targets in the post-genomic world will go not simply to the companies with the greatest repertoire of privately held sequences, but to the companies with the greatest ability to mine the value locked in the burgeoning data archives."

Bioinformatics can, for example, be used to determine whether a drug target in a pathogenic bacterium is also present in humans. This helps researchers predict the drug's potential side effects. It might also allow them to decide early in the drug development process whether to abandon or continue with research on that drug, thus saving precious time and money. Moreover, bioinformatics can aid in the prediction of protein structures and functions (based on homology to known proteins) to determine the potential of a protein to be a drug target.
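The first of these checks can be sketched in a few lines of Python. The example below is only a crude screen under stated assumptions: it measures what fraction of a bacterial protein's short subsequences (3-letter "k-mers") also occur in each human protein and reports the closest match. In real work this comparison would be done with an alignment tool such as BLAST and interpreted by a specialist. The sequences, names, and the 0.5 cutoff are all hypothetical.

```python
def kmers(seq, k=3):
    """Set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(target, other, k=3):
    """Fraction of the target's k-mers that also occur in the other sequence."""
    target_kmers = kmers(target, k)
    if not target_kmers:
        return 0.0
    return len(target_kmers & kmers(other, k)) / len(target_kmers)


def closest_human_match(target, human_proteins, k=3):
    """Return (name, fraction) for the human protein most similar to the target."""
    return max(
        ((name, shared_kmer_fraction(target, seq, k)) for name, seq in human_proteins.items()),
        key=lambda hit: hit[1],
    )


# Made-up protein fragments, for illustration only.
bacterial_target = "MKTAYIAKQR"
human_proteins = {
    "human_x": "MKTAYIAKQL",
    "human_y": "GGSSPPLLVV",
}

name, fraction = closest_human_match(bacterial_target, human_proteins)
SIMILARITY_CUTOFF = 0.5  # hypothetical threshold, not an established standard
if fraction >= SIMILARITY_CUTOFF:
    print(f"Possible human counterpart: {name} ({fraction:.2f})")
else:
    print("No close human counterpart found in this toy set")
```

A high overlap would suggest that a drug aimed at the bacterial protein might also act on the human one, which is the kind of early warning the paragraph above describes.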

To keep up with the demand from large pharmaceutical companies, smaller companies such as Accelrys (San Diego, CA), LION Bioscience AG (Heidelberg, Germany), and Incyte Genomics (Palo Alto, CA) are offering access to information services and data analysis software. Meanwhile, other companies have opted to form strategic alliances with the large pharmaceutical companies; for example, Rosetta Inpharmatics (Kirkland, WA), a developer of bioinformatics software, has become a subsidiary of Merck & Co., Inc.

Current Problems in Bioinformatics

The generation of biological information has increased with unprecedented speed in the past two decades. This has been matched by the rapid development of supercomputer power and has been backed with large monetary investments. But one thing has not kept up with this growth.

Presently, there is a lack of trained personnel in this interdisciplinary field. This could be one of the major limitations to the future expansion of bioinformatics. There are still very few programs in bioinformatics at major universities around the world. Even those schools offering it face the difficulty of finding and keeping individuals who possess the required expertise to teach, since biotech and pharmaceutical companies can offer more attractive salaries and benefits.

According to some companies, such as Rosetta Inpharmatics and Abgenix, an ideal bioinformatician should have a solid background in biology and be very comfortable with UNIX and programming languages such as C and Perl. However, the general shortage of bioinformaticians has forced companies to hire computer scientists or mathematicians and teach them biology, or to hire biologists who have some self-taught computer skills.

The problem with this, however, as some critics point out, is that biologists often don't have very strong statistical and/or programming training. The computer scientists, on the other hand, often don't understand what the really meaningful biological questions are when creating algorithms.

Fortunately, this problem has been taken seriously and several universities in the United States are incorporating bioinformatics courses in their undergraduate curricula or establishing institutes of bioinformatics.

The University of California at Davis, for example, is putting $95 million into a new bioinformatics program. The Virginia Polytechnic Institute will invest $100 million in its new Virginia Bioinformatics Institute, which will occupy three buildings. Other universities, such as the University of Florida at Gainesville, the University of the Sciences in Philadelphia, and George Mason University in Virginia, have also formed bioinformatics departments on their campuses. It is hoped that these programs will provide life sciences students with a solid background in experimental biology as well as a good understanding of the computer tools available for the investigation of biological questions.

Conclusions

DNA sequences generated by the Human Genome Project, protein structures, gene expression patterns, and other data carry an enormous amount of valuable information. Uncovering all of it, however, will still take many more years of research. Bioinformatics, as we have seen, is set to become an indispensable tool. This is not to say that wet-bench work is no longer required; on the contrary, bioinformatics will give researchers the tools they need to guide their bench work, and they will come to handle computers as easily as they do the microscope.

Bioinformatics: Life Science Research In Silico
