Author: Srivastava Siddharth
Institution: Biochemistry and Mathematics
Date: September 2005

Introduction

In 1994, Leonard M. Adleman, a professor at the University of Southern California, created a storm of excitement in the computing world when he announced that he had solved a famous computation problem. There was nothing remarkable about the problem itself, which dealt with finding the shortest route through a series of points. Nor was there anything special about how long it took Adleman to solve it , seven days , substantially greater than the few minutes it would take an average person to find a solution. What was exciting about Adleman's achievement was that he had solved the problem using nothing but deoxyribonucleic acid (DNA) and molecular chemistry.

Adleman's DNA "computer" had solved a so-called directed Hamiltonian Path problem, otherwise known as the traveling salesman problem. In a Hamiltonian Path problem, a series of towns are connected to each other by a fixed number of bridges. A hypothetical salesman has to find the shortest path through the set of towns so that he visits each town once before arriving at his final destination. When the number of cities is small, the question can be tackled analytically by figuring out all possible combinations for itineraries and then choosing the shortest path. As the number of cities grows, the problem generates too many possible paths for brute force solving, so a computer is needed to solve it. However, even with a computer, a Hamiltonian Path problem can easily become too complicated to solve.

Although the solution to Adleman's seven-city Hamiltonian Path problem was relatively straightforward (since all possible routes can be written by hand in a reasonable amount of time), his experiment showed that DNA could be useful as a computational tool. The combination of computer science and molecular biology holds promise for many exciting, interdisciplinary applications in the future.

Exciting Possibilities

"It was the first time that somebody had actually shown how you could use DNA to carry out computational procedures," said Dr. Carter Bancroft, professor and interim chair of the Department of Physiology and Biophysics at the Mount Sinai School of Medicine. With Adleman's experiment, the field of DNA computing was born, and along with it came possibilities for all sorts of applications , from breaking encryption standards to constructing enzymes and other molecules for use in biomolecular engineering or medicine. "[Adleman's work] opened the idea of using DNA to develop interesting and useful non-biological technologies," Bancroft said. "It made a lot of people think, What we can do with DNA?'"

The advent of DNA computing also opened doors for collaboration among computer scientists, chemists, biologists, and mathematicians. "It's been extremely exciting for me," said Bancroft. "I work in a larger area of DNA-based technologies. One of the really fun things about it is its interdisciplinary nature. I have gotten to know a number of computer scientists. I have gotten to know problems in computer science." With the arrival of Adleman's experiment, computer scientists and biologists now have the opportunity to study and conduct research in fields completely different from their own. Such collaborative efforts broaden the scope of research in these fields and lead to new insights and perspectives that otherwise would not be discovered. Bancroft considers this melding of disciplines one of the exciting promises of DNA computing. "It's nice to see that computer scientists are getting to know a lot about DNA, and that molecular biologists are getting to know a lot about computer science."

The Adleman Experiment

How exactly did Adleman go about solving his traveling salesman problem? A simplified version of the Hamiltonian Path problem Adleman solved involves the following scenario:

A Boston airline passenger wants to visit San Diego, Atlanta, and St. Louis before arriving in New York. The airline restricts certain connections between cities, so that the passenger can fly from Boston to San Diego, but not from St. Louis to San Diego. The passenger can visit each city only once. What is the best route the passenger can take? (See Figure 1 for the answer).

Figure 1. A sample traveling salesman problem involving the shortest path connecting all cities. Arrows indicate the direction that someone can travel. For example, a voyager can leave Atlanta and arrive in St. Louis, and vice versa.

Adleman represented each of the cities as single-stranded DNA molecules, each with 20 arbitrary nucleotides. Nucleotides are the building blocks of DNA, and they exist as four different bases: adenine, thymine, guanine, and cytosine (denoted by the letters A, T, G, and C, respectively). For example,

San Diego = TTGACGAATG ATGCTAGAAA,

Atlanta = AATCCATGCG AAATTAGCCC,

St. Louis = TATGACCTAG CTAGCATAGC, etc.

Adleman then assigned flight names for all the possible flight paths, combining the last 10 nucleotides of the departure city with the first ten nucleotides of the arrival city.

For example, a flight leaving St. Louis and arriving in Atlanta would be denoted as CTAGCATAGC AATCCATGCG. If this encounters the complement of the Atlanta city name (TTAGGTACGC TTTAATCGGG) in solution, the segment coding AATCCATGCG will hydrogen bond to the segment coding TTAGGTACGC, leading to

CTAGCATAGC AATCCATGCG

          ||||||||||

          TTAGGTACGC TTTAATCGGG

This structure can in turn bond to a flight leaving from Atlanta, and so on.

In his experiment, Adleman first synthesized DNA strands representing all cities and flight connections. Using DNA ligase, which "glues" DNA molecules together, he then allowed the DNA strands to combine and form all possible itineraries. Adleman made sure there were enough copies of each DNA molecule to ensure that all flight plans could form.

Adleman then used polymerase chain reaction (PCR) to make multiple copies of only those itineraries with the correct departure and arrival destination. PCR is a technique used to repeatedly replicate a particular section of DNA. It works by taking advantage of the enzyme polymerase's ability to copy a section of DNA complementary to short strands of DNA known as primers. PCR will create new copies of the section of DNA surrounded by the primers. If the primers are complementary to sequences for Boston (the departure city) and New York (the final destination), the result of PCR is the amplification of Boston-to-New York itineraries.

After using PCR, Adleman sorted the results based on their sizes with gel electrophoresis. In gel electrophoresis, an electric field forces DNA molecules to travel through a gel medium. Depending on their sizes, DNA fragments may move at a slower or faster pace, providing the basis for sorting the molecules by size. Gel electrophoresis allowed Adleman to determine which itineraries were correct in length. In the example presented here, there are a maximum number of five cities, so the DNA molecules of correct size would be those with the corresponding number of nucleotides.

In the last stage of his experiment, Adleman used a process known as affinity purification. In affinity purification, the complement of a particular section of DNA is tagged with a magnetic substance and then mixed with the DNA of interest. Tagged strands that are bonded to their complements can then be culled out of solution. City by city, Adleman performed affinity purification on all of the itineraries, discarding the molecules not selected after each attempt. The molecule remaining at the end was the answer to his problem.

Benefits of DNA Computing

Adleman's DNA-based computation showed several clear advantages over traditional computation with computers. Adleman's experiment performed at a rate of 100 teraflops, or 100 trillion floating point operations per second. By comparison, NEC Corporation's Earth Simulator, the world's fastest supercomputer, operates at approximately 36 teraflops. Another distinguishing characteristic of DNA molecules is their unprecedented ability to carry out multiple operations at once. A January 2003 EMBO Reports article commented, "Whereas current technology rests on a highly linear principle of logic, where one computation must be completed before the next can begin, the use of DNA means that an enormous number of calculations can take place simultaneously." For example, to defeat an encryption scheme, a serial (i.e., non-parallel) computer must generate and test each solution one after the other. In a DNA-based experiment designed to tackle the same encryption scheme, the combination of DNA molecules (analogous to the computations done by the computer) occurs simultaneously; different solutions can be generated at once. The result is massive parallel computation.

DNA molecules are highly efficient data structures that hold tremendous amounts of information in very small spaces. Along each DNA molecule, nucleotides are positioned within 3.4 angstroms (3.4 x 10-10 m) of each other, making DNA one of the most efficient data systems known to exist. The data density of DNA molecules approaches 18 Mbits per inch. For comparison, today's high performance computer hard drives can store less than 1/100,000 of this information in the same amount of space.

Another remarkable characteristic of DNA is its built-in error correction. Because of DNA's double-stranded nature, adenine binds exclusively to thymine, and guanine binds exclusively to cytosine. Other combinations are possible, but error-repairing enzymes are constantly on the lookout for such anomalies during DNA replication. The end result is one error for every billion bases copied. Taking all these factors into account, Adleman wrote in his pivotal paper, "the potential of molecular computation is impressive."

Problems

The original allure of DNA computing has slowly dwindled in the past few years as more of its inherent problems became evident. Although the computational power of DNA is impressive, sorting through and filtering out unwanted results takes time and resources. "At this point it is unlikely [to implement a working DNA-based computer] because the manipulations are far too laborious," said Dr. Tom Schneider, a researcher in the Laboratory of Experimental and Computational Biology at the National Cancer Institute. The manipulations "require special pure reagents kept sterile at precise temperatures. [A DNA-based computer] would also be too expensive and the computation is not repeatable because the reagents get used up." Adleman's seven-city Hamiltonian Path experiment, for example, required only a short amount of time for all possible itineraries to form, but it took him nearly a week to sort through all the combinations and find the correct solution.

Furthermore, DNA is extremely resource-intensive. Despite the fact that DNA can hold a trillion times more data than current storage media, the process of repeatedly ligating and splicing DNA strands involves tremendous amounts of the molecule. Solving a Hamiltonian Path problem for two hundred cities would require more than 1027 kg of DNA. Typical molecular biology experiments usually require less than 1 mg of DNA.

The Next Generation

Despite these problems, DNA computing is not at a standstill. "I doubt anytime in the near future people will have DNA-based computers instead of silicon-based computers," said Bancroft, referring to the possibility of using devices based on DNA for regular computer tasks like word processing. Pointing to applications of DNA to general scientific problem solving, he added that "the sky's the limit" for other discoveries in DNA computing.

Figure 2. Molecular realization of an automaton. An input DNA molecule (green/blue) provides both data and fuel for the computation. Software DNA molecules (red/purple) encode program rules, and the restriction enzyme FokI (colored ribbons) function… — Figure 2. Molecular realization of an automaton. An input DNA molecule (green/blue) provides both data and fuel for the computation. Software DNA molecules (red/purple) encode program rules, and the restriction enzyme FokI (colored ribbons) functions as the automaton's hardware. A finite-state automaton that analyzes its input (a list of a's and b's) according to program rules depicted as labeled arrows, ends the computation in S0 if the input has an even number of a's and in S1 otherwise. Source: Weizmann Institute of Science.

One such milestone was achieved in 2001, when a group of researchers at the Weizmann Institute in Israel created a biomolecular automaton, a computing device based on the Turing machine. Developed by British mathematician Alan Turing, a Turing machine is a device consisting of a set of inputs (symbols) and transition rules, which govern how the machine processes those inputs. Depending on the input symbols read in, the transition rules tell the device to modify the current symbol, change its internal state, and move on to the next symbol. The end output of the machine can be the answer to any number of questions related to the input string. In Shapiro's experiment, two DNA molecules were held together, one serving as the input strand and the other serving as the "software." The machine was controlled by the enzyme FokI, the device's "hardware," which cut segments of the input strand and caused energy to be released (Figure 2). This dissipated energy was used to repeat the process until a terminating signal was read from the input molecule. In one microliter of solution, there would be enough of these machines to perform nearly 70 billion operations a second.

"Our computer is programmable, but it's not universal," said Shapiro in an interview with National Geographic. "There are computing tasks it inherently can't do." The machine can answer yes or no to certain questions, but because of its limited memory, it cannot, for instance, count the number of ones in a list of ones and zeros.

The true value of Shapiro's device lies in its applications to biomedical engineering. "Autonomous bio-molecular computers may be able to work as doctors in a cell,' operating inside living cells and sensing anomalies in the host," Shapiro said. "Consulting their programmed medical knowledge, the computers could respond to anomalies by synthesizing and releasing drugs."

Bancroft, too, believes that DNA computing has "very exciting possibilities" in the field of nanotechnology. Inside every cell are a number of molecules, including DNA, that operate as sophisticated machines. By learning how to physically control these molecular devices, researchers will be able to engineer devices more complicated and more efficient than current microelectromechanical systems. These biological micromachines could have a range of applications. "It's possible that DNA-directed machines might have medical uses ultimately for maybe correcting defects in cells," said Bancroft. They could also be used to time-release medications, bolster organ function, or provide medical feedback.

What is the future of molecular computing? Schneider envisions a change in the direction of the current progress. "Many if not all of the current molecular computation techniques, including Shapiro's, are macroscopic in that the computations are being done using test tubes in bulk solution. They rely on molecules shuttling back and forth between different complexes. Molecular computers that function by themselves entirely at the molecular level have yet to be developed," he said. DNA computing is still in its infancy, and there is much more research left to be done. Bancroft, however, remains optimistic: "The whole broad field of DNA-based technologies is exciting, because it's limited only by our imagination right now. I think we're just beginning to see the fruits of the field."

Fear Not Traveling Salesmen, DNA Computing is Here to Save the Day