|
|
Issue 2, September 2003
Fear Not Traveling Salesmen, DNA Computing is Here to Save the Day
Siddharth Srivastava
Biochemistry and Mathematics, Columbia University
srivastava@jyi.org
Discuss this article!
Introduction
In 1994, Leonard M. Adleman, a professor
at the University of Southern California, created a storm of excitement
in the computing world when he announced that he had solved a famous
computation problem. There was nothing remarkable about the problem
itself, which dealt with finding the shortest route through a series
of points. Nor was there anything special about how long it took
Adleman to solve it — seven days — substantially greater than the
few minutes it would take an average person to find a solution.
What was exciting about Adleman’s achievement was that he had solved
the problem using nothing but deoxyribonucleic acid (DNA) and molecular
chemistry.
Adleman’s DNA “computer” had solved
a so-called directed Hamiltonian Path problem, otherwise known as
the traveling salesman problem. In a Hamiltonian Path problem, a
series of towns are connected to each other by a fixed number of
bridges. A hypothetical salesman has to find the shortest path through
the set of towns so that he visits each town once before arriving
at his final destination. When the number of cities is small, the
question can be tackled analytically by figuring out all possible
combinations for itineraries and then choosing the shortest path.
As the number of cities grows, the problem generates too many possible
paths for brute force solving, so a computer is needed to solve
it. However, even with a computer, a Hamiltonian Path problem can
easily become too complicated to solve.
Although the solution to Adleman’s
seven-city Hamiltonian Path problem was relatively straightforward
(since all possible routes can be written by hand in a reasonable
amount of time), his experiment showed that DNA could be useful
as a computational tool. The combination of computer science and
molecular biology holds promise for many exciting, interdisciplinary
applications in the future.
Exciting
Possibilities
“It was the first time that somebody
had actually shown how you could use DNA to carry out computational
procedures,” said Dr. Carter Bancroft, professor and interim chair
of the Department of Physiology and Biophysics at the Mount Sinai
School of Medicine. With Adleman’s experiment, the field of DNA
computing was born, and along with it came possibilities for all
sorts of applications — from breaking encryption standards to constructing
enzymes and other molecules for use in biomolecular engineering
or medicine. “[Adleman’s work] opened the idea of using DNA to develop
interesting and useful non-biological technologies,” Bancroft said.
“It made a lot of people think, ‘What we can do with DNA?’”
The advent of DNA computing also opened
doors for collaboration among computer scientists, chemists, biologists,
and mathematicians. “It’s been extremely exciting for me,” said
Bancroft. “I work in a larger area of DNA-based technologies. One
of the really fun things about it is its interdisciplinary nature.
I have gotten to know a number of computer scientists. I have gotten
to know problems in computer science.” With the arrival of Adleman’s
experiment, computer scientists and biologists now have the opportunity
to study and conduct research in fields completely different from
their own. Such collaborative efforts broaden the scope of research
in these fields and lead to new insights and perspectives that otherwise
would not be discovered. Bancroft considers this melding of disciplines
one of the exciting promises of DNA computing. “It’s nice to see
that computer scientists are getting to know a lot about DNA, and
that molecular biologists are getting to know a lot about computer
science.”
The
Adleman Experiment
How exactly did Adleman go about solving
his traveling salesman problem? A simplified version of the Hamiltonian
Path problem Adleman solved involves the following scenario:
A Boston airline passenger wants to
visit San Diego, Atlanta, and St. Louis before arriving in New York.
The airline restricts certain connections between cities, so that
the passenger can fly from Boston to San Diego, but not from St.
Louis to San Diego. The passenger can visit each city only once.
What is the best route the passenger can take? (See Figure 1 for
the answer).
 |
| Figure
1. A sample traveling salesman
problem involving the shortest path connecting all cities. Arrows
indicate the direction that someone can travel. For example,
a voyager can leave Atlanta and arrive in St. Louis, and vice
versa. |
Adleman represented each of the cities
as single-stranded DNA molecules, each with 20 arbitrary nucleotides.
Nucleotides are the building blocks of DNA, and they exist as four
different bases: adenine, thymine, guanine, and cytosine (denoted
by the letters A, T, G, and C, respectively). For example,
San Diego = TTGACGAATG
ATGCTAGAAA,
Atlanta = AATCCATGCG
AAATTAGCCC,
St. Louis = TATGACCTAG
CTAGCATAGC, etc.
Adleman then assigned
flight names for all the possible flight paths, combining the last
10 nucleotides of the departure city with the first ten nucleotides
of the arrival city.
For example, a flight
leaving St. Louis and arriving in Atlanta would be denoted as
CTAGCATAGC AATCCATGCG.
If this encounters the complement of the Atlanta
city name (TTAGGTACGC
TTTAATCGGG) in
solution, the segment coding AATCCATGCG
will hydrogen bond to the segment coding TTAGGTACGC,
leading to
CTAGCATAGC
AATCCATGCG
||||||||||
TTAGGTACGC TTTAATCGGG
This structure
can in turn bond to a flight leaving from Atlanta, and so on.
In his experiment,
Adleman first synthesized DNA strands representing all cities and
flight connections. Using DNA ligase, which “glues” DNA molecules
together, he then allowed the DNA strands to combine and form all
possible itineraries. Adleman made sure there were enough copies
of each DNA molecule to ensure that all flight plans could form.
Adleman then
used polymerase chain reaction (PCR) to make multiple copies of
only those itineraries with the correct departure and arrival destination.
PCR is a technique used to repeatedly replicate a particular section
of DNA. It works by taking advantage of the enzyme polymerase’s
ability to copy a section of DNA complementary to short strands
of DNA known as primers. PCR will create new copies of the section
of DNA surrounded by the primers. If the primers are complementary
to sequences for Boston (the departure city) and New York (the final
destination), the result of PCR is the amplification of Boston-to-New
York itineraries.
After using
PCR, Adleman sorted the results based on their sizes with gel electrophoresis.
In gel electrophoresis, an electric field forces DNA molecules to
travel through a gel medium. Depending on their sizes, DNA fragments
may move at a slower or faster pace, providing the basis for sorting
the molecules by size. Gel electrophoresis allowed Adleman to determine
which itineraries were correct in length. In the example presented
here, there are a maximum number of five cities, so the DNA molecules
of correct size would be those with the corresponding number of
nucleotides.
In
the last stage of his experiment, Adleman used a process known as
affinity purification. In affinity purification, the complement
of a particular section of DNA is tagged with a magnetic substance
and then mixed with the DNA of interest. Tagged strands that are
bonded to their complements can then be culled out of solution.
City by city, Adleman performed affinity purification on all of
the itineraries, discarding the molecules not selected after each
attempt. The molecule remaining at the end was the answer to his
problem.
Benefits of DNA Computing
Adleman’s DNA-based computation showed
several clear advantages over traditional computation with computers.
Adleman’s experiment performed at a rate of 100 teraflops, or 100
trillion floating point operations per second. By comparison, NEC
Corporation’s Earth Simulator, the world’s fastest supercomputer,
operates at approximately 36 teraflops. Another distinguishing characteristic
of DNA molecules is their unprecedented ability to carry out multiple
operations at once. A January 2003 EMBO Reports article commented,
"Whereas current technology rests on a highly linear principle
of logic, where one computation must be completed before the next
can begin, the use of DNA means that an enormous number of calculations
can take place simultaneously." For example, to defeat an encryption
scheme, a serial (i.e., non-parallel) computer must generate and
test each solution one after the other. In a DNA-based experiment
designed to tackle the same encryption scheme, the combination of
DNA molecules (analogous to the computations done by the computer)
occurs simultaneously; different solutions can be generated at once.
The result is massive parallel computation.
DNA molecules are highly efficient
data structures that hold tremendous amounts of information in very
small spaces. Along each DNA molecule, nucleotides are positioned
within 3.4 angstroms (3.4 x 10-10 m) of each other, making
DNA one of the most efficient data systems known to exist. The data
density of DNA molecules approaches 18 Mbits per inch. For comparison,
today’s high performance computer hard drives can store less than
1/100,000 of this information in the same amount of space.
Another remarkable characteristic of
DNA is its built-in error correction. Because of DNA’s double-stranded
nature, adenine binds exclusively to thymine, and guanine binds
exclusively to cytosine. Other combinations are possible, but error-repairing
enzymes are constantly on the lookout for such anomalies during
DNA replication. The end result is one error for every billion bases
copied. Taking all these factors into account, Adleman wrote in
his pivotal paper, "the potential of molecular computation
is impressive."
Problems
The original allure of DNA computing
has slowly dwindled in the past few years as more of its inherent
problems became evident. Although the computational power of DNA
is impressive, sorting through and filtering out unwanted results
takes time and resources. "At this point it is unlikely [to
implement a working DNA-based computer] because the manipulations
are far too laborious," said Dr. Tom Schneider, a researcher
in the Laboratory of Experimental and Computational Biology at the
National Cancer Institute. The manipulations "require special
pure reagents kept sterile at precise temperatures. [A DNA-based
computer] would also be too expensive and the computation is not
repeatable because the reagents get used up." Adleman’s seven-city
Hamiltonian Path experiment, for example, required only a short
amount of time for all possible itineraries to form, but it took
him nearly a week to sort through all the combinations and find
the correct solution.
Furthermore, DNA is extremely resource-intensive.
Despite the fact that DNA can hold a trillion times more data than
current storage media, the process of repeatedly ligating and splicing
DNA strands involves tremendous amounts of the molecule. Solving
a Hamiltonian Path problem for two hundred cities would require
more than 1027 kg of DNA. Typical molecular biology experiments
usually require less than 1 mg of DNA.
The
Next Generation
 |
| Figure
2. Molecular realization of an automaton. An input
DNA molecule (green/blue) provides both data and fuel for the
computation. Software DNA molecules (red/purple) encode program
rules, and the restriction enzyme FokI (colored ribbons)
functions as the automaton’s hardware. A
finite-state automaton that analyzes its input (a list of
a’s and b’s) according to program rules depicted as labeled
arrows, ends the computation in S0 if the input has an even
number of a’s and in S1 otherwise. Source: Weizmann
Institute of Science. |
Despite these problems, DNA computing
is not at a standstill. "I doubt anytime in the near future
people will have DNA-based computers instead of silicon-based computers,"
said Bancroft, referring to the possibility of using devices based
on DNA for regular computer tasks like word processing. Pointing
to applications of DNA to general scientific problem solving, he
added that "the sky’s the limit" for other discoveries
in DNA computing.
One such milestone was achieved in
2001, when a group of researchers at the Weizmann Institute in Israel
created a biomolecular automaton, a computing device based on the
Turing machine. Developed by British mathematician Alan Turing,
a Turing machine is a device consisting of a set of inputs (symbols)
and transition rules, which govern how the machine processes those
inputs. Depending on the input symbols read in, the transition rules
tell the device to modify the current symbol, change its internal
state, and move on to the next symbol. The end output of the machine
can be the answer to any number of questions related to the input
string. In Shapiro’s experiment, two DNA molecules were held together,
one serving as the input strand and the other serving as the "software."
The machine was controlled by the enzyme FokI, the device’s
"hardware," which cut segments of the input strand and
caused energy to be released (Figure 2). This dissipated energy
was used to repeat the process until a terminating signal was read
from the input molecule. In one microliter of solution, there would
be enough of these machines to perform nearly 70 billion operations
a second.
"Our computer is programmable,
but it’s not universal," said Shapiro in an interview with
National Geographic. "There are computing tasks it inherently
can’t do." The machine can answer yes or no to certain questions,
but because of its limited memory, it cannot, for instance, count
the number of ones in a list of ones and zeros.
The true value of Shapiro’s device
lies in its applications to biomedical engineering. "Autonomous
bio-molecular computers may be able to work as ‘doctors in a cell,’
operating inside living cells and sensing anomalies in the host,"
Shapiro said. "Consulting their programmed medical knowledge,
the computers could respond to anomalies by synthesizing and releasing
drugs."
Bancroft, too, believes that DNA computing
has "very exciting possibilities" in the field of nanotechnology.
Inside every cell are a number of molecules, including DNA, that
operate as sophisticated machines. By learning how to physically
control these molecular devices, researchers will be able to engineer
devices more complicated and more efficient than current microelectromechanical
systems. These biological micromachines could have a range of applications.
"It’s possible that DNA-directed machines might have medical
uses ultimately for maybe correcting defects in cells," said
Bancroft. They could also be used to time-release medications, bolster
organ function, or provide medical feedback.
What is the future of molecular computing?
Schneider envisions a change in the direction of the current progress.
"Many if not all of the current molecular computation techniques,
including Shapiro’s, are macroscopic in that the computations are
being done using test tubes in bulk solution. They rely on molecules
shuttling back and forth between different complexes. Molecular
computers that function by themselves entirely at the molecular
level have yet to be developed," he said. DNA computing is
still in its infancy, and there is much more research left to be
done. Bancroft, however, remains optimistic: "The whole broad
field of DNA-based technologies is exciting, because it’s limited
only by our imagination right now. I think we’re just beginning
to see the fruits of the field."
Discuss this article!
Suggested Reading
Adleman, L.M. Molecular computation of solutions to combinatorial problems.
Science. 266: 1021-1024 (1994).
Arstechnica. DNA computing. April 2000. Last Accessed: 18 April 2003.
<http://www.arstechnica.com/reviews/2q00/dna/dna-1.html>
Ball, P. DNA codes own error correction. September 2002. Last
Accessed: 18 April 2003. <http://www.nature.com/nsu/020916/020916-4.html>
Benenson, Y., et al. DNA molecule provides a computing machine with both data
and fuel. PNAS. 100: 2191-2196 (2003)
Boneh, D. On the computational power of DNA. Discrete Applied Mathematics.
71: 79-94 (1996)
Hapgood, F. Explanation of molecular computing with DNA. April
1995. Last Accessed: 18 April 2003. <http://www.mitre.org/research/nanotech/hapgood_on_dna.html>
Johnson, R. Time to engineer DNA computers. 3 January 2001.
Last Accessed: 18 April 2003. <http://www.eetimes.com/story/OEG20001221S0032>
Lovgren, Stefan. Computers made from DNA and enzymes. 24 February
2003. Last Accessed: 18 April 2003. <http://news.nationalgeographic.com/news/2003/02/0224_030224_DNAcomputer.html>
Ogihara, M., et al. DNA computing on a chip. Nature. 403: 143-144 (2000)
Parker, J. Computing with DNA. EMBO Reports. 4: 7-10 (2003)
Journal
of Young Investigators. 2003. Volume Eight.
Copyright © 2003 by Siddharth Srivastava and JYI. All rights reserved.
|
|