Chapter 11: The Nature of Computation
Originally published by Henry Holt and Company 1999. Published on KurzweilAI.net May 15, 2003.
HOW
. . . will things that think be developed?
by taking advantage of what nature
already knows how to do
•
through research that maximizes
contact, not constraints
•
with education that happens as it is needed,
rather than in advance
•
by interconnecting systems of people and
things to solve hard problems
Information technology can be thought of as a pyramid. At the top
are the supercomputers, the biggest, fastest machines, bought by
the biggest, richest institutions. Then come the large mainframes
that run companies. After that are the servers for a department,
high-powered workstations for demanding applications, and then the
common PC. As the cost goes down, the number of these things in
use goes up.
Below the PC are emerging devices that promise to be used in still
larger numbers.
A few hundred dollars buys a Personal Digital Assistant, a pocket-sized
computer handy for mobile information access. For tens of dollars,
the wristwatch has for many years been the one processor and display
that people carry around with them. It is now acquiring new capabilities,
such as a pager, a GPS receiver, or the Swatch Access watches
used as electronic ski-lift tickets. Europe is also leading the
way with smart cards, computers the size of a credit card that are
used much the same way, but that store electronic cash instead of
requiring access to a remote computer. They were first used to eliminate
the need for change to use pay phones, and are now spreading to
all areas of electronic commerce. Finally, for tens of cents come
the RFID chips that are used to tag things with an electronic identity.
These are used in the badges that control access to a workplace,
and in tracking laboratory animals. And that's where it stops. No
one's been able to fabricate, package, and sell chips for less than
about ten cents. While that might not appear to be a severe limit,
think about what it leaves off: just about everything!
One of the most frequent challenges presented to me is finding
a way to compute for pennies. Companies have a need to track the
identity, location, and environment of the things they make, but
in competitive markets they can't afford to add more than a few
cents per item. While it would be nice to be able to walk out of
a supermarket without waiting in line because the groceries come
tagged with chips that can automatically be scanned by the door,
or have your cupboard at home contact the grocery store when it
needs restocking, these things won't happen if the silicon chips
add more than a few cents to the cost of the potato chips. A business
card that can call up a business's Web page would be convenient,
but there wouldn't be a business left if its cards cost more than
a few cents.
This is such an intensely commercial question that it becomes an
interesting scientific one. It's not possible to reach one-cent
computers just by cutting corners in the fabrication of conventional
chips; the whole industry has been trying unsuccessfully to do that
for many years. The only way to get there is to be smarter about
taking advantage of what nature already knows how to do.
A clue is provided by the cheapest digital device of all, the shoplifting
tag. This is a one-cent, one-bit remotely read memory device, storing
the status of a product (purchased vs. stolen). Many of these, such
as the white rectangular strip that you find on a compact disk,
are made by a company called Sensormatic. These contain a metallic
glass, a thin sheet of a metal alloy that was cooled so quickly
when it formed that the atoms were frozen in a random arrangement.
It looks like a piece of aluminum foil, but when it is exposed to
an oscillating magnetic field it flexes slightly and then rings
much like a tuning fork. The resulting oscillation is much too
fast to be audible, but it in turn creates an oscillating magnetic
field that can be detected. The portals around the door at a CD
store contain the coils for exciting and detecting these oscillations.
My student Rich Fletcher was working with Sensormatic on squeezing
more functionality into these materials by trimming them to varying
lengths so that they would oscillate at different frequencies, and
sensing the motion of magnets near them by how they shift the frequency.
This let him store a few bits in the tag, and use it as a passive
wireless user-interface controller. By putting the coils in a table
instead of a door, Rich could create a work space that could distinguish
among objects brought in, and detect how they were moved. Doing
that isn't remarkable, but doing it for a penny is. At that cost
this capability could be put into most anything, suggesting that
smart materials might be able to replace dumb chips. Indeed, Rich
has since found many other materials that can be remotely sensed
to determine their identity, track their position, and detect properties
such as pressure and temperature.
This approach rapidly runs into a limit on the amount of information
that can be stored in a tag because there are not enough distinguishable
frequencies. Very little data is contained in the call sign of a
radio station; the interesting part is the signal the station broadcasts.
For a smart material to be able to send out a more complex signal
it needs to be nonlinear. If you hit a tuning fork twice as hard
it will ring twice as loud but still at the same frequency. That's
a linear response. If you hit a person twice as hard they're unlikely
just to shout twice as loud. That property lets you learn more about
the person than the tuning fork.
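The distinction can be made concrete in a few lines of numerical analysis. This is an illustrative sketch (the signal, sample count, and saturating `tanh` element are chosen for the example, not taken from the text): a linear response to a tone keeps all its energy at the driving frequency, while a nonlinear response pushes energy into new harmonics that carry extra information about the system.

```python
import numpy as np

# Sample one second at N points; a pure tone at 5 cycles occupies FFT bin 5.
N = 1024
t = np.arange(N) / N
tone = np.sin(2 * np.pi * 5 * t)

def bin_power(signal, k):
    """Magnitude of the k-th FFT bin of a real signal."""
    return abs(np.fft.rfft(signal)[k])

# Linear response: driving twice as hard doubles the amplitude,
# but all the energy stays at the original frequency (bin 5).
linear = 2 * tone
# Nonlinear response: a saturating element (tanh) distorts the
# waveform, pushing energy into the odd harmonic at bin 15.
nonlinear = np.tanh(2 * tone)

print(bin_power(linear, 15) / bin_power(linear, 5))       # essentially zero
print(bin_power(nonlinear, 15) / bin_power(nonlinear, 5)) # clearly nonzero
```

The ratio of harmonic to fundamental power is the fingerprint: zero for the tuning fork, nonzero for anything that responds nonlinearly.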
I set out to find a nonlinearity that we could use to store more
data in smart material tags. To keep the cost down to a penny it
had to be a natural property of the material rather than something
that would require elaborate fabrication. An intriguing candidate
is the interaction between the atoms in a material. Chemists study
these with the technique of nuclear magnetic resonance (NMR). Magnetic
resonance imaging (MRI) uses exactly the same mechanism, minus the
word "nuclear," which might scare patients.
NMR is performed in much the same way as shoplifting tags are detected,
applying an oscillating magnetic field and then listening for the
resulting oscillating field. In the tag the oscillation came from
a mechanical vibration of the material; in NMR the oscillation arises
from the nuclei of the atoms. In a magnetic field these precess
like tiny spinning tops. The rate at which they precess depends
on the orientation of neighboring nuclei, spinning faster if they
point in the same direction for example. By probing these neighboring
interactions with NMR, chemists are able to deduce the structure
of molecules. MRI applies a magnetic field that varies over space,
so that the oscillation frequency of a nucleus becomes a function
of its position.
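The precession rate described here is the Larmor frequency, proportional to the field strength: f = γB, where γ is the gyromagnetic ratio of the nucleus. A minimal sketch using the standard value for hydrogen shows why a field that varies over space turns frequency into a map of position:

```python
# Larmor precession: a nucleus in a magnetic field B precesses at
# f = gamma * B, where gamma is the gyromagnetic ratio.
# Standard value for hydrogen (1H): about 42.577 MHz per tesla.
GAMMA_1H_MHZ_PER_T = 42.577

def larmor_mhz(b_tesla, gamma_mhz_per_t=GAMMA_1H_MHZ_PER_T):
    """Precession frequency in MHz for a given field strength."""
    return gamma_mhz_per_t * b_tesla

# A typical 1.5 T clinical MRI magnet puts hydrogen nuclei near 64 MHz.
print(larmor_mhz(1.5))
# MRI applies a field gradient, so a small change in field (i.e., in
# position) shows up as a measurable shift in frequency.
print(larmor_mhz(1.51) - larmor_mhz(1.5))
```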
The more I learned about NMR, the more puzzled I became. It is usually
is performed by putting a tube full of a liquid in a strong magnetic
field. A coil wrapped around the tube sends in an oscillating magnetic
field to make the nuclei precess, and then detects the resulting
magnetic field from the response of the nuclei. The chemists analyze
NMR by using the language of quantum mechanics. Quantum mechanics
describes the bizarre behavior of very small things, not a macroscopic
test tube. Quantum mechanically there is nothing preventing this
book from splitting into ten copies and leaping to the moon, but
the interactions with you and the rest of the world force it to
remain sensibly classical. To understand the paradox of how an apparently
classical test tube can behave quantum mechanically, I started lugging
around a distinctly macroscopic NMR bible that chemists use, a book
by Ernst.
I was not getting anywhere until I was forced to do jury duty.
For months I had been trying to avoid serving, but I ran out of
excuses and had to clear a few days and report to the Cambridge
courthouse. There I was locked in the jury room for a day with nothing
to do but read Ernst. Freed from all my usual distractions, I finally
was able to understand just how elegant the apparently impenetrable
and self-referential chemists' language is.
In NMR each molecule in the liquid behaves as an independent but
identical quantum system. The experiments treat the liquid as a
single effective molecule that in reality is represented by an enormous
number of redundant copies. All of the classical interactions needed
to hold and interact with the liquid do ruin some of those copies,
but there are so many of them that the quantum mechanical properties
of the rest can persist for many seconds, plenty of time to perform
an experiment. This means that NMR provides a convenient way to
access a nonlinear quantum mechanical system.
Those attributes happen to be exactly what's needed for the future
of high-performance computing. The search for cheap chips was leading
to something much more. We've come to expect that every year or
so our computers will double in speed and capacity. This trend was
first noticed by Gordon Moore of Intel in 1965, and has continued
for the last three decades. It's not a law of nature; it's an observation
about remarkable engineering progress. From the very beginning it
was clear where this was heading. Each year it appears that it will
be hard to continue improving performance, and each year some new
tricks are found to squeeze out the next round. But some time around
2020 or so everything will hit bottom. At the rate of current progress,
the wires will be one atom wide, the memory cells will have one
electron, and the fab plant will cost the GNP of the planet so that
no one can afford to build it anyway. Further improvements cannot
come from the current path of shrinking silicon circuits.
In part, this doesn't matter because most of the current constraints
on using a computer have nothing to do with the speed of the processor,
they come from the user interface, or the power requirement, or
the software. A few of the billion dollars spent on building chip
fabs could profitably be used elsewhere. But in part it does matter,
because there are some hard problems that always demand faster computers.
Finding prime factors, for example, is the mathematical problem
that underpins the cryptography used for electronic commerce. The
difficulty of factoring is what lets you send a credit card number
from a Web browser without worrying about an eavesdropper obtaining
it. A desktop computer can multiply together the factors of a four-hundred-digit
number in an instant, but finding them would take today's fastest
computer the age of the universe. That is because factoring is an
exponential algorithm, meaning that doubling the size of the number
to be factored squares instead of doubles the time needed to do
it. And beyond factoring, it's also true that builders of each generation
of computers have confidently predicted that they were sufficient
for most applications, only to find entirely new uses for new machines
that weren't anticipated by extrapolating the old ones. I don't
feel limited because my desktop computer isn't one thousand times
faster, but that reflects a poverty of my imagination as much as
an observation about what I think I need a computer for.
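The asymmetry between multiplying and factoring can be seen in a toy sketch (the primes here are chosen for illustration; real cryptographic keys are far larger, and real attacks are smarter than trial division, but still super-polynomial on classical machines):

```python
# Multiplying two primes is instant; recovering them by brute force
# takes time that grows exponentially with the number of digits.
def factor(n):
    """Smallest prime factor by trial division (a toy sketch of the
    brute-force approach)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

p, q = 999983, 1000003           # two six-digit primes
n = p * q                        # multiplication: effectively free
assert factor(n) == p            # factoring: about a million divisions
# Doubling the number of digits squares the work: a number near 10^12
# needs ~10^6 trial divisions; one near 10^24 would need ~10^12.
```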
There aren't many places to turn for alternatives to speed up computing.
Light travels a foot in a billionth of a second, the cycle time
of today's fastest chips. This means that they cannot be clocked
much faster because there isn't time for the signals to go anywhere.
Another option is to keep the clock speed but use more computers
in parallel. Not only would this require an annual doubling of the
number of computers to keep pace with Moore's law, but parallelism alone
does not scale very well. Scientists have managed to perform logical
functions with DNA molecules, turning a tube full of DNA into an
enormous number of computers, about 10^{23} of them. That's
a lot. Because they communicate with each other slowly, the easiest
way to use them is like a lottery, preparing all possible answers
to a problem and then checking to see which molecule has the winning
solution. But since the number of possible factors of a cryptographic
key is exponential in the size of the key, this means that 10^{23}
computers could simultaneously check for the factors of just a twenty-three-digit
number.
There's only one thing in our universe that is exponentially big,
but is not used for computing: quantum mechanics. A classical bit
can be either a 0 or a 1. A bit governed by the laws of quantum
mechanics can simultaneously be a 0 and a 1. To describe it you need
to specify the probability that you'll find a 0 if you measure it,
and the probability that you'll find a 1. The state of a classical
computer with N bits is specified by giving those bits, but
for a quantum computer it requires giving 2^{N} coefficients,
one for each possible combination of the bits. For just fifty quantum
bits, 2^{50} ≈ 10^{15}, storing more information
than the largest classical computers. Because a quantum computer
could perform all possible calculations at the same time, it might
be the key to building more powerful computers.
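The bookkeeping behind that 2^{N} count can be sketched directly (a minimal illustration; the helper name is mine, not a standard library call):

```python
import numpy as np

# A classical N-bit register holds one of 2^N values; a quantum
# register's state is a vector of 2^N complex amplitudes, one per
# combination of bits.
def uniform_superposition(n_qubits):
    """State with equal amplitude on every bit pattern."""
    dim = 2 ** n_qubits
    return np.full(dim, 1 / np.sqrt(dim), dtype=complex)

state = uniform_superposition(3)
print(len(state))                   # 8 amplitudes for 3 qubits
print(np.sum(np.abs(state) ** 2))   # probabilities sum to 1
# Fifty qubits already need about 10^15 amplitudes to describe:
print(2 ** 50)
```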
This question was first asked in the early 1980s by Richard Feynman
at Caltech, Paul Benioff at Argonne National Laboratory, and David
Deutsch at Oxford University. Feynman was struck by the recognition
that it takes a classical computer an exponential amount of time
to model a quantum system, because it has to keep track of all of
the possible states of the system in parallel. He wondered if a
computer governed by the laws of quantum mechanics might be able
to efficiently model other quantum systems. Feynman's conjecture,
recently supported by Seth Lloyd at MIT, had broader implications
than he realized.
The study of quantum computing remained a theoretical curiosity
through the 1980s, pursued by an unusual collection of people who
had to know a great deal about both physics and computer science,
and also be slightly strange. Meetings of this community were a
delight because the people involved were so unusual, and the discussion
so irrelevant to any practical concerns. That began to change in
the early 1990s. A series of results were proved by David Deutsch,
Richard Jozsa of Plymouth University, and Dan Simon, now at Microsoft,
showing that a quantum computer is more powerful than a classical
computer for a series of increasingly less trivial problems. In
1994 Peter Shor of AT&T was able to use these techniques to
show that a quantum computer could find prime factors in polynomial
instead of exponential time. This made factoring a four-hundred-digit
number almost as easy as multiplying its factors.
These results drew on a second aspect of quantum mechanics that
is even more bizarre than the ability to be in many states simultaneously.
If a quantum bit is in a superposition of 0 and 1, and it interacts
with a second bit, the value of the second bit will depend on the
state of the first. If they're now separated to opposite ends of
the universe and the first bit is measured, it is forced to decide
between being a 0 or a 1. At that instant the value of the second
bit is determined. This appears to be an instantaneous action at
a distance, something that made Einstein very unhappy. It is called
entanglement, and it serves in effect to wire together the bits
in a quantum computer. Without entanglement a quantum computer would
have the same problem as a DNA computer, trying many answers at
the same time and then having to locate the correct one, like trying
to find a needle in a haystack. With entanglement, a single quantum
computer can be certain to solve a factoring problem.
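The perfect correlation of an entangled pair can be read straight off its state vector. A minimal sketch, writing the standard Bell state (|00⟩ + |11⟩)/√2 as four amplitudes:

```python
import numpy as np

# An entangled pair: the state (|00> + |11>)/sqrt(2) as a 4-amplitude
# vector indexed by the bit patterns 00, 01, 10, 11.
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
probs = np.abs(bell) ** 2

# The bits are perfectly correlated: outcomes 00 and 11 each occur
# half the time; 01 and 10 never occur. Measuring the first bit
# therefore fixes the second, however far apart they are.
print(probs)   # 0.5, 0, 0, 0.5
```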
Naturally, the three-letter agencies (NSA, CIA, . . . ) panicked.
Here was a very real threat to information security, coming from
an entirely unexpected quarter. Since by that time the result was
already widely known they couldn't cover it up, but they could try
to keep ahead of the competition. So they started showing up at
meetings, in effect offering a purchase order to anyone who would
build them a quantum computer. They started to relax when no one
serious took them up on it. In the next year many people realized just
how hard it was going to be to build a practical quantum computer.
It would need to be carefully isolated from the rest of the world
so that it could stay quantum, but would need to be accessible to
the outside world so that the initial conditions could be loaded
in, the computation run forward, and the answer read out. Papers
started to appear saying that quantum computers could not be built.
Then, in 1996, two things happened to scare the three-letter agencies
again. The first was the discovery by Peter Shor, Charles Bennett
of IBM, and Andy Steane of Oxford, that quantum error correction
is possible. Just as classical computers add extra bits to memories
to catch mistakes, they realized that quantum computers can use
extra quantum bits to fix their errors. This means that an imperfect
quantum computer could perform a perfect computation, as long as
it is just good enough to be able to implement the error correction.
Here's where NMR comes in. After my day of jury duty I understood
that it might be possible to solve the paradox of connecting the
classical and quantum worlds with the chemist's trick. Instead of
working with a single computer, if a tube full of quantum computers
in the form of a liquid sample in an NMR apparatus was used, then
all of the external interactions might ruin a few of them, but since
each effective quantum bit would be represented in an enormous number
of molecules it would retain its quantum coherence for a much longer
time. For this idea to work it would have to use natural molecules,
since it's not feasible to individually fabricate that many computers.
So the question then became whether a molecule could compute.
Classically, a digital computer must be able to flip a bit (the
NOT gate), and perform a logical operation on two bits (such as
the AND gate). Any other logical operation can be assembled from
these components. Similarly, a quantum computer must be able to
rotate a quantum bit to any angle between 0 and 1, and must be able
to perform a nonlinear operation between two bits. These are easy
to do with NMR. The precession of a nucleus in a magnetic field
is exactly the kind of rotation that is needed, and the interaction
between the nuclei provides the nonlinearity.
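The classical half of that claim, that NOT and AND suffice to assemble every other logical operation, can be checked in a few lines (a standard construction via De Morgan's law, shown here as a sketch):

```python
# NOT and AND are universal for classical logic: every other gate can
# be assembled from them.
def NOT(a):    return 1 - a
def AND(a, b): return a & b

def OR(a, b):  # De Morgan: a OR b = NOT(NOT a AND NOT b)
    return NOT(AND(NOT(a), NOT(b)))

def XOR(a, b):  # a XOR b = (a OR b) AND NOT(a AND b)
    return AND(OR(a, b), NOT(AND(a, b)))

# Verify against Python's built-in bit operations over all inputs.
for a in (0, 1):
    for b in (0, 1):
        assert OR(a, b) == (a | b)
        assert XOR(a, b) == (a ^ b)
```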
I arrived for a visit at the Institute for Theoretical Physics
in Santa Barbara, armed with this knowledge of how to represent
quantum information in what appears to be a classical system, and
how to apply magnetic fields to manipulate the quantum information.
On the first day I stopped by to see Isaac Chuang (now at IBM),
and emerged from his office a week later. He had just finished working
out how to reduce a quantum computer to a series of equivalent operations.
The operations that he needed and that I knew how to do matched
up perfectly.
By the end of the week we had worked out the details of programming
a molecular quantum computer. We had an eerie feeling that we weren't
really discovering anything, just lifting a curtain to reveal a
beautiful structure that had been there all along waiting for someone
to notice it.
A liquid is in fact the most perfect system that could be built
for quantum computing. It's easy to obtain in huge quantities, the
molecules all come perfectly assembled without any manufacturing
imperfections, and the cloud of electrons around the nucleus and
the rapid tumbling of the molecules in the liquid protect the quantum
coherence of the nuclei. Once we understood the connection between
liquid NMR and quantum computing we didn't even need to do a demonstration
experiment, because the chemists had been doing them for years.
Their molecular studies were showing all of the working elements
of a quantum computer, without their realizing that's what they
were doing.
Richard Feynman gave a seminal talk in 1959 entitled "There's
Plenty of Room at the Bottom." He foresaw the problems coming in
shrinking computers, and wondered if we could jump to making molecular
computers. This inspired the field of nanotechnology that sought
to do just that. Unfortunately, the study of nanotechnology has
produced thoughtful analyses and compelling visions, but very little
in the way of working machines. It now turns out that ordinary molecules
all along have known how to compute—we just weren't asking
them the right questions.
This was one of those ideas that was ready to be thought. An MIT/Harvard
group found a related way to manipulate quantum information in a
liquid, one at Los Alamos found still another, and many groups around
the world are now using these techniques to make small quantum computers.
The first experimental computation we did that required fewer steps
than on a classical computer was a search problem, using the molecule
chloroform. Given no advance knowledge, unlocking a classical padlock
with N settings requires about N/2 attempts. Lov Grover
at Bell Labs has shown that this can be done in about √N steps
on a quantum computer. Our chloroform computer had four states;
our quantum search was able to find in a single step the answer
that took a few tries classically.
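The four-state case is small enough to simulate exactly. This sketch applies one Grover iteration (oracle sign-flip, then reflection about the mean amplitude) to a four-state register; for exactly four states, a single step concentrates all the probability on the marked item, matching the single-step search described above:

```python
import numpy as np

def grover_4_states(marked):
    """One Grover iteration on a 2-qubit (4-state) register."""
    # Start in the uniform superposition over the 4 states.
    state = np.full(4, 0.5)
    # Oracle: flip the sign of the marked state's amplitude.
    state[marked] *= -1
    # Diffusion: reflect every amplitude about the mean.
    state = 2 * state.mean() - state
    return state

probs = grover_4_states(marked=2) ** 2
print(probs)   # all probability lands on state 2
```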
Daunting challenges remain if these demonstration experiments are
ever to grow up to exceed the performance of today's fastest classical
computers, although these are increasingly technological rather
than conceptual problems. The most severe limit is a rapid decrease
in sensitivity for larger molecules, but this might be solved by
using lasers to align the nuclei. The quantum coherence lasts for
thousands of operations, approaching the limit needed to be able
to use quantum error correction. This also needs a technique such
as the optical nuclear alignment, to prepare the extra quantum bits
used in the error correction. Finally, as the size of a molecule
is increased, the interactions between nuclei at the far ends of
a molecule become too weak to use for computing, but Seth Lloyd
has shown that a polymer with a simple repeating atomic pattern
is almost as useful and requires that the nuclei interact only with
their nearest neighbors.
These scaling constraints offer a glimpse of an ideal molecule
for quantum computing. It should have a regular structure, so that
Seth's technique can be used. Better still, the pattern should be
2D or 3D rather than a linear chain, to reduce the time to pass
messages around the molecule. It should have a rigid configuration,
so that all of the local atomic environments are identical, and
it should have a special structure at the end that can be used to
load in data and read out results. This sounds disturbingly like
a microtubule.
Microtubules are a kind of molecular scaffolding in neurons. While
there's no compelling reason to believe that quantum mechanics has
anything to do with consciousness, those who do think that it does
have come to identify microtubules as the seat of quantum consciousness.
I couldn't begin to guess how the brain could perform the kinds
of molecular manipulations that we do in NMR, although it is amusing
to observe that a head in an MRI magnet is a perfectly good medium
in which to perform a quantum computation.
The brain may not be the best place to look for guidance in how
to build a quantum computer, but it does have a valuable lesson
to teach us about how to build more powerful computers. Moore's
law demands exponential improvements in performance. Quantum mechanics
promises one way to do that; analog circuitry offers another.
The neurons in a brain are about a million times slower than the
transistors in a computer, yet they can instantly perform tasks
that tax the fastest computers. There are many more neurons in the
brain than transistors in a computer, about 10^{11} of them, and even
more connections among them, roughly 10^{15} synapses. These synapses
are the key.
Many computing problems reduce to finding good ways to approximate
complicated functions. For example, for a computer to recognize
faces it must have a function that can take as an input the pixels
in an image and provide as an output the name associated with the
face. There are many techniques for doing this, but they all reduce
in the end to some kind of function that relates inputs to outputs.
In traditional engineering practice unknown functions are represented
by using some family of basis functions, such as a collection of
polynomial powers or trigonometric functions, combined with a set
of weighting coefficients. There are relatively straightforward
techniques for finding the best coefficients to match a given set
of data.
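The "relatively straightforward technique" is linear least squares: because the unknown coefficients only weight the basis functions, the best fit is the solution of a linear system. A minimal sketch with polynomial powers as the basis (the data here are synthetic, made up for the example):

```python
import numpy as np

# Traditional function fitting: pick basis functions (here polynomial
# powers), weight them with coefficients, and solve a linear
# least-squares problem for the best weights.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 1 + 2 * x + 3 * x**2 + 0.01 * rng.standard_normal(50)  # noisy data

# Design matrix: columns are the basis functions evaluated at each x.
A = np.stack([x**0, x**1, x**2], axis=1)
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)   # close to the true weights [1, 2, 3]
```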
This isn't how the brain does it. In the brain the coefficients
correspond to the synaptic connection strengths, and the basis functions
are the S-shaped response of the neurons. The brain uses the coefficients
in forming the arguments of the basis functions, rather than using
them just to combine the results. Engineers have historically stayed
away from doing this because it can be shown that there's no longer
a practical procedure to find the best values for the coefficients.
All that can be done is some kind of search that learns reasonable
ones.
The study of neural networks was inspired by the goal of emulating
the organization of the brain. While the brain is still far too
complex to model in any detail, simple mathematical networks that
put the unknown coefficients before the nonlinear basis functions
were surprisingly successful. It is now understood why.
A characteristic feature of difficult recognition problems is that
there may be a huge number of inputs, such as all of the pixels
in an image. In a traditional model, one coefficient is needed for
each possible combination of inputs and basis functions. This rapidly
explodes, so that if ten are needed for one input, one hundred are
needed for two, one thousand for three, and so forth. This is called
the "curse of dimensionality." A neural network model is much more
flexible. The adjustable coefficients inside the nonlinear functions
are far more powerful; the number required is a polynomial function
instead of an exponential function of the number of inputs.
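The two growth rates can be tallied directly. A sketch of the bookkeeping, assuming ten coefficients per input for the traditional model and a single hidden layer of ten units for the network (both counts are illustrative, not taken from the text):

```python
# "Curse of dimensionality": a tensor-product basis needs
# (coefficients per input)^(number of inputs) weights. A network with
# the coefficients inside its nonlinearities needs only about
# (hidden units) * (number of inputs) of them.
def grid_coefficients(inputs, per_input=10):
    return per_input ** inputs               # exponential in the inputs

def network_coefficients(inputs, hidden=10):
    # hidden-layer weights and biases, plus one linear output unit
    return hidden * (inputs + 1) + hidden + 1

for d in (1, 2, 3, 10):
    print(d, grid_coefficients(d), network_coefficients(d))
```

At ten inputs the traditional count is already ten billion, while the network needs about a hundred parameters.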
Even so, there's still the problem of finding their values. A second
result comes to the rescue here. It turns out that these models
are so flexible, almost anywhere they start out there is a good
solution nearby. A search procedure does not need to find the single
best solution, just an acceptable one, and there is a huge number
of them. This is ensured by using a large number of neurons, far
more than would be considered sane in a conventional model. This
property has come to be called the "blessing of dimensionality."
It helps explain why learning is so essential to how the brain functions.
So here is the second exponential insight: use models that require
learning the values of coefficients that appear inside nonlinear
functions. While this is not applicable to all computational problems,
it works for just the sorts of things that people do well, such
as recognizing patterns and prioritizing tasks. A digital computer
can simulate this kind of architecture, but it's a misplaced application
of the digital precision. Simpler analog circuits can do the same
thing with far fewer parts.
Analog circuits were used in the early days of computing, but they
fell out of favor because they accumulate errors. Digital computers
can correct their mistakes and hence perform arbitrarily long computations.
However, this is not an essential distinction between analog and
digital systems. Ultimately everything is analog; the apparently
digital values are a consequence of carefully designed analog circuits.
We're now learning how to marry the most desirable features of both
worlds, correcting analog errors while using the analog flexibility
to squeeze much more performance out of a simple physical system.
For example, Geoff Grinstein of IBM and I studied what are called
spread-spectrum coding techniques, which are used to intentionally
make a signal appear to be more random than it is. While this may
appear to be a perverse aim, it turns out that almost every property
of engineered systems is improved when they work with effectively
random signals (more people can share a communications channel with
less interference, more data can be stored on a disk, and so forth).
There is a beautiful theory for how to generate bits for use in
spread spectrum that appear to be completely random but in fact
are entirely predictable. A transmitter uses such a sequence to
spread its signal and the receiver then uses an identical copy to
recover the message (this idea can be traced back to the actress
Hedy Lamarr and composer George Antheil using piano rolls in World
War II). This is what you hear when your modem hisses. The catch
is that if the receiver is remote from the transmitter it has a
difficult computing job to do to figure out the spreading sequence
that the transmitter used. This is part of what a GPS receiver must
do to lock onto a satellite and find its position.
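The spread-and-recover mechanism can be sketched with a linear-feedback shift register, one classic way to generate bits that look random but are entirely predictable from a seed (an illustrative choice; real systems such as GPS use carefully selected code families). Transmitter and receiver generate identical streams from the shared seed, so XOR-ing twice recovers the message:

```python
def lfsr_bits(seed, n, taps=(0, 1)):
    """Return n bits from a 4-bit Fibonacci LFSR with the given taps."""
    state = seed & 0xF
    out = []
    for _ in range(n):
        out.append(state & 1)           # emit the low bit
        fb = 0
        for t in taps:                  # feedback: XOR of tapped bits
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << 3)
    return out

message = [1, 0, 1, 1, 0, 0, 1, 0]
chips = lfsr_bits(seed=0b1001, n=len(message))
spread = [m ^ c for m, c in zip(message, chips)]   # transmitter spreads
# Receiver regenerates the same chip stream and XORs it off again.
recovered = [s ^ c for s, c in zip(spread, lfsr_bits(0b1001, len(message)))]
assert recovered == message
```

The hard part in practice is exactly what the text describes: a remote receiver that doesn't know where the transmitter is in its sequence must search for the right alignment before this trick works.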
Two copies of almost any analog system, if allowed to interact
in almost any way, have a very interesting property: they synchronize.
This used to be seen in the simultaneous ticking of mechanical clocks
on the wall of a clock shop. Geoff and I wondered if we could use
this property to solve the problem of spread-spectrum acquisition.
Much to our pleasant surprise, we discovered that we could design
simple continuous systems that exactly matched the response of the
familiar digital ones for digital signals, but if they started in
the wrong state they could use their analog freedom to synchronize
onto the digital signal. Instead of splitting the job into an analog
detector and then a digital search program as is done now, a physical
analog part can do everything.
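The synchronization effect itself is easy to demonstrate. A toy sketch (a damped one-dimensional map of my own choosing, not the actual system Geoff and I built): two identical dissipative systems driven by the same signal forget their initial conditions, because the difference between their states contracts at every step.

```python
import math

# Two copies of the same dissipative system, driven by the same
# signal, synchronize: the damping factor of 0.5 halves the
# difference between their states on every step.
def step(state, drive):
    return 0.5 * state + drive

drive = [math.sin(0.7 * n) for n in range(40)]
x, y = 10.0, -3.0                  # very different starting states
for d in drive:
    x, y = step(x, d), step(y, d)

print(abs(x - y))                  # difference has shrunk to nearly zero
```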
The digital world has gone on refining Turing's machine, even though
Turing himself understood its limits. One more area that he made a seminal
contribution to was explaining how biology can form patterns by
using chemical reactions. This useful behavior is easy to find in
nature but hard to model on a digital computer. Now that we're approaching
the end of the era of rapid increases in the performance of digital
computers, like Turing we have no choice but to pay more attention
to information processing in the analog world.
When Feynman said that there's plenty of room at the bottom, he
meant that there was a long way to go to shrink computers down to
atomic sizes. That's no longer true. With difficulty we are approaching
that limit with carefully engineered circuits, and can now glimpse
how to eliminate all that effort by using natural materials. Although
he wasn't thinking about economics, that may be the most interesting
implication of his challenge. There's still plenty of room
at the bottom to make much cheaper and more widely accessible computers.
Nature is not only good at designing computers, it has a few lessons
left to teach us about manufacturing them.
WHEN THINGS START TO THINK by Neil Gershenfeld. ©1998 by
Neil A. Gershenfeld. Reprinted by arrangement with Henry Holt and
Company, LLC.
