Chapter 8: Bad Words
Originally published by Henry Holt and Company in 1999. Published on KurzweilAI.net May 15, 2003.
A group from a major computer company once came to the Media Lab
to learn about our current research. The day started with their
explaining that they wanted to see agent technology. Agents are
a fashionable kind of computer program designed to learn users'
preferences and help anticipate and act on their needs. Throughout
the day the visitors saw many relevant things, including software
to assist with collaboration among groups of people, environments
for describing how programs can adapt, and mathematical techniques
for finding patterns in user data. I almost fell out of my chair
at the wrap-up at the end of the day when they asked when they were
going to see agent technology. They had clearly been tasked to acquire
some agents, but wouldn't recognize one if it bit them.
I spent a day consulting with a firm that spends billions of dollars
a year on information technology. After working with them for most
of the day on methods for analyzing and visualizing large data sets
to find hidden structure, a senior executive asked if we could change
subjects and talk about "data mining." He had heard about this great
new technique that separates the nuggets of insight from the chaff
of data that most companies have. Unfortunately, he had no idea
that data mining and data analysis had anything to do with each
other, and that in fact that was what we had been doing all day
long.
At an industry lunch I found myself seated next to an editor for
a major computer magazine. She was writing an article about exciting
recent breakthroughs in artificial intelligence, and asked me if
I used "neural net technology." When I told her that I did use nonlinear
layered models to approximate functions with many variables, she
looked concerned but gave me a second chance and asked if I used
"fuzzy logic technology." After I answered that I did include probabilities
in my models to let me account for advance beliefs and subsequent
observations, she rolled her eyes and made clear what a burden it
was to talk to someone who was so technologically unhip. She was
particularly disappointed to see such old-fashioned thinking in
the Media Lab.
Any software catalog is full of examples of perfectly unremarkable
programs with remarkable descriptions. You can get ordinary games,
or games that take advantage of virtual reality technology; regular
image compression programs, or special ones that use fractal technology;
plain word processors, or new and improved ones with artificial
intelligence technology. The mysterious thing is that, other than
the descriptions, the capabilities of these new programs are surprisingly
similar to those of the old-fashioned ones.
These unfortunate examples share a kind of digitally enhanced semiotic
confusion. I'm always suspicious when the word "technology" is injudiciously
attached to perfectly good concepts that don't need it. Adding "technology"
confers the authority of, say, a great Victorian steam engine on
what might otherwise be a perfectly unremarkable computer program.
If you're offered a better mousetrap, you can inspect it and evaluate
its design improvements. But if you're offered a better computer
program, it is difficult to peer inside and judge how it works.
Consequently, the words that are used to describe software can have
more influence and less meaning than those used to describe a mousetrap.
Because so many people are so unclear about where hardware leaves
off and software begins, reasonable people accept unreasonable claims
for computer programs. A market persists for software that purports
to speed up the processor in a computer, or add memory to a system,
even though those attributes are clearly questions of physical resources.
It doesn't require an advanced testing laboratory to recognize that
such things can't (and don't) work.
These problems are not just a consequence of malicious marketing—even
the developers can be misled by their own labels. When I was a graduate
student at Cornell doing experiments in the basement of the Physics
building, there was a clear sign when someone's thesis research
was in trouble: they made overly nice cases for their instruments.
Creating and understanding new capabilities was difficult; creating
fancy packages for old capabilities was labor-intensive but guaranteed
to succeed. Making an elaborate housing provided the illusion of
progress without the risk of failure because nothing new was actually
being done. Something similar is happening with bad programs wrapped
in elaborate interfaces and descriptions.
There are a number of words that I've found to be good indicators
of such suspect efforts because they are used so routinely in ways
that are innocently or intentionally misleading. Lurking behind
terms such as "multimedia," "virtual reality," "chaos theory," "agents,"
"neural networks," and "fuzzy logic" are powerful ideas with fascinating
histories and remarkable implications, but their typical use is
something less than remarkable. Realization of their promise requires
much more than the casual application of a label.
When Yo-Yo was playing our cello you could correctly call what
he was doing "interactive multimedia" because a computer was making
sounds in response to his inputs, and you might even call it a kind
of "virtual reality" because he was using physical sensors to manipulate
a virtual instrument. But you probably wouldn't do either because
there was a much more compelling reality: Yo-Yo Ma playing the cello.
This is another clue that one should worry about the use of these
kinds of technological buzz phrases. A marketing description of
the feature list of a Stradivarius could not distinguish it from
a lesser instrument, or even from a multimedia PC running a cello
simulator. In fact, the PC would come out much better in a list
of standard features and available options and upgrades. The Strad,
after all, can only make music. The words we usually use to describe
computers have a hard time capturing the profound difference between
things that work, and things that work very well. A Strad differs
from a training violin in lots of apparently inconsequential details
that when taken together make all of the difference in the world.
People are often surprised to find that there's no one in the Media
Lab who would say that they're working on multimedia. When the lab
started fifteen years ago, suggesting that computers should be able
to speak and listen and see was a radical thing to do, so much so
that it was almost grounds for being thrown off campus for not being
serious. Now not only are there conferences and journals on multimedia,
the best work being done has graduated from academic labs and can
be found at your local software store. It was a battle for the last
decade to argue that a digital representation lets computers be
equally adept at handling text, audio, or video. That's been won;
what matters now is what is done with these capabilities.
Studying multimedia in a place like the Media Lab makes as much
sense as studying typewriters in a writers' colony, or electricity
in a Computer Science department. Identifying with the tools deflects
attention away from the applications that should motivate and justify
them, and from the underlying skills that are needed to create them.
The casual ease of invoking "multimedia" to justify any effort that
involves sounds or images and computers is one of the reasons that
so much bad work has been done in the name of multimedia.
When I fly I've found a new reason to look for airsickness bags:
the in-flight videos touting new technology. These hyperactive programs
manage to simulate turbulence without even needing to leave the
ground. They are the video descendants of early desktop publishing.
When it became possible to put ten fonts on a page, people did, and
the result was visual noise. Good designers spend years thinking
about how to convey visual information in ways that guide and satisfy
the eye, that make it easy to find desired information and progress
from a big picture to details. It requires no skill to fill a page
with typography; it requires great discipline to put just enough
of the right kind of information in the right place to communicate
the desired message rather than a printer demo. In the same way,
adding audio and video together is easy when you have the right
software and hardware, but to be done well requires the wisdom of
a film director, editor, cinematographer, and composer combined.
Jump cuts look just as bad done with bits or atoms.
Overuse of the possibilities afforded by improving technology has
come full circle in a style of visual design that miraculously manages
to capture in print the essence of bad multimedia. It hides the
content behind a layout of such mind-boggling complexity that, after
I admire the designer's knowledge of the features of their graphics
programs, I put the magazine down and pick up a nice book. Such
cluttered design is certainly not conducive to distinguishing between
what exists and what does not, between what is possible and what
is not. And so here again in this kind of writing the labels attached
to things take on more importance than they should.
After media became multi, reality became virtual. Displays were
put into goggles and sensor gloves were strapped onto hands, allowing
the wearer to move through computer-generated worlds. The problem
with virtual reality is that it is usually taken to mean a place
entirely disconnected from our world. Either you are in a physical
reality, or a virtual one. When most people first actually encounter
virtual reality, they're disappointed because it's not that unlike
familiar video games. Although in part this is a consequence of
describing simple systems with fancy terms, the analogy with video
games goes deeper. In a virtual world there is an environment that
responds to your actions, and the same is true of a video game.
Virtual reality is no more or less than computers with good sensors
and displays running models that can react in real time. That kind
of capability is becoming so routine that it doesn't require a special
name.
One of the defining features of early virtual reality was that
it was completely immersive; the goal was to make video and audio
displays that completely filled your senses. That's become less
important, because bad virtual reality has reminded people of all
of the good features of the world of atoms. Instead of a sharp distinction
between the virtual and the physical world, researchers are beginning
to merge the best attributes of both by embedding the displays into
glasses or walls. Discussions about virtual reality lead to awkward
constructions like "real reality" to describe that which is not
virtual; it's much more natural to simply think about reality as
something that is presented to you by information in your environment,
both logical and physical.
"Chaos theory" is a leading contender for a new paradigm to describe
the complexity of that physical environment and bring it into the
precise world of computers. I read a recent newspaper article that
described the significance of ". . . chaos theory, which studies
the disorder of formless matter and infinite space." Wow. One can't
ask for much more than that. What makes "chaos theory" particularly
stand out is that it is generally considered to be a theory only
by those who don't work on it.
The modern study of chaos arguably grew out of Ed Lorenz's striking
discovery at MIT in the 1960s of equations that have solutions that
appear to be random. He was using the newly available computers
with graphical displays to study the weather. The equations that
govern it are much too complex to be solved exactly, so he had the
computer find an approximate solution to a simplified model of the
motion of the atmosphere. When he plotted the results he thought
that he had made a mistake, because the graphs looked like random
scribbling. He didn't believe that his equations could be responsible
for such disorder. But, hard as he tried, he couldn't make the results
go away. He eventually concluded that the solution was correct;
the problem was with his expectations. He had found that apparently
innocuous equations can contain solutions of unimaginable complexity.
This raised the striking possibility that weather forecasts are
so bad because it's fundamentally not possible to predict the weather,
rather than because the forecasters are not clever enough.
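To see what he saw, here is a minimal sketch in Python (an illustration using the usual textbook parameters, not Lorenz's original program): it crudely integrates his three-variable caricature of convection from two starting points that differ by one part in a million. The solutions wander irregularly, and the tiny initial gap grows until the two trajectories have nothing to do with each other, which is why long-range forecasts fail.

    import numpy as np

    # Euler integration of the Lorenz system; the parameter values are the
    # standard textbook ones, assumed here for illustration.
    def step(s, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        x, y, z = s
        return s + dt * np.array([sigma * (y - x),
                                  x * (rho - z) - y,
                                  x * y - beta * z])

    a = np.array([1.0, 1.0, 1.0])
    b = a + np.array([1e-6, 0.0, 0.0])      # nudge the second trajectory slightly
    for t in range(10000):
        a, b = step(a), step(b)
        if t % 2000 == 0:
            # the separation grows roughly exponentially until it is as large
            # as the attractor itself
            print(t, np.linalg.norm(a - b))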
Like all good scientific discoveries, the seeds of Lorenz's observation
can be found much earlier. Around 1600 Johannes Kepler (a devout
Lutheran) was trying to explain the observations of the orbits of
the planets. His first attempt, inspired by Copernicus, matched
them to the diameters of nested regular polyhedra (a pyramid inside
a cube . . . ). He published this analysis in the Mysterium Cosmographicum,
an elegant and entirely incorrect little book. While he got the
explanation wrong, this book did have all of the elements of modern
scientific practice, including a serious comparison between theoretical
predictions and experimental observations, and a discussion of the
measurement errors. Armed with this experience plus better data
he got it right a few years later, publishing a set of three laws
that could correctly predict the orbits of the planets. The great
triumph of Newton's theory of gravitation around 1680 was to derive
these laws as consequences of a gravitational force acting between
the planets and the sun. Newton tried to extend the solution to
three bodies (such as the combined earth-moon-sun system), but failed
and concluded that the problem might be too hard to solve. This
was a matter of more than passing curiosity because it was not known
if the solution of the three-body problem was stable, and hence
if there was a chance of the earth spiraling into the sun.
Little progress was made on the problem for many years, until around
1890 the French mathematician Henri Poincaré was able to prove that
Newton's hunch was right. Poincaré showed that it was not possible
to write down a simple solution to the three-body problem. Lacking
computers, he could only suspect what Lorenz was later able to see:
the solutions could not be written down because their behavior over
time was so complex. He also realized that this complexity would
cause the solutions to depend sensitively on any small changes.
A tiny nudge to one planet would cause all of the trajectories to
completely change.
This behavior is familiar in an unstable system, such as a pencil
that is balanced on its point and could fall in any direction. The
converse is a stable system, such as the position of the pendulum
in a grandfather clock that hangs vertically if the clock is not
wound. Poincaré encountered, and Lorenz developed, the insight that
both behaviors could effectively occur at the same time. The atmosphere
is not in danger of falling over like a pencil, but the flutter
of the smallest butterfly can get magnified to eventually change
the weather pattern. This balance between divergence and convergence
we now call chaos.
Ordinary mathematical techniques fail on chaotic systems, which
appear to be random but are governed by simple rules. The discovery
of chaos held out the promise that simple explanations might be
lurking behind nature's apparent complexity. Otherwise eminent scientists
relaxed their usual critical faculties to embrace the idea. The
lure of this possibility led to the development of new methods to
recognize and analyze chaos. The most remarkable of these could
take almost any measured signal, such as the sound of a faucet dripping,
and reveal the behavior of the unseen parts of the system producing
the signal. When applied to a dripping faucet this technique did
in fact provide a beautiful explanation for the patterns in what
had seemed to be just an annoying consequence of leaky washers.
After this result, the hunt was on to search for chaos.
Lo and behold, people found it everywhere. It was in the stars,
in the weather, in the oceans, in the body, in the markets. Computer
programs were used to test for chaos by counting how many variables
were needed to describe the observations; a complex signal that
could be explained by a small number of variables was the hallmark
of chaos. Unfortunately, this method had one annoying feature. When
applied to a data set that was too small or too noisy it could erroneously
conclude that anything was chaotic. This led to excesses such as
the Economist magazine reporting that "Mr. Peters thinks
that the S&P 500 index has 2.33 fractal dimensions." This means
that future values of the stock market could be predicted given
just three previous values, a recipe for instant wealth if it wasn't
so obviously impossible. Such nonsensical conclusions were accepted
on the basis of the apparent authority of these computer programs.
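The workhorse behind such claims was a dimension estimate of roughly this form; here is a minimal sketch in Python, assuming a delay embedding and a Grassberger-Procaccia style correlation sum (the function and parameter choices are illustrative, not any particular published program):

    import numpy as np

    def correlation_dimension(x, m=3, tau=1):
        # Embed the scalar series in m delay coordinates and measure how the
        # fraction of close pairs scales with distance; the slope plays the
        # role of the "number of variables" needed to describe the data.
        n = len(x) - (m - 1) * tau
        emb = np.column_stack([x[i * tau:i * tau + n] for i in range(m)])
        d = np.sqrt(((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1))
        d = d[np.triu_indices(n, k=1)]                    # distinct pairs only
        radii = np.logspace(np.log10(np.percentile(d, 2)),
                            np.log10(np.percentile(d, 50)), 10)
        C = np.array([(d < r).mean() for r in radii])     # correlation sum C(r)
        slope, _ = np.polyfit(np.log(radii), np.log(C), 1)
        return slope

    rng = np.random.default_rng(0)
    print(correlation_dimension(rng.normal(size=500)))    # structureless noise
    # With a series this short the fitted slope is a single unreliable number;
    # quoting it as evidence of low-dimensional chaos is exactly the trap.

The arithmetic is trivial; the judgment about whether the data can support the conclusion is not.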
Chaos has come to be associated with the study of anything complex,
but in fact the mathematical techniques are directly applicable
only to simple systems that appear to be complex. There has proved
to be a thin layer between systems that appear to be simple and
really are, and those that appear to be complex and really are.
The people who work on chaos are separating into two groups, one
that studies the exquisite structure in the narrow class of systems
where it does apply, and another that looks to use the methods developed
from the study of chaos to help understand everything else. This
leaves behind the frequently noisy believers in "chaos theory,"
inspired but misled by the exciting labels.
They're matched in enthusiasm by the "agents" camp, proponents
of computer programs that learn user preferences and with some autonomy
act on their behalf. The comforting images associated with an agent
are a traditional English butler, or a favorite pet dog. Both learn
to respond to their master's wishes, even if the master is not aware
of expressing them. Nothing would be more satisfying than a digital
butler that could fetch. Unfortunately, good help is as hard to
find in the digital world as it is in the physical one.
Agents must have a good agent. The widespread coverage of them
has done a great job of articulating the vision of what an agent
should be able to do, but it's been less good at covering
the reality of what agents can do. Whatever you call it,
an agent is still a computer program. To write a good agent program
you need to have reasonable solutions to the interpretation of written
or spoken language and perhaps video recognition so that it can
understand its instructions, routines for searching through large
amounts of data to find the relevant pieces of information, cryptographic
schemes to manage access to personal information, protocols that
allow commerce among agents and the traditional economy, and computer
graphics techniques to help make the results intelligible. These
are hard problems. Wanting to solve them is admirable, but naming
the solution is not the same as obtaining it.
Contrary to what my confused industrial visitors thought, an agent is
very much part of this world of algorithms and programming rather
than a superhuman visitor from a new world. The most successful
agents to date have bypassed many of these issues by leveraging
human intelligence to mediate interactions that would not happen
otherwise. Relatively simple programs can look for people who express
similar preferences in some areas such as books or recordings, and
then make recommendations based on their previous choices. That's
a great thing to do, helping realize the promise of the Internet
to build communities and connections among people instead of isolating
them, but then it becomes as much a question of sociology as programming
to understand how, where, and why people respond to each other's
choices.
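A minimal sketch of that kind of recommender, in Python with invented ratings (treating an unrated title as zero is a simplification, and all of the numbers are made up for illustration): find the people whose past choices line up with yours and weight their ratings accordingly.

    import numpy as np

    # Rows are people, columns are titles; 0 means "hasn't rated it".
    # All of the ratings here are invented for this sketch.
    ratings = np.array([[5, 4, 0, 0],
                        [5, 5, 4, 1],
                        [1, 1, 2, 5],
                        [2, 1, 1, 4]], dtype=float)

    def recommend(ratings, user):
        seen = ratings[user] > 0
        norms = np.linalg.norm(ratings, axis=1)
        # Cosine similarity between this person's ratings and everyone else's
        sims = ratings @ ratings[user] / (norms * norms[user] + 1e-9)
        sims[user] = 0.0                       # ignore self-similarity
        # Weight other people's ratings by how similar their tastes are
        scores = sims @ ratings / (np.abs(sims).sum() + 1e-9)
        scores[seen] = -np.inf                 # only suggest unseen titles
        return int(np.argmax(scores))

    print(recommend(ratings, user=0))          # 2: the title the kindred spirit liked

The program never has to understand what the titles are about; all of the intelligence it leverages is in the people whose choices it compares.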
If an agent is ever to reason with the wisdom of a good butler,
it's natural to look to the butler's brain for insight into how
to do this. Compared to a computer, the brain is made out of slow,
imperfect components, yet it is remarkably powerful, reliable, and
efficient. Unlike a conventional digital computer, it can use continuous
analog values, and it takes advantage of an enormous number of simple
processing elements working in parallel, the neurons. These are
"programmed" by varying the strength of the synaptic connections
among them. "Neural networks" are mathematical models inspired by
the success of this architecture.
In the 1940s mathematical descriptions were developed for neurons
and their connections, suggesting that it might be possible to go
further to understand how networks of neurons function. This agenda
encountered an apparently insurmountable obstacle in 1969 when Marvin
Minsky and Seymour Papert proved that a layer of neurons can implement
only the simplest of functions between their inputs and outputs.
The strength of their result effectively halted progress until the
1980s when a loophole was found by introducing neurons that are
connected only to other neurons, not inputs or outputs. With such
a hidden layer it was shown that a network of neurons could represent
any function.
Mathematical models traditionally have been based on finding the
values of adjustable parameters to most closely match a set of observations.
If a hidden layer is used in a neural network, this is no longer
possible. It can be shown that there's no feasible way to choose
the best values for the connection strengths. This is analogous
to the study of chaos, where the very complexity that makes the
equations impossible to solve makes them valuable to use. The behavior
of chaotic equations can be understood and applied even if it can't
be exactly predicted. Similarly, it turned out that hidden layers
can be used by searching for reasonable weights without trying to
find the best ones. Because the networks are so flexible, even a
less-than-ideal solution can be far more useful than the exact solution
of a less-capable model. As a consequence of this property, neural
networks using surprisingly simple search strategies were surprisingly
capable.
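As a concrete illustration of that search (a minimal sketch, not any particular historical program), a network with one hidden layer of sigmoid units, nudged downhill by plain gradient descent, can learn the exclusive-or function that a single layer cannot represent. The weights it ends with are merely reasonable rather than provably best, and an unlucky starting point can leave it stuck.

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)     # exclusive or
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)       # input -> hidden layer
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)       # hidden -> output

    lr = 0.5                                            # size of each downhill step
    for step in range(20000):
        h = sigmoid(X @ W1 + b1)                        # hidden activations
        out = sigmoid(h @ W2 + b2)                      # network output
        # Backpropagate the squared error through the sigmoids (chain rule)
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(0)
        W1 -= lr * X.T @ d_h
        b1 -= lr * d_h.sum(0)

    # Usually rounds to [[0], [1], [1], [0]]; a bad initialization may not.
    print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)))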
The combination of some early successes and language that suggests
that neural networks work the same way the brain does led to the
misleading impression that the problem of making machines think
had been solved. People still have to think to use a neural network.
The power, and problems, of neural networks were amply demonstrated
in a study that I ran at the Santa Fe Institute. It started with
a workshop that I attended there, exploring new mathematical techniques
for modeling complex systems. The meeting was distressingly anecdotal,
full of sweeping claims for new methods but containing little in
the way of insight into how they fail or how they are related to
what is already known. In exasperation I made a joke and suggested
that we should have a data analysis contest. No one laughed, and
in short order the Santa Fe Institute and NATO had agreed to support
it.
My colleague Andreas Weigend and I selected interesting data sets
from many disciplines, giving the changes over time in a currency
exchange rate, the brightness of a star, the rhythm of a heartbeat,
and so forth. For extra credit we threw in the end of The Art
of the Fugue, the incomplete piece that Bach was writing when
he died. These data were distributed around the world through the
Internet. Researchers were given quantitative questions appropriate
to each domain, such as forecasting future values of the series.
These provided a way to make comparisons across disciplines independent
of the language used to describe any particular technique.
From the responses I learned as much about the sociology of science
as I did about the content. Some people told us that our study was
a mistake because science is already too competitive and it's a
bad influence to try to make these comparisons; others told us that
our study was doomed because it's impossible to make these kinds
of comparisons. Both were saying that their work was not falsifiable.
Still others said that our study was the best thing that had happened
to the field, because they wanted to end the ambiguity of data analysis
and find out which technique was the winner.
One of the problems presented data from a laser that was erratically
fluctuating on and off; the task was to predict how this pattern
would continue after the end of the data set. Because the laser
was chaotic, traditional forecasting methods would not do much better
than guessing randomly. Some of the entries that we received were
astoundingly good. One of the best used a neural network to forecast
the series, and it was convincingly able to predict all of the new
behavior that it had not seen in the training data. Here was a compelling
demonstration of the power of a neural net.
For comparison, one entry was done by eye, simply guessing what
would come next. Not surprisingly, this one did much worse than
the neural network. What was a surprise was that it beat some of
the other entries by just as large a margin. One team spent hours of
supercomputer time developing another neural network model; it performed
significantly worse than the visual inspection that took just a
few moments. The best and the worst neural networks had similar
architectures. Nothing about their descriptions would indicate the
enormous difference in their performance; that was a consequence
of the insight with which the networks were applied.
It should not be too unexpected that apparently similar neural
networks can behave so differently—the same is true of real
brains. As remarkable as human cognition can be, some people are
more insightful than others. Starting with the same hardware, they
differ in how they use it. Using a neural network gives machines
the same opportunity to make mistakes that people have always enjoyed.
Many experts in neural networks don't even study neural networks
anymore. Neural nets provided the early lessons that started them
down the path of using flexible models that learn, but overcoming
the liabilities of neural nets has led them to leave behind any
presumption of modeling the brain and focus directly on understanding
the mathematics of reasoning. The essence of this lies in finding
better ways to manage the tension between experience and beliefs.
One new technique on offer to do that is "fuzzy logic." It is sold
as an entirely new kind of reasoning that replaces the sharp binary
decisions forced by our Western style of thinking with a more Eastern
sense of shades of meaning that can better handle the ambiguity
of the real world. In defense of the West, we've known for a few
centuries how to use probability theory to represent uncertainty.
If you force a fuzzy logician to write down the expressions they
use, instead of telling you the words they attach to them, it turns
out that the expressions are familiar ones with new names for the
terms. That itself is not so bad, but what is bad is that such naming
deflects attention away from the much better developed study of
probability theory.
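In defense of those few centuries, here is a minimal worked example in Python (the numbers are invented for illustration): Bayes' rule already turns a prior belief and a new observation into a graded degree of belief between zero and one, the very shades of meaning being advertised.

    # How strongly should we believe a part is faulty after a warning light
    # comes on? All of the probabilities below are invented for illustration.
    p_fault = 0.01                    # prior belief, before any evidence
    p_light_if_fault = 0.95           # the light usually fires when there is a fault
    p_light_if_ok = 0.05              # but it also gives false alarms

    p_light = p_light_if_fault * p_fault + p_light_if_ok * (1 - p_fault)
    p_fault_given_light = p_light_if_fault * p_fault / p_light
    print(round(p_fault_given_light, 3))   # 0.161: belief raised, far from certainty

No new kind of logic is needed to get an answer that is neither true nor false but sixteen percent.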
This danger was on display at a conference I attended on the mathematics
of inference. A battle was raging there between the fuzzy logic
camp and the old-fashioned probabilists. After repeatedly pushing
the fuzzy logicians to show anything at all that they could do that
could not be done with regular probability theory, the fuzzy side
pulled out their trump card and told the story of a Japanese helicopter
controller that didn't work until fuzzy logic was used. Everyone
went home feeling that they had won the argument. Of course a demonstration
that something does work is far from proof that it works better
than, or even differently than, anything else. What's unfortunate
about this example is that perfectly intelligent people were swayed
by the label attached to the program used in the helicopter rather
than doing the homework needed to understand how it works and how
it relates to what is already known. If they had done so, they might
have discovered that the nonfuzzy world has gone on learning ever
more interesting and useful things about uncertainty.
Connecting all of these examples is a belief in magic software
bullets, bits of code that can solve the hard problems that had
stumped the experts who didn't know about neural networks, or chaos,
or agents. It's all too easy to defer thinking to a seductive computer
program. This happens on the biggest scales. At still one more conference
on mathematical modeling I sat through a long presentation by someone
from the Defense Department on how they are spending billions of
dollars a year on developing mathematical models to help them fight
wars. He described an elaborate taxonomy of models of models of
models. Puzzled, at the end of it I hazarded to put up my hand and
ask a question that I thought would show everyone in the room that
I had slept through part of the talk (which I had). I wondered whether
he had any idea whether his billion-dollar models worked, since
it's not convenient to fight world wars to test them. His answer,
roughly translated, was to shrug and say that that's such a hard
question they don't worry about it. Meanwhile, the mathematicians
in the former Soviet Union labored with limited access to computers
and had no recourse but to think. As a result a surprising fraction
of modern mathematical theory came from the Soviet Union, far out
of proportion to its other technical contributions.
Where once we saw farther by standing on the shoulders of our predecessors,
in far too many cases we now see less by standing on our predecessors'
toes. You can't be interdisciplinary without the disciplines, and
without discipline. Each of the problematical terms I've discussed
is associated with a very good idea. In the life cycle of an idea,
there is a time to tenderly nurse the underlying spark of new insight,
and a time for it to grow up and face hard questions about how it
relates to what is already known, how it generalizes, and how it
can be used. Even if there aren't good answers, these ideas can
still be valuable for how they influence people to work on problems
that do lead to answers. The Information Age is now of an age that
deserves the same kind of healthy skepticism applied to the world
of bits that we routinely expect in the world of atoms.
WHEN THINGS START TO THINK by Neil Gershenfeld. ©1998 by
Neil A. Gershenfeld. Reprinted by arrangement with Henry Holt and
Company, LLC.