Chapter 10: Seeing Through Windows
Originally published by Henry Holt and Company 1999. Published on KurzweilAI.net May 15, 2003.
I vividly recall a particular car trip from my childhood because
it was when I invented the laptop computer. I had seen early teletype
terminals; on this trip I accidentally opened a book turned on its
side and realized that there was room on the lower page for a small
typewriter keyboard, and on the upper page for a small display screen.
I didn't have a clue how to make such a thing, or what I would do
with it, but I knew that I had to have one. I had earlier invented
a new technique for untying shoes, by pulling on the ends of the
laces; I was puzzled and suspicious when my parents claimed prior
knowledge of my idea. It would take me many more years to discover
that Alan Kay had anticipated my design for the laptop and at that
time was really inventing the portable personal computer at Xerox's
Palo Alto Research Center (PARC).
Despite the current practice of putting the best laptops in the
hands of business executives rather than children, my early desire
to use a laptop is much closer to Alan's reasons for creating one.
Alan's project was indirectly inspired by the work of the Swiss
psychologist Jean Piaget, who from the 1920s onward spent years
and years studying children. He came to the conclusion that what
adults see as undirected play is actually a very structured activity.
Children work in a very real sense as little scientists, continually
positing and testing theories for how the world works. Through their
endless interactive experiments with the things around them, they
learn first how the physical world works, and then how the world
of ideas works. The crucial implication of Piaget's insight is that
learning cannot be restricted to classroom hours, and cannot be
encoded in lesson plans; it is a process that is enabled by children's
interaction with their environment.
Seymour Papert, after working with Piaget, brought these ideas
to MIT in the 1960s. He realized that the minicomputers just becoming
available to researchers might provide the ultimate sandbox for
children. While the rest of the world was developing programming
languages for accountants and engineers, Seymour and his collaborators
created LOGO for children. This was a language that let kids express
abstract programming constructs in simple intuitive terms, and best
of all it was interfaced to physical objects so that programs could
move things outside of the computer as well as inside it. The first
one was a robot "turtle" that could roll around under control of
the computer, moving a pen to make drawings.
Infected by the meme of interactive technology for children, Alan
Kay carried the idea to the West Coast, to Xerox's Palo Alto Research
Center. In the 1970s, he sought to create what he called a Dynabook,
a portable personal knowledge navigator shaped like a notebook,
a fantasy amplifier. The result was most of the familiar elements
of personal computing.
Unlike early programming languages that required a specification
of a precise sequence of steps to be executed, modern object-oriented
languages can express more complex relationships among abstract
objects. The first object-oriented programming language was Smalltalk,
invented by Alan to let children play as easily with symbolic worlds
as they do with physical ones. He then added interface components
that were being developed by Doug Engelbart up the road at the Stanford
Research Institute (SRI).
Doug was a radar engineer in World War II. He realized that a computer
could be more like a radar console than a typewriter, interactively
drawing graphics, controlled by an assortment of knobs and levers.
Picking up a theme that Vannevar Bush (the person most responsible
for the government's support of scientific research during and after
the war) had articulated in 1945 with his proposal for a mechanical
extender of human memory called a Memex, Doug understood
that such a machine could help people navigate through the increasingly
overwhelming world of information. His colleagues thought that he
was misguided.
Computers were specialized machines used for batch processing,
not interactive personal appliances. Fortunately, Engelbart was
able to attract enough funding to set up a laboratory around the
heretical notion of studying how people and computers might better
interact. These ideas had their coming-out in a rather theatrical demo
he staged in San Francisco in 1968, showing what we would now recognize
as an interactive computer with a mouse and multiple windows on a screen.
In 1974 these elements came together in the Xerox Alto prototype,
and reached the market in Xerox's Star. The enormous influence of
this computer was matched by its enormous price tag, about $50,000.
This was a personal computer that only big corporations could afford.
Windows and mice finally became widely available and affordable
in Apple's Macintosh, inspired by Steve Jobs's visit to PARC in
1979, and the rest of personal computing caught up in 1990 when
Microsoft released Windows 3.0.
The prevailing paradigm for how people use computers hasn't really
changed since Engelbart's show in 1968. Computers have proliferated,
their performance has improved, but we still organize information
in windows and manipulate it with a mouse. For years the next big
interface has been debated. There is a community that studies such
things, called Human-Computer Interaction (HCI). To give you an idea
of the low level of that discussion, one of the most thoughtful
HCI researchers, Bill Buxton (chief scientist at Silicon Graphics),
is known for the insight that people have two hands. A mouse forces
you to manipulate things with one hand alone; Bill develops interfaces
that can use both hands.
A perennial contender on the short list for the next big interface
is speech recognition, promising to let us talk to our computers
as naturally as we talk to each other. Appealing as that is, it
has a few serious problems. It would be tiring if we had to spend
the day speaking continuously to get anything done, and it would
be intrusive if our conversations with other people had to be punctuated
by our conversations with our machines. Most seriously, even if
speech recognition systems worked perfectly (and they don't), the
result would be no better than if the commands had been typed. Much
of the frustration in using a computer comes not from the effort of
entering commands but from figuring out how to tell it to do what
you want, or trying to interpret just what it has done. Speech is a piece
of the puzzle, but it doesn't address the fundamental mysteries
confronting most computer users.
The ultimate dream interface has always been mind control, directing
a computer by thought alone. There is now serious work being done on making
machines that can read minds. One technique is magnetoencephalography
(MEG), which places sensitive detectors of magnetic fields around
a head to measure the tiny fields produced by neural currents flowing
in the brain. Another technique, functional magnetic resonance imaging (fMRI), uses MRI
to make a 3D map of chemical distributions in the brain to locate
where metabolic activity is happening. Both of these can, under
ideal conditions, deduce something about what is being thought,
such as distinguishing between listening to music and looking at
art, or moving one hand versus the other. The problem that both
struggle with is that the brain's internal representation is not
designed for external consumption.
Early programmers did a crude form of MEG by placing a radio near
a computer; the pattern of static could reveal when a program got
stuck in a loop. But as soon as video displays came along it became
much easier for the computer to present the information in a meaningful
form, showing just what it was doing. In theory the same information
could be deduced by measuring all of the voltages on all of the
leads of the chips; in practice this is done only by hardware manufacturers
in testing new systems, and it takes weeks of effort.
Similarly, things that are hard to measure inside a person are
simple to recognize on the outside. For example, hold your finger
up and wiggle it back and forth. You've just performed a brain control
task that the Air Force has spent a great deal of time and money
trying to replicate. They've built a cockpit that lets a pilot control
the roll angle by thinking; trained operators on a good day can
slowly tilt it from side to side. They're a long way from flying
a plane that way.
In fact, a great deal of the work in developing thought interfaces
is actually closer to wiggling your finger. It's much easier to
accidentally measure artifacts that come from muscle tension in
your forehead or scalp than it is to record signals from deep in
the brain. Instead of trying to teach people to do the equivalent
of wiggling their ears, it's easier to use the parts of our bodies
that already come wired for us to interact with the world.
Another leading contender for the next big interface is 3D graphics.
Our world is three-dimensional; why limit the screen to two dimensions?
With advances in the speed of graphical processors it is becoming
possible to render 3D scenes as quickly as 2D windows are now drawn.
A 3D desktop could present the files in a computer as the drawers
of a file cabinet or as a shelf of books, making browsing more intuitive.
If you're willing to wear special glasses, the 3D illusion can be
quite convincing.
A 3D display can even be more than an illusion. My colleague Steve
Benton invented the rainbow holograms on your credit cards; his
group is now developing real-time holographic video. A computer
calculates the light that would be reflected from a three-dimensional
object, and modulates a laser beam to produce exactly that. Instead
of tricking the eyes by using separate displays to produce an illusion
of depth, his display actually creates the exact light pattern that
the synthetic object would reflect.
Steve's system is a technological tour de force, the realization
of a long-standing dream in the display community. It's also slightly
disappointing to many people who see it, because a holographic car
doesn't look as good as a real car. The problem is that reality
is just too good. The eye has the equivalent of many thousands of
lines of resolution, and a refresh rate of milliseconds. In the
physical world there's no delay between moving an object and seeing
a new perspective. Steve may someday be able to match those specifications
with holographic video, but it's a daunting challenge.
Instead of struggling to create a computer world that can replace
our physical world, there's an alternative: augment it. Embrace
the means of interaction that we've spent eons perfecting as a species,
and enhance them with digital content.
Consider Doug Engelbart's influential mouse. It is a two-dimensional
controller that can be moved left and right, forward and backward,
and intent is signaled by pressing it. It was preceded by a few
centuries by another two-dimensional controller, a violin bow. That,
too, is moved left and right, forward and backward, and intent is
communicated by pressing it. In this sense the bow and mouse are
very similar. On the other hand, while a good mouse might cost $10,
a good bow can cost $10,000. It takes a few moments to learn to
use a mouse, and a lifetime to learn to use a bow. Why would anyone
prefer the bow?
Because it lets them do so much more. Consider the differences
between the bow technique and the mouse technique:
- Sul ponticello (bowing close to the bridge)
- Spiccato (dropping the bow)
- Martelé (forcefully releasing the stroke)
- Jeté (bouncing the bow)
- Tremolo (moving back and forth repeatedly)
- Sul tasto (bowing over the fingerboard)
- Arpeggio (bouncing on broken chords)
- Col legno (striking with the stick)
- Viotti (unaccented then accented note)
- Staccato (many martelé notes in one stroke)
- Staccato volante (slight spring during rapid staccato)
- Détaché (vigorous articulated stroke)
- Legato (smooth stroke up or down)
- Sautillé (rapid strike in middle of bow)
- Louré (separated slurred notes)
- Ondulé (tremolo between two strings)
There's much more to the bow than a casual marketing list of features
might convey. Its exquisite physical construction lets the player
perform a much richer control task, relying on the intimate connection
between the dynamics of the bow and the tactile interface to the
hand manipulating and sensing its motion. Compare that nuance to
a mouse, which can be used perfectly well while wearing mittens.
When we did the cello project I didn't want to ask Yo-Yo to give
up this marvelous interface; I retained the bow and instead asked
the computer to respond to it. Afterward, we found that the sensor
I developed to track the bow could respond to a hand without the
bow. This kind of artifact is apparent any time a radio makes static
when you walk by it, and was used back in the 1920s by the Russian
inventor Lev Termen in his theremin, the musical staple of science-fiction
movies that makes eerie sounds in response to a player waving their
arms in front of it.
My student Josh Smith and I found that lurking behind this behavior
was a beautiful mathematical problem: given the charges measured
on two-dimensional electrodes, what is the three-dimensional distribution
of material that produced them? As we made headway with the problem,
we found that we could make what looks like an ordinary table, but
that has electrodes in it that create a weak electric field that
can find the location of a hand above it. It's completely unobtrusive,
and responds to the smallest motions a person can make (millimeters)
as quickly as they can make them (milliseconds). Now we don't need
to clutter the desk with a rodent; the interface can disappear into
the furniture. There's no need to look for a mouse since you always
know where to find your hand.
The circuit board that we developed to make these measurements
ended up being called a "Fish," because fish swim in 3D instead of
mice crawling in 2D, and some fish that live in murky waters use
electric fields to detect objects in their vicinity just as we were
rediscovering how to do it. In retrospect, it's surprising that
it has taken so long for such an exquisite biological sense to get
used for computer interfaces. There's been an anthropomorphic tendency
to assume that a computer's senses should match our own.
We had trouble keeping the Fish boards on hand because they would
be carried off around the Media Lab by students who wanted to build
physical interfaces. More recently, the students have been acquiring
as many radio-frequency identification (RFID) chips as they can
get their hands on. These are tiny processors, small enough even
to be swallowed, that are powered by an external field that can
also exchange data with them. They're currently used in niche applications,
such as tracking laboratory animals, or in the key-chain tags that
enable us to pump gas without using a credit card. The students
use them everywhere else. They make coffee cups that can tell the
coffeemaker how you like your coffee, shoes that can tell a doorknob
who you are, and mouse pads that can read a Web URL from an object
placed on it.
You can think of this as a kind of digital shadow. Right now objects
live either in the physical world or as icons on a computer screen.
User interface designers still debate whether icons that appear
to be three-dimensional are better than ones that look two-dimensional.
Instead, the icons can really become three-dimensional; physical
objects can have logical behavior associated with them. A business
card should contain an address, but also summon a Web page if placed
near a Web browser. A pen should write in normal ink, but also remember
what it writes so that the information can be recalled later in
a computer, and it should serve as a stylus to control that computer.
A house key can also serve as a cryptographic key. Each of these
things has a useful physical function as well as a digital one.
My colleague Hiroshi Ishii has a group of industrial designers,
graphical designers, and user interface designers studying how to
build such new kinds of environmental interfaces. A recurring theme
is that interaction should happen in the context that you, rather
than the computer, find meaningful. They use video projectors so
that tables and floors and walls can show relevant information;
since Hiroshi is such a good Ping-Pong player, one of the first
examples was a Ping-Pong table that displayed the ball's trajectory
in a fast-moving game by connecting sensors in the table to a video
projector aimed down at the table. His student John Underkoffler
notes that a lamp is a one-bit display that can be either on or
off; John is replacing lightbulbs with combinations of computer
video projectors and cameras so that the light can illuminate ideas
as well as spaces.
Many of the most interesting displays they use are barely perceptible,
such as a room for managing their computer network that maps the
traffic into ambient sounds and visual cues. A soothing breeze indicates
that all is well; the sights and sounds of a thunderstorm are a sign
of an impending disaster that needs immediate attention. This information
about their computer network is always available, but never demands
direct attention unless there is a problem.
Taken together, ambient displays, tagged objects, and remote sensing
of people have a simple interpretation: the computer as a distinguishable
object disappears. Instead of a fixed display, keyboard, and mouse,
the things around us become the means we use to interact with electronic
information as well as the physical world. Today's battles between
competing computer operating systems and hardware platforms will
literally vanish into the woodwork as the diversity of the physical
world makes control of the desktop less relevant.
This is really no more than Piaget's original premise of learning
through manipulation, filtered through Papert and Kay. We've gotten
stuck at the developmental stage of early infants who use one hand
to point at things in their world, a decidedly small subset of human
experience. Things we do well rely on all of our senses.
Children, of course, understand this. The first lesson that any
technologist bringing computers into a classroom gets taught by
the kids is that they don't want to sit still in front of a tube.
They want to play, in groups and alone, wherever their fancy takes
them. The computer has to tag along if it is to participate. This
is why Mitch Resnick, who has carried on Seymour's tradition at
the Media Lab, has worked so hard to squeeze a computer into a Lego
brick. These bring the malleability of computing to the interactivity
of a Lego set.
Just as Alan's computer for kids was quickly taken over by the
grown-ups, Lego has been finding that adults are as interested as
kids in their smart bricks. There's no end to the creativity that's
found expression through them; my favorite is a descendant of the
old LOGO turtle, a copier made from a Lego car that drives over
a page with a light sensor and then trails a pen to draw a copy
of the page.
A window is actually an apt metaphor for how we use computers now.
It is a barrier between what is inside and what is outside. While
that can be useful at times (such as keeping bugs where they belong),
it's confining to stay behind it. Windows also open to let fresh
air in and let people out.
All along, the coming interface paradigm has been apparent. The
mistake was to assume that a computer interface happens between
a person sitting at a desk and a computer sitting on the desk. We
didn't just miss the forest for the trees, we missed the earth and
the sky and everything else. The world is the next interface.
WHEN THINGS START TO THINK by Neil Gershenfeld. ©1998 by
Neil A. Gershenfeld. Reprinted by arrangement with Henry Holt and
Company.