Intelligence as an Emergent Behavior or, The Songs of Eden
Could we build a thinking machine by simply hooking together a large network of artificial neurons and waiting for intelligence to spontaneously emerge? Not likely. But by studying the properties of biological and emergent systems, we may find that a carefully constructed network of artificial neurons can be inoculated with thought, much as a carefully prepared soup of barley and hops is inoculated with yeast to make beer. The clue may be in the "songs" of apes.
Originally published Winter 1988 in Daedalus, Journal of the American Academy of Arts and Sciences. Published on KurzweilAI.net May 2, 2002.
Sometimes a system with many simple components will exhibit a behavior
of the whole that seems more organized than the behavior of the
individual parts. Consider the intricate structure of a snowflake.
Symmetric shapes within the crystals of ice repeat in threes and
sixes, with patterns recurring from place to place and within themselves
at different scales. The shapes formed by the ice are consequences
of the local rules of interaction that govern the molecules of water,
although the connection between the shapes and the rules is far
from obvious. After all, these are the same rules of interaction
that cause water to suddenly turn to steam at its boiling point
and cause whirlpools to form in a stream. The rules that govern
the forces between water molecules seem much simpler than crystals
or whirlpools or boiling points, yet all of these complex phenomena
are called emergent behaviors of the system.
It would be very convenient if intelligence were an emergent behavior
of randomly connected neurons in the same sense that snowflakes
and whirlpools are the emergent behaviors of water molecules. It
might then be possible to build a thinking machine by simply hooking
together a sufficiently large network of artificial neurons. The
notion of emergence would suggest that such a network, once it reached
some critical mass, would spontaneously begin to think.
This is a seductive idea, since it allows for the possibility of
constructing intelligence without first understanding it. Understanding
intelligence is difficult and probably a long way off. The possibility
that it might spontaneously emerge from the interactions of a large
collection of simple parts has considerable appeal to a would-be
builder of thinking machines. Unfortunately, as a practical approach
to construction, the idea tends to be unproductive. The concept
of emergence, in itself, offers neither guidance on how to construct
such a system nor insight into why it would work.
Ironically, this apparent inscrutability accounts for much of the
idea's continuing popularity, since it offers a way to believe in
physical causality while simultaneously maintaining the impossibility
of a reductionist explanation of thought. For some, our ignorance
of how local interactions produce emergent behavior offers a reassuring
fog in which to hide free will.
There has been a renewal of interest in emergent behavior in the
form of neural networks and connectionist models, spin glasses and
cellular automata, and evolutionary models. The reasons for this
interest have little to do with philosophy one way or the other,
but rather are a combination of new insights and new tools. The
insights come primarily from a branch of physics called "dynamical
systems theory." The tools come from the development of new
types of computing devices. Just as in the 1950's we thought of
intelligence in terms of servomechanisms, and in the 60's and 70's
in terms of sequential computers, we are now beginning to think
in terms of parallel machines. This is not a deep philosophical
shift, but it is of great practical importance, since it is now
possible to study large emergent systems experimentally.
Inevitably, anti-reductionists interpret such progress as a schism
within the field between symbolic rationalists who oppose them and
gestaltists who support them. I have often been asked which "side"
I am on. Not being a philosopher, my inclination is to focus on
the practical aspects of this question: How would we go about constructing
an emergent intelligence? What information would we need to know
in order to succeed? How can this information be determined by experiment?
The emergent system that I can most easily imagine would be an
implementation of symbolic thought, rather than a refutation of
it. Symbolic thought would be an emergent property of the system.
The point of view is best explained by the following parable about
the origin of human intelligence. As far as I know, this parable
of human evolution is consistent with the available evidence (as
are many others), but since it is chosen to illustrate a point it
should be read as a story rather than as a theory. It is reversed
from most accepted theories of human development in that it presents
features that are measurable in the archeological record, such as
increased brain size, food sharing, and neoteny, as consequences
rather than as causes of intelligence.
Once upon a time, about two and a half million years ago, there
lived a race of apes that walked upright. In terms of intellect
and habit they were similar to modern chimpanzees. The young apes,
like many young apes today, had a tendency to mimic the actions
of others. In particular, they had a tendency to imitate sounds.
If one ape went "ooh, eeh, eeh," it would be likely that
the other one would repeat, "ooh, eeh, eeh." (I do not
know why apes do this, but they do, as do many species of birds.)
Some sequences of sounds were more likely to be repeated than others.
I will call these "songs."
For the moment let us ignore the evolution of the apes and consider
the evolution of the songs. Since the songs were replicated by the
apes, and since they sometimes died away and were occasionally combined
with others, we may consider them, very loosely, a form of life.
They survived, bred, competed with one another, and evolved according
to their own criterion of fitness. If a song contained a particularly
catchy phrase, that phrase was likely to be repeated often and
incorporated into other songs. Only
songs that had a strong tendency to be repeated survived.
The survival of the song was only indirectly related to the survival
of the apes. It was more directly affected by the survival of other
songs. Since the apes were a limited resource, the songs had to
compete with one another for a chance to be sung. One successful
strategy for competition was for a song to specialize; that is,
for it to find a particular niche where it would be likely to be
repeated. Songs that fit particularly well with a specific mood
or activity of an ape had a special survival value for this reason.
(I do not know why some songs fit well with particular moods, but
since it is true for me I do not find it hard to believe for my
ancestors.)
Up to this point the songs were not of any particular value to
the apes. In a biological sense they were parasites, taking advantage
of the apes' tendency to imitate. Once the songs began to specialize,
however, it became advantageous for an ape to pay attention to the
songs of others and to differentiate between them. By listening
to songs, a clever ape could gain useful information. For example,
an ape could infer that another ape had found food, or that it was
likely to attack. Once the apes began to take advantage of the songs,
a mutually beneficial symbiosis developed. Songs enhanced their
survival by conveying useful information. Apes enhanced their survival
by improving their capacity to remember, replicate, and understand
songs. The blind forces of evolution created a partnership between
the songs and the apes that thrived on the basis of mutual self-interest.
Eventually this partnership evolved into one of the world's most
successful symbionts: us.
Unfortunately, songs do not leave fossils, so unless some natural
process has left a phonographic trace, we may never know if this
is what really happened. But if the story is true, the apes and
the songs became the two components of human intelligence. The songs
evolved into the knowledge, mores, and mechanism of thought that
together are the symbolic portion of human intelligence. The apes
became apes with bigger brains, perhaps optimized for late maturity
so that they could learn more songs. "Homo sapiens" is
a cooperative combination of the two.
It is not unusual in nature for two species to live together so
interdependently that they appear to be a single organism. Lichens
are symbionts of a fungus and an alga living so closely intertwined
that they can only be separated under a microscope. Bean plants
need living bacteria in their roots to fix the nitrogen from the
soil, and in return the bacteria need nutrients from the bean plants.
Even the single-celled "Paramecium bursaria" uses green
algae living inside itself to synthesize food.
There may be an example even closer to the songs and the apes,
where two entirely different forms of "life" form a symbiosis.
In "The Origins of Life," Freeman Dyson suggests that
biological life is a symbiotic combination of two different self-reproducing
entities with very different forms of replication. Dyson suggests
that life originated in two stages. While most theories of the origin
of life start with nucleotides replicating in some "primeval
soup," Dyson's theory starts with metabolizing drops of oil.
In the beginning, these hypothetical replicating oil drops had
no genetic material, but were self-perpetuating chemical systems
that absorbed raw materials from their surroundings. When a drop
reached a certain size it would split, with about half of the constituents
going to each part. Such drops evolved efficient metabolic systems
even though their rules of replication were very different from
the Mendelian rules of modern life. Once the oil drops became good
at metabolizing, they were infected by another form of replicators,
which, like the songs, have no metabolism of their own. These were
parasitic molecules of DNA which, like modern viruses, took advantage
of the existing machinery of the cells to reproduce. The metabolizers
and the DNA eventually co-evolved into a mutually beneficial symbiosis
that we know today as life.
This two-part theory of life is not conceptually far from the two-part
story of intelligence. Both suggest that a pre-existing homeostatic
mechanism was infected by an opportunistic parasite. The two parts
reproduced according to different sets of rules, but were able to
co-evolve so successfully that the resulting symbiont appears to
be a single entity.
Viewed in this light, choosing between emergence and symbolic computation
in the study of intelligence would be like choosing between metabolism
and genetic replication in the study of life. Just as the metabolic
system provides a substrate in which the genetic system can work,
so an emergent system may provide a substrate in which the symbolic
system can operate.
Currently, the metabolic system of life is far too complex for
us to fully understand or reproduce. By comparison, the Mendelian
rules of genetic replication are almost trivial, and it is possible
to study them as a system unto themselves without worrying about
the details of metabolism which supports them. In the same sense,
it seems likely that symbolic thought can be fruitfully studied
and perhaps even recreated without worrying about the details of
the emergent system that supports it. So far this has been the dominant
approach in artificial intelligence and the approach that has yielded
the most progress.
The other approach is to build a model of the emergent substrate
of intelligence. This artificial substrate for thought would not
need to mimic in detail the mechanisms of the biological system,
but it would need to exhibit those emergent properties that are
necessary to support the operations of thought.
What is the minimum that we would need to understand in order to
construct such a system? For one thing, we would need to know how
big a system to build. How many bits are required to store the
acquired knowledge of a typical human? We need to know an
approximate answer in order to construct an emergent intelligence
with human-like performance. Currently the amount of information
stored by a human is not known to within even two orders of magnitude,
but it can in principle be determined by experiment. There are at
least three ways the question might be answered.
One way to estimate the storage requirements for emergent intelligence
would be from an understanding of the physical mechanisms of memory
in the human brain. If that information is stored primarily by modifications
of synapses, then it would be possible to measure the information
storage capacity of the brain by counting the number of synapses.
Elsewhere in this issue, Schwartz shows that this method leads to
an upper bound on the storage capacity of the brain of 10^15 bits.
Even knowing the exact amount of physical storage in
the brain would not completely answer the question of storage requirement,
since much of the potential storage might be unused or used inefficiently.
But at least this method can help establish an upper bound on the
requirements.
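A sketch of how such a bound is computed, using common order-of-magnitude figures rather than anything from Schwartz's article:

```python
# A rough sketch of a synapse-counting bound. The neuron and synapse
# counts are common order-of-magnitude figures, not measured values.
neurons = 1e11              # order of neurons in a human brain
synapses_per_neuron = 1e4   # order of synapses per neuron
bits_per_synapse = 1        # assume one bit per modifiable synapse
capacity = neurons * synapses_per_neuron * bits_per_synapse
print(f"{capacity:.0e} bits")  # 1e15 bits, the upper bound cited above
```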
A second method for estimating the information in symbolic knowledge
would be to measure it by some form of statistical sampling. For
instance, it is possible to estimate the size of an individual's
vocabulary by testing specific words randomly sampled from a dictionary.
The fraction of words known by the individual is a good estimate
of the fraction of words known in the complete dictionary. The estimated
vocabulary size is this fraction times the number of words in the
dictionary. The experiment depends on having a predetermined body
of knowledge against which to measure. For example, it would be
possible to estimate how many facts in the "Encyclopedia Britannica"
were known by a given individual, but this would give no measure
of facts not contained within the encyclopedia. The method is useful
only in establishing a lower bound.
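A minimal sketch of this sampling procedure; `dictionary` (a word list) and `knows` (a test of whether the subject knows a given word) are hypothetical stand-ins:

```python
import random

def estimate_vocabulary(dictionary, knows, sample_size=100):
    # Test a random sample of words drawn from the dictionary.
    sample = random.sample(dictionary, sample_size)
    fraction_known = sum(knows(word) for word in sample) / sample_size
    # The fraction known in the sample estimates the fraction known
    # in the whole dictionary, so scale by the dictionary's size.
    return fraction_known * len(dictionary)
```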
A related experiment is the game of "20 questions" in
which one player identifies an object chosen by the other by asking
a series of 20 yes-or-no questions. Since each answer provides no
more than a single bit of information, and since skillful players
generally require almost all of the 20 questions to choose correctly,
we can estimate that the number of allowable choices is on the order
of 2^20, or about one million. This gives an estimated
number of allowable objects known in common by the two players.
Of course, the measure is inaccurate since the questions are not
perfect and the choices of objects are not random. It is possible
that a refined version of the game could be developed and used to
provide another lower bound.
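The arithmetic behind the estimate: n perfect yes-or-no answers distinguish at most 2^n objects.

```python
import math

questions = 20
print(2 ** questions)        # 1048576: about one million objects
print(math.log2(1_000_000))  # ~19.9 bits needed to single one out
```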
A third approach to measuring the amount of information of the
symbolic portion of human knowledge is to estimate the rate of acquisition
and to integrate over time. For example, experiments on memorizing
random sequences of syllables indicate that the maximum memorization
rate of this type of knowledge is about one "chunk" per
second. A "chunk" in this context can be safely assumed
to contain less than 100 bits of information, so the results suggest
that the maximum rate that a human is able to commit information
to long-term memory is significantly less than 100 bits per second.
If this is true, a 20-year-old human learning at the maximum rate
for 16 hours a day would know less than 50 gigabits of information.
I find this number surprisingly small.
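The integration is straightforward; a sketch, taking the 100-bits-per-second ceiling at face value:

```python
rate = 100                           # bits/second, assumed ceiling
seconds = 20 * 365 * 16 * 3600       # 20 years at 16 waking hours/day
total_bits = rate * seconds
print(f"{total_bits / 1e9:.0f} gigabits")  # about 42: under 50 gigabits
```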
A difficulty with this estimate of the rate of acquisition is that
the experiment measures only information coming through one sensory
channel under one particular set of circumstances. The visual system
sends more than a million times this rate of information through the
optic nerve, and it is conceivable that all of this information
is committed to memory. If it turns out that images are stored directly,
it will be necessary to significantly increase the 100 bit per second
limit, but there is no current evidence that this is the case. In
experiments measuring the ability of exceptional individuals to
store "eidetic" images of random dot stereograms, the
subjects are given about 5 minutes to "memorize" a 128x128
image. Memorizing only a few hundred of these bits is probably sufficient
to pass the test.
I am aware of no evidence that suggests more than a few bits per
second of any type of information can be committed to long-term
memory. Even if we accept at face value reports of extraordinary
feats of memory, such as those of Luria's showman in "The Mind
of a Mnemonist", the average rate of commitment to memory
never seems to exceed a few bits per second. Experiments should
be able to refine this estimate, but even if we knew the maximum
rate exactly, the rate averaged over a lifetime would probably be
very much less. Knowing the maximum rate would establish an upper
bound on the requirements of storage.
The sketchy data cited above suggest that an intelligent
machine would require 10^9 bits of storage, plus or minus
two orders of magnitude. This assumes that the information is encoded
in such a way that it requires a minimum amount of storage, which
for the purpose of processing information would probably not be
the most practical representation. As a would-be builder of thinking
machines, I find this number encouragingly small, since it is well
within the range of current electronic computers. As a human with
an ego, I find it distressing. I do not like to think that my entire
lifetime of memories could be placed on a reel of magnetic tape.
Hopefully experimental evidence will clear this up one way or the
other.
There are a few subtleties in the question of storage requirements,
particularly in defining the quantity of information in a way that is independent
of the representation. Defining the number of bits in the information-theoretical
sense requires a measure of the probabilities over the ensemble
of possible states. This means assigning an "a priori"
probability to each possible set of knowledge, which is the role
of inherited intelligence. Inherited intelligence provides a framework
in which the knowledge of acquired intelligence can be interpreted.
Inherited intelligence defines what is knowable; acquired intelligence
determines which of the knowable is known.
Another potential difficulty is how to count the storage of information
that can be deduced from other data. In the strict information-theoretical
sense, data that can be inferred from other data add no information
at all. An accurate measure would have to take into account the
possibility that knowledge is inconsistent, and that only limited
inferences are actually made. These are the kind of issues currently
being studied on the symbolic side of the field of artificial intelligence.
One issue that does not need to be resolved to measure storage
capacity is distributed versus localized representation. Knowing
what types of representation are used in what parts of the human
brain would be of considerable scientific interest, but it does
not have a profound impact on the amount of storage in the system,
or on our ability to measure it. Non-technical commentators have
a tendency to attribute almost mystical qualities to distributed
storage mechanisms such as those in holograms and neural networks,
but the limitations on their storage capacities are well understood.
Distributed representations with similar properties are often used
within conventional digital computers, and they are invisible to
most users except in the system's capacity to tolerate errors. The
error correcting memory used in most computers is a good example.
The system is composed of many physically separate memory chips,
but any single chip can be removed without losing any data. This
is because the data is not stored in any one place, but in a distributed
non-local representation across all of the units. In spite of the
"holographic" representation, the information storage
capacity of the system is no greater than it would be with a conventional
representation. In fact, it is slightly less. This is typical of
distributed representations.
Storage capacity offers one measure of the requirements of a human-like
emergent intelligence. Another measure is the required rate of computation.
Here there is no agreed upon metric, and it is particularly difficult
to define a unit of measure that is completely independent of representation.
The measure suggested below is simple and the answer is certainly
important, if not sufficient.
Given an efficiently stored representation of human knowledge,
what is the rate of access to that storage (in bits per second)
required to achieve human-like performance? Here "efficiently
stored representation" means any representation requiring only
a multiplicative constant of storage over the number of bits of
information. This restriction eliminates the formal possibility
of a representation storing a pre-computed answer to every question.
Allowing storage within a multiplicative constant of the optimum
does restrict the range of possible representations, but it allows
most representations that we would regard as reasonable. In particular,
it allows both distributed and local representations.
The question of the bandwidth required for human-like performance
is accessible by experiment, by approaches similar to those outlined
for the question of storage capacity. If the "cycle time"
of human memory is limited by the firing time of a neuron, then
the ratio of this answer to the total number of bits tells the fraction
of the memory that is accessed simultaneously. This gives an indication
of the parallel or serial nature of the computation. Informed opinions
differ greatly in this matter. The bulk of the quantitative evidence
favors the serial approach. Memory retrieval times for items in
lists, for example, depend on the position and the number of items
in the list. Except for sensory processing, most successful artificial
intelligence programs have been based on serial models of computation,
although this may be a distortion caused by the availability of
serial machines.
My own guess is that the reaction time experiments are misleading
and that human-level performance will require accessing of large
fractions of the knowledge several times per second. Given a representation
of acquired intelligence with a realistic representation efficiency
of 10%, the 10^9 bits of memory mentioned above would require
a memory bandwidth of about 10^11 bits per second. This bandwidth
seems physiologically plausible since it corresponds to about a
bit per second per neuron in the cerebral cortex.
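Making the arithmetic explicit; the efficiency figure and the reading of "several times per second" as ten are assumptions carried over from the guess above, and the neuron count is an order-of-magnitude figure:

```python
knowledge_bits = 1e9        # estimated acquired knowledge (see above)
efficiency = 0.10           # assumed representation efficiency
stored_bits = knowledge_bits / efficiency      # 1e10 bits actually held
accesses_per_second = 10                       # "several" per second
bandwidth = stored_bits * accesses_per_second  # 1e11 bits per second
neurons = 1e11              # order-of-magnitude neuron count assumed here
print(bandwidth / neurons)  # about 1 bit per second per neuron
```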
By way of comparison, the memory bandwidth of a conventional electronic
computer is in the range of 10^6 to 10^8 bits
per second, at most about 0.1% of the imagined requirement.
For parallel computers the bandwidth is considerably higher. For
example, a 65,536 processor Connection Machine can access its memory
at approximately 10^11 bits per second. It is not entirely
a coincidence that this fits well with the estimate above.
Another important question is: What sensory-motor functions are
necessary to sustain symbolic intelligence? An ape is a complex
sensory-motor machine, and it is possible that much of this complexity
is necessary to sustain intelligence. Large portions of the brain
seem to be devoted to visual, auditory, and motor processing, and
it is unknown how much of this machinery is needed for thought.
A person who is blind and deaf or totally paralyzed can undoubtedly
be intelligent, but this does not prove that the portion of the
brain devoted to these functions is unnecessary for thought. It
may be, for example, that a blind person takes advantage of the
visual processing apparatus of the brain for spatial reasoning.
As we begin to understand more of the functional architecture of
the brain, it should be possible to identify certain functions as
being unnecessary for thought by studying patients whose cognitive
abilities are unaffected by locally confined damage to the brain.
For example, binocular stereo fusion is known to take place in a
specific area of the cortex near the back of the head. Patients
with damage to this area of the cortex have visual handicaps, but
show no obvious impairment in their ability to think. This suggests
that stereo fusion is not necessary for thought. This is a simple
example, and the conclusion is not surprising, but it should be
possible by such experiments to establish that many sensory-motor
functions are unnecessary. One can imagine, metaphorically, whittling
away at the brain until it is reduced to its essential core. Of
course it is not quite this simple. Accidental damage rarely incapacitates
completely and exclusively a single area of the brain. Also, it
may be difficult to eliminate one function at a time since one mental
capacity may compensate for the lack of another.
It may be more productive to assume that all sensory-motor apparatus
is unnecessary until proven useful for thought, but this is contrary
to the usual point of view. Our current understanding of the phylogenetic
development of the nervous system suggests a point of view in which
intelligence is an elaborate refinement of the connection between
input and output. This is reinforced by the experimental convenience
of studying simple nervous systems, or studying complicated nervous
systems by concentrating on those portions most directly related
to input and output. By necessity, almost everything we know about
the function of the nervous system comes from experiments on those
portions that are closely related to sensory inputs or motor outputs.
It would not be surprising if we have overestimated the importance
of these functions to intelligent thought.
Sensory-motor functions are clearly important for the application
of intelligence and for its evolution, but these are separate issues
from the question above. Intelligence would not be of much use without
an elaborate system of sensory apparatus to measure the environment
and an elaborate system of motor apparatus to change it, nor would
it have been likely to have evolved. But the apparatus necessary
to exercise and evolve intelligence is probably very much more than
the apparatus necessary to sustain it. One can believe in the necessity
of the opposable thumb for the development of intelligence, without
doubting a human capacity for thumbless thought. It is quite possible
that even the meager sensory-motor capabilities that we currently
know how to provide would be sufficient for the operation of emergent
intelligence.
These questions of capacity and scope are necessary in defining
the magnitude of the task of constructing an emergent intelligence,
but the key question is one of understanding. While it is possible
that we will be able to recreate the emergent substrate of intelligence
without fully understanding the details of how it works, it seems
likely that we would at least need to understand some of its principles.
There are at least three paths by which such understanding could
be achieved. One is to study the properties of specific emergent
systems, to build a theory of their capabilities and limitations.
This kind of experimental study is currently being conducted on
several classes of promising systems including neural networks,
spin glasses, cellular automata, classifier systems and adaptive
automata. Another possible path to understanding is the study of
biological systems, which are our only real examples of intelligence,
and our only example of an emergent system which has produced intelligence.
The disciplines that have provided the most useful information of
this type so far have been neurophysiology, cognitive psychology,
and evolutionary biology. A third path would be a theoretical understanding
of the requirements of intelligence, or of the phenomena of emergence.
Examples of relevant disciplines are the theories of logic and computability,
linguistics, and dynamical systems theory. Anyone who looks to emergent
systems as a way of defending human thought from the scrutiny of
science is likely to be disappointed.
One cannot conclude, however, that a reductionist understanding
is necessary for the creation of intelligence. Even a little understanding
could go a long way toward the construction of an emergent system.
A good example of this is how cellular automata have been used to
simulate the emergent behavior of fluids.
The whirlpools that form as a fluid flows past a barrier are not
well understood analytically, yet they are of great practical importance
in the design of boats and airplanes. Equations that describe the
flow of a fluid have been known for more than a century, but except
for a few simple cases they cannot be solved. In practice the flow
is generally analyzed by simulation. The most common method of simulation
is the numerical solution of the continuous equations.
On a highly parallel computer it is possible to simulate fluids
with even less understanding of the system, by simulating billions
of colliding particles that reproduce the emergent phenomena such
as vortices. Calculating the detailed molecular interactions for
so many particles would be extremely difficult, but a few simple
aspects of the system such as conservation of energy and particle
number are sufficient to reproduce the large-scale behavior. A system
of simplified particles that obey these two laws, but are otherwise
unrealistic, can reproduce the same emergent phenomena as reality.
For example, it is possible to use particles of unit mass that move
only at unit speed along a hexagonal lattice, colliding according
to the rules of billiard balls. Experiments show that this model
produces laminar flow, vortex streams, and even turbulence that
is indistinguishable from the behavior of real fluids. Although
the detailed rules of interaction are very different from the interactions
of real molecules, the emergent phenomena are the same. The emergent
phenomena can be created without understanding the details of the
forces between the molecules or the equations that describe the
flow of the fluid.
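A minimal sketch in the spirit of the model described above. The text describes a hexagonal lattice; the sketch below uses the still simpler square-lattice variant (four unit-speed particles per site, head-on pairs scattered through ninety degrees), which obeys the same conservation laws of particle number and momentum. All names and parameters are illustrative.

```python
import numpy as np

size = 128
rng = np.random.default_rng(0)
# cells[d, y, x] is True if a particle at (y, x) moves in direction d,
# where the four directions are north, east, south, west.
cells = rng.random((4, size, size)) < 0.2

def step(cells):
    n, e, s, w = cells
    # Collision: an exactly head-on pair scatters through 90 degrees,
    # conserving particle number and momentum at every site.
    ew = e & w & ~n & ~s    # east-west pair alone -> becomes north-south
    ns = n & s & ~e & ~w    # north-south pair alone -> becomes east-west
    n, s = (n & ~ns) | ew, (s & ~ns) | ew
    e, w = (e & ~ew) | ns, (w & ~ew) | ns
    # Streaming: every particle moves one site in its direction
    # (periodic boundaries).
    n = np.roll(n, -1, axis=0)
    s = np.roll(s, 1, axis=0)
    e = np.roll(e, 1, axis=1)
    w = np.roll(w, -1, axis=1)
    return np.array([n, e, s, w])

for _ in range(100):
    cells = step(cells)
# Coarse-grained density is the field in which flows and vortices
# would be observed in a larger, driven simulation.
density = cells.sum(axis=0)
```

Averaged over many sites and steps, rules of this kind behave like a fluid even though no molecule-level physics has been put in.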
The recreation of intricate patterns of ebbs and flows within a
fluid offers an example of how it is possible to produce a phenomenon
without fully understanding it. But the model was constructed by
physicists who knew a lot about fluids. That knowledge helped to
determine which features of the physical system were important to
implement, and which were not.
Physics is an unusually exact science. Perhaps a better example
of an emergent system which we can simulate with only a limited
understanding is evolutionary biology. We understand, in a weak
sense, how creatures with Mendelian patterns of inheritance and
different propensities for survival can evolve toward better fitness
in their environments. In certain simple situations we can even
write down equations that describe how quickly this adaptation will
take place. But there are many gaps in our understanding of the
processes of evolution. We can explain in terms of natural selection
why flying animals have light bones, but we cannot explain why certain
animals have evolved flight and others have not. We have some qualitative
understanding of the forces that cause evolutionary change, but
except in the simplest cases, we cannot explain the rate or even
the direction of that change.
In spite of these limitations, our understanding is sufficient
to write programs of simulated evolution that show interesting emergent
behaviors. For example, I have recently been using an evolutionary
simulation to evolve programs to sort numbers. In this system, the
genetic material of each simulated individual is interpreted as
a program specifying a pattern of comparisons and exchanges. The
probability of an individual's survival in the system is dependent
on the efficiency and accuracy of this program in sorting numbers.
Surviving individuals produce offspring by sexual combination of
their genetic material with occasional random mutation. After tens
of thousands of generations, a population of hundreds of thousands
of such individuals will evolve very efficient programs for sorting.
Although I wrote the simulation producing these sorting programs,
I do not understand in detail how they were produced or how they
work. If the simulation had not produced working programs, I would
have had very little idea about how to fix it.
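A minimal sketch of this kind of simulated evolution, not the actual system described above: each genome is a list of compare-exchange pairs, fitness here is sorting accuracy alone, and survivors breed by crossover with occasional mutation. Every name and parameter is illustrative.

```python
import random

N = 8            # length of sequences to sort
GENOME = 24      # compare-exchange steps per individual
POP = 200

def new_genome():
    return [tuple(sorted(random.sample(range(N), 2))) for _ in range(GENOME)]

def apply_network(genome, seq):
    seq = list(seq)
    for i, j in genome:          # compare-exchange: put smaller value first
        if seq[i] > seq[j]:
            seq[i], seq[j] = seq[j], seq[i]
    return seq

def fitness(genome, tests):
    # Fraction of test sequences the network sorts correctly.
    return sum(apply_network(genome, t) == sorted(t) for t in tests) / len(tests)

def breed(a, b):
    cut = random.randrange(GENOME)
    child = a[:cut] + b[cut:]    # sexual recombination of genetic material
    if random.random() < 0.3:    # occasional random mutation
        child[random.randrange(GENOME)] = tuple(sorted(random.sample(range(N), 2)))
    return child

population = [new_genome() for _ in range(POP)]
for generation in range(200):
    tests = [random.sample(range(N), N) for _ in range(20)]
    scored = sorted(population, key=lambda g: fitness(g, tests), reverse=True)
    parents = scored[:POP // 2]  # the fitter half survives to reproduce
    population = parents + [breed(random.choice(parents), random.choice(parents))
                            for _ in range(POP - len(parents))]

best = max(population, key=lambda g: fitness(g, tests))
print(fitness(best, tests))
```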
The fluid flow and simulated evolution examples suggest that it
is possible to make a great deal of use of a small amount of understanding.
The emergent behaviors exhibited by these systems are a consequence
of the simple underlying rules, which are defined by the program.
Although the systems succeed in producing the desired results, their
detailed behaviors are beyond our ability to analyze and predict.
One can imagine that if a similar process produced a system of emergent
intelligence, we would have a similar lack of understanding about
how it worked.
My own guess is that such an emergent system would not be an intelligent
system itself, but rather the metabolic substrate on which intelligence
might grow. In terms of the apes and the songs, the emergent portion
of the system would play the role of the ape, or at least that part
of the ape that hosts the songs. This artificial mind would need
to be inoculated with human knowledge. I imagine this process to
be not so different from teaching a child. This would be a tricky
and uncertain procedure since, like a child, this emergent mind
would presumably be susceptible to bad ideas as well as good. The
result would be not so much an artificial intelligence, but rather
a human intelligence sustained within an artificial mind.
Of course, I understand that this is just a dream. And I will admit
that I am more propelled by hope than by the probability of success.
But if, within this artificial mind, the seed of human knowledge
begins to sustain itself and grow of its own accord, then for the
first time human thought will live free of bones and flesh, giving
this child of mind an earthly immortality denied to us.
Attempts to create emergent intelligence, at least those that are
far enough in the past for us to judge, have been disappointing.
Many computational systems, such as homeostats, perceptrons, and
cellular automata, exhibit clear examples of emergent behavior, but
that behavior falls far short of intelligence. A perceptron, for
example, is a collection of artificial neurons that can recognize
simple patterns. Considerable optimism was generated in the 1960's
when it was proved that anything a perceptron could recognize, it
could learn to recognize from examples. This was followed by considerable
disappointment when it was realized that the set of things that
could be recognized at all was very limited. What appeared to be
complicated behavior of the system turned out in the final analysis
to be surprisingly simple.
In spite of such disappointments, I believe that the notion of
emergence contains an element of truth, an element that can be isolated
and put to use.
A helpful analogy is the brewing of beer. The brewmaster creates
this product by making a soup of barley and hops, and infecting
it with yeast. Chemically speaking most of the real work is done
by the yeast, which converts the grain's sugars to alcohol. The brewmaster
is responsible for creating and maintaining the conditions under
which that conversion can take place. The brewmaster does not need
to understand exactly how the yeast does its work, but does need
to understand the properties of the environment in which the yeast
will thrive. By providing the right combination of ingredients at
the right temperature in the right container, the brewmaster is
able to create the necessary conditions for the production of beer.
Something analogous to this process may be possible in the creation
of an artificial intelligence. It is unlikely that intelligence
would spontaneously appear in a random network of neurons, just
as it is unlikely that life would spontaneously appear in barley
soup. But just as carefully mixed soup can be inoculated with yeast,
it may be that a carefully constructed network of artificial neurons
can be inoculated with thought.
The approach depends on the possibility of separating human intelligence
into two parts, corresponding to the soup and the yeast. Depending
on one's point of view, these two parts can be viewed as hardware
and software, intellect and knowledge, nature and nurture, or program
and data. Each point of view carries with it a particular set of
intuitions about the nature of the split and the relative complexity
of the parts.
One way that biologists determine if a living entity is a symbiont
is to see if the individual components can be kept alive separately.
For example, biologists have tried (unsuccessfully) to prove the
oil-drop theory by sustaining metabolizing oil drops in an artificial
nutrient broth. Such an experiment for human intelligence would
have two parts. One would be a test of the human ape's ability to
live without the ideas of human culture. This experiment is occasionally
conducted in an uncontrolled form when feral children are reared
by animals. The two-part theory would predict that such children,
before human contact, would not be significantly brighter than nonhuman
primates. The complementary experiment, sustaining human ideas and
culture in an artificial broth, is the one in which we are more
specifically interested. If this were successful we would have a
thinking machine, although perhaps it would not be accurate to call
it an artificial intelligence. It would be natural intelligence
sustained within an artificial mind.
To pursue the consequences of this point of view, we will assume
that human intelligence can be cleanly divided into two portions
that we will refer to as acquired and inherited intelligence. These
correspond to the songs and to the apes, respectively, or in the
fermentation metaphor, the yeast and the barley soup. We will consider
only those features of inherited intelligence that are necessary
to support acquired intelligence, and only those features of acquired
intelligence that impose requirements on inherited intelligence.
We will study the interface between the two.
Even accepting this definition of the problem, it is not obvious
that the interface is easy to understand or recreate. This leads
to a specific question about the scope of the interface, one that
can presumably be answered by experiment: What sensory-motor functions
are necessary to sustain symbolic intelligence?
The functional scope of the interface between acquired and inherited
intelligence is not the only property that can be investigated.
To build a home for an animal, the first thing we would need to
know is the animal's size. This is also one of the first things
we need to know in building an artificial home for acquired intelligence.
This leads to a second question: How many bits are required to store
the acquired knowledge of a typical human?
The guesses at answers that I have given are imprecise, but the
questions are not. In principle they can be answered by experiment.
The final question I will pose is more problematic. What I would
like to ask is "What are the organizing principles of inherited
intelligence?" but this question is vague and it is not clear
what would be an acceptable answer. I shall substitute a more specific
question that hopefully captures the same intent:
"Question IV: What quantities remain constant during the computation
of intelligence; or, equivalently, what functions of state are minimized?"
This question assumes that inheritable intelligence is some form
of homeostatic process and asks what quantity is held static. It
is the most difficult of the four questions, but historically it
has been an important question to ask in areas where there was not
yet a science to guide progress.
The study of chemistry is one example. In chemical reactions between
substances it is obvious that a great number of things change and
not so obvious what stays the same. It turns out that if the experiment
is done carefully, the weight of the reactants will always equal
the weight of the products. The total weight remains the same. This
is an important organizing principle in chemistry and understanding
it was a stepping stone to the understanding of an even more important
principle: the conservation of the weights of the individual elements.
The technical difficulty of defining and creating a truly closed
experiment, in particular eliminating the inflow and outflow of
gases, explains why chemists did not fully appreciate these principles
until the middle of the 19th century.
Another very different example of a system that can be understood
in terms of what is held constant is the system of formal logic.
This is a set of rules under which sentences may be changed without
changing their truth. A similar example, which has also been important
to artificial intelligence, is the lambda calculus, which is the
basis of the language Lisp. This is a system of transforming expressions
in such a way that their "values" do not change, where
the values are those forms of the expression which are not changed
by the transformations. (This sounds circular because it is. A more
detailed explanation would show it to be more so.) These formal
systems are conceptually organized around that which is held constant.
In physics there are many examples of how conservations have been
used successfully to organize our conception of reality, but while
conservations of energy, momentum, mass, and charge are certainly
important, I do not wish to make too much of them in this context.
In this sense the principles of conservation will more likely resemble
those of biology than physics.
One of the most useful conservation principles in biology appears
in the notion of a gene. This is the unit of character determination
that is conserved during reproduction. In sexual reproduction this
can get complicated since an individual receives a set of genes
from each of two parents. A gene that affects a given trait may
not be expressed if it is masked by another, and there is not a
simple correspondence between genes and measurable traits. The notion
that atomic units of inheritance are always present, even when they
are not expressed, was hard to accept, and it was not widely believed
until almost a century after Mendel's initial experiments. In fact the
conservation is not perfect, but it is still one of the most important
organizing principles in the study of living organisms.
In biology, the rules of conservation are often expressed as minimum
principles. The two forms are equivalent. For instance, the minimum
principle corresponding to the physical conservation of momentum
is the principle of least action. A biological example is the principle
of optimal adaptation, which states that species will evolve toward
fitness to their environments. The distance to the ideal is minimized.
A conservation principle associated with this is Fisher's Fundamental Theorem
of Natural Selection, which states that the rate of change in fitness
is equal to the genetic variance. In cases where this minimum principle
can be applied, it allows biologists to quantitatively predict the
values of various biological parameters.
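One common way of writing the theorem, with fitness measured as the Malthusian growth rate m; formulations vary, so this should be read as one standard rendering rather than the essay's own:

```latex
\frac{d\bar{m}}{dt} = \operatorname{Var}_A(m)
```

Here Var_A(m) is the additive genetic variance in fitness: when it is zero, mean fitness stops changing.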
For example, sickle-cell anemia is a hereditary disease controlled
by a recessive gene. Individuals who inherit the gene from both
parents are likely to die without reproducing, but individuals who
inherit the gene from a single parent are resistant to malaria.
In certain regions of West Africa 40% of the population carries
the gene. From this fact and the principle of optimal fitness, it
is possible to predict that the survival advantage of resistance
to malaria is about 25% in these regions. This estimate fits well
with measured data. Similar methods have been used to estimate the
number of eggs laid by a bird, the shape of sponges, and the gait
of animals at different speeds. But these examples of applying a
minimum principle are not so crisp as those of physics. Why, for
example, do we not evolve a non-lethal gene that protects against
malaria? The answer is complicated, and the principle of fitness
offers no help. It is useful in aiding our understanding, but it
does not explain all. This is probably the kind of answer to Question
IV for which we will have to settle.
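A sketch of the kind of calculation behind the sickle-cell prediction above, under textbook assumptions that are the sketch's and not the essay's: random mating at equilibrium, an effectively lethal sickle homozygote, and "40% carries the gene" read as 40% having at least one copy.

```python
import math

carrier_fraction = 0.40
# Solve 1 - (1 - q)**2 = carrier_fraction for the allele frequency q.
q = 1 - math.sqrt(1 - carrier_fraction)   # about 0.23
# Heterozygote-advantage equilibrium: q = s / (s + t), with t = 1 for
# the lethal homozygote and s the malaria disadvantage of non-carriers.
s = q / (1 - q)
print(f"q = {q:.2f}, malaria survival advantage about {s:.0%}")  # ~29%
```

Under these readings the sketch gives roughly 29%, the same order as the figure quoted above; other readings of the 40% shift the number somewhat.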
Even in physics, knowledge of the exact law does not really explain
all behaviors. The snowflakes and whirlpools of water are examples.
The forces that govern the interaction of water molecules are understood
in some detail, but there is no analytical understanding of the
connection between these forces and their emergent behaviors of
water.
On the other hand, our goal is not necessarily to understand, but
to recreate. In both of the examples mentioned, conservation principles
give us sufficient understanding to recreate the phenomena.
In order to achieve this kind of understanding for intelligence
it will be necessary to ask and answer the kinds of questions that
are mentioned above.
I do not know the answer to Question IV. It is possible that the
answer will be very complicated, and that the interface between acquired
and inherited intelligence will be difficult to reproduce. But it is
also possible that it will be simple enough to recreate in an
artificial substrate for thought.
Once this is achieved it will still remain to inoculate the artificial
mind with the seed of knowledge. I imagine this to be not so different
from the process of teaching a child. It will be a tricky and uncertain
process since, like a child, this mind will presumably be susceptible
to bad ideas as well as good. The first steps will be the most delicate.
If we have prepared well, it will reach a point where it can sustain
itself and grow of its own accord.
For the first time human thought will live free of bones and flesh,
giving this child of mind an earthly immortality denied to us.
References
Dyson, Freeman. "Origins of Life", Cambridge University Press, 1985.
Haldane, J. B. S. "The Causes of Evolution", Harper & Brothers, 1932.
Hillis, W. Daniel. "The Connection Machine", The MIT Press, 1985.
Luria, A. R. "The Mind of a Mnemonist", Basic Books, 1968.
Newell, Allen, and Herbert A. Simon. "Human Problem Solving", Prentice-Hall, 1972.
Wolfram, Stephen, ed. "Theory and Applications of Cellular Automata", World Scientific, 1986.
"Intelligence as an Emergent Behavior or, The Songs of
Eden" reprinted by permission of Daedalus, Journal of the
American Academy of Arts and Sciences, from the issue entitled,
"Artificial Intelligence," Winter 1988, Vol. 117, No.
1.