Singularity Math Trialogue
Hans Moravec, Vernor Vinge, and Ray Kurzweil discuss the mathematics of the Singularity, making various assumptions about growth of knowledge vs. computational power.
From: Hans Moravec
Date: February 15, 1999
To: Ray Kurzweil
Subject: Foldover in the accelerating exponential
Hi Ray,
Following up a thread touched on in your visit: starting in the
1960s, Vernor Vinge noted (and wrote SF stories exploring the idea)
that AI, by automating the process of innovation, would increasingly
accelerate its own development, shortening the time constant of
its own exponential.
If the doubling rate is proportional to the magnitude of the exponential,
then the curve undergoes equal multiplicative steps in exponentially
decreasing time intervals, and reaches infinite slope in a finite
time.
Vinge calls this the "Technological Singularity," and notes that
it blocks prediction, just as a mathematical singularity prevents
extrapolation. There is probably a curve beyond the singularity,
but it has a different character.
Here is a recent pointer:
http://www-personal.engin.umich.edu/~jxm/singlar.html
I looked at the prescription for the accelerating exponential,
and noted the following:
From: Hans Moravec
Date: September 30, 1997
To: Vernor Vinge
Subject: A kind of Singularity
Hi Vernor,
A tidbit you've probably noticed before:
V = exp(V*t) has a singularity (at t = 1/e, V = e) in dV(t)/dt
but not in V(t) itself. Instead, V folds back, making a double valued
function that kind of suggests time travel!
You can see the curve most easily by parametrically plotting V
against t = log(V)/V with the parameter V going from about 1 (or
less) to 5.
Or take V only up to e to be neat.
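The foldback is easy to confirm numerically. Here is a sketch in
plain Python (no plotting; the helper names are illustrative, not
part of the original exchange):

```python
import math

# V = exp(V*t) rearranges to t = log(V)/V, so the curve can be traced
# parametrically.  t peaks at 1/e when V = e; for any t just below 1/e
# there are TWO solutions for V -- the foldback described above.

def t_of_V(V):
    return math.log(V) / V

def solve_V(t_target, lo, hi):
    # bisection for t_of_V(V) = t_target on a monotonic bracket [lo, hi]
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (t_of_V(mid) < t_target) == (t_of_V(lo) < t_target):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

assert abs(t_of_V(math.e) - 1 / math.e) < 1e-12  # t maxes out at 1/e, V = e

t_target = 0.35                          # a bit below 1/e ~ 0.368
V_low = solve_V(t_target, 1.0, math.e)   # rising branch
V_high = solve_V(t_target, math.e, 5.0)  # falling branch
assert V_low < math.e < V_high           # same t, two distinct values of V
assert abs(t_of_V(V_low) - t_target) < 1e-9
assert abs(t_of_V(V_high) - t_target) < 1e-9
```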
Best--Hans
To: Hans Moravec
Subject: Re: A kind of Singularity
Date: October 1, 1997
From: Vernor Vinge
Hi Hans --
I hadn't noticed that. Neat and weird: "After a while the world
disappeared, but only because it had already become very much nicer"!
Hope things are all going well. (I'm back to teaching after two
years at writing. It is very interesting--and as hectic as I remember!)
Regards,
--Vernor
From: Hans Moravec
Date: Monday, February 15, 1999
To: Vernor Vinge, Ray Kurzweil
Subject: Singularity equation correction
So, I reconsidered the V = exp(V*t) formula that suggested a foldback
in the progress curve, and decided it didn't make a lot of sense,
in the way it jumbled time t and the rate V of computing.
More careful formulation makes for a less surprising conclusion.
Making maximal simplifying assumptions, and shifting and scaling
all quantities to avoid constants:
Let W be "world knowledge", and assume that each additive increment
in W results in a multiplicative improvement in miniaturization,
and thus in computer memory and speed V. So:
V = exp(W)
In the old days, assume an essentially constant number of humans
worked unassisted at a steady pace to increase W at a steady rate:
dW/dt = 1
So W = t and V = exp(t)
which is a regular Moore's law.
Now, suppose instead W is created solely by computers, and increases
at a rate proportional to computer speed. Then:
dW/dt = V giving dW/exp(W) = dt
This solves to W = log(-1/t) and V = -1/t
W and V rise very slowly when t<<0, might be mistaken for
exponential around t = -1, and have a glorious singularity at t
= 0.
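The closed form can be sanity-checked by integrating the ODE directly.
A sketch using a generic fourth-order Runge-Kutta step (the integrator
is mine, not from the original exchange):

```python
import math

def rk4(f, t, y, t_end, steps):
    # classic 4th-order Runge-Kutta integration of dy/dt = f(t, y)
    h = (t_end - t) / steps
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

# dW/dt = exp(W), with W(-1) = 0 matching W = log(-1/t) at t = -1
W = rk4(lambda t, W: math.exp(W), -1.0, 0.0, -0.1, 20000)
assert abs(W - math.log(10.0)) < 1e-6  # W(-0.1) = log(-1/-0.1) = log 10
assert abs(math.exp(W) - 10.0) < 1e-4  # V(-0.1) = -1/t = 10

# approaching t = 0 the numbers explode -- the singularity
W_close = rk4(lambda t, W: math.exp(W), -1.0, 0.0, -0.001, 200000)
assert math.exp(W_close) > 900.0       # V is near 1000 by t = -0.001
```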
From: Hans Moravec
Sent: March 10, 1999
To: Ray Kurzweil
Cc: Vernor Vinge
Subject: Response to Singularity Equations
Goody! This led to some new insights.
Ray said:
One of your assumptions is:
V = exp(W) (i.e., that computer power grows exponentially with
world knowledge).
I don't think this is a well-founded assumption. Why would additive
increments in W result in a multiplicative improvement in V? I agree
that V grows with W, but it makes more sense to say that: if we
double world knowledge, we double our understanding of how to build
a computer, and, therefore, double the power of computation per
unit cost.
The assumption that V = exp(W) is surely too optimistic. I was
thinking in terms of independent innovations. For instance, one
might be an algorithmic discovery (like N log N sorting) that lets
you get the same result with half the computation. Another might
be a computer organization (like RISC) that lets you get twice the
computation with the same number of gates. Another might be a circuit
advance (like CMOS) that lets you get twice the gates in a given
space. Others might be independent speed-increasing advances, like
size-reducing copper interconnects and capacitance-reducing silicon-on-insulator
channels. Each of those increments of knowledge more or less multiplies
the effect of all of the others, and computation would grow exponentially
in their number.
But, of course, a lot of new knowledge steps on the toes of other
knowledge, by making it obsolete, or diluting its effect, so the
simple independent model doesn't work in general. Also, simply searching
through an increasing amount of knowledge may take increasing amounts
of computation. I played with the V=exp(W) assumption to weaken
it, and observed that the singularity remains if you assume processing
increases more slowly, for instance V = exp(sqrt(W)) or exp(W^(1/4)).
Only when V = exp(log(W)) (i.e., = W) does the progress curve subside
to an exponential.
Actually, the singularity appears somewhere in the I-would-have-expected
tame region between V = W and V = W^2 (!)
Unfortunately the transitional territory between the merely exponential
V=W and the singularity-causing V=W^2 is analytically hard to deal
with. I assume just before a singularity appears, you get non-computably
rapid growth!
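That the singularity survives weaker laws like V = exp(sqrt(W)) can
be seen by separating variables; here is a numeric sketch (the helper
names are illustrative):

```python
import math

# dW/dt = exp(sqrt(W)) separates to t(W) = integral_0^W exp(-sqrt(u)) du.
# Substituting u = s^2 gives t(W) = 2 - 2*(1 + sqrt(W))*exp(-sqrt(W)),
# which saturates at t = 2 as W -> infinity: knowledge still becomes
# infinite in finite time under this weakened growth law.

def t_of_W(W, steps=200000):
    # midpoint-rule quadrature of the separated integral
    h = W / steps
    return sum(math.exp(-math.sqrt((i + 0.5) * h)) for i in range(steps)) * h

for W in (1.0, 10.0, 100.0):
    closed_form = 2 - 2 * (1 + math.sqrt(W)) * math.exp(-math.sqrt(W))
    assert abs(t_of_W(W) - closed_form) < 1e-5

# the time axis never reaches t = 2, no matter how large W grows
assert t_of_W(10.0) < t_of_W(100.0) < 2.0
```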
Your assumption:
N = C4^(C5*t) (the number of computing devices is growing at
its own exponential rate) is pretty arbitrary. Wouldn't it make
more sense to have the number increase as some function of W? In
the latter case, the number of computers could simply be factored
into the V=f(W) equation (where my V means the total amount of computation
in the world, equivalent to your N*V).
Suppose computing power per computer simply grows linearly with
total world knowledge, but that the number of computers also grows
the same way, so that the total amount of computational power in
the world grows as the square of knowledge:
V = W*W
and dW/dt = V + 1 (the computers, plus the old steady human rate)
This solves to W = tan(t) and V = tan(t)^2, which has lots of singularities
(I like the one at t = pi/2).
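The tangent solution checks out numerically (a small sketch, using a
central finite difference for the derivative):

```python
import math

# Check that W = tan(t) satisfies dW/dt = V + 1 = W^2 + 1.
h = 1e-6
for t in (0.0, 0.5, 1.0, 1.4):
    dW = (math.tan(t + h) - math.tan(t - h)) / (2 * h)
    rhs = math.tan(t) ** 2 + 1
    assert abs(dW - rhs) < 1e-4 * rhs

# and the solution blows up approaching t = pi/2
assert math.tan(math.pi / 2 - 1e-6) > 1e5
```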
I also question the application of negative time, and in particular
zero time (which provides for the singularity).
Oh, that's just a matter of labeling the axes, with the origin
chosen in that particular place to avoid unsightly constants. You
could shift it anywhere, if you replace the t in the formulas by
(t-t0). But I like it where it is.
If there is a singularity, it's kind of natural to divide time
into BS (the negative times before the singularity) and AS (the
strange times afterwards). (I do worry a little that in some of
my constant-free formulas, it is easy to find regions where W<0,
though they can mostly be shifted away.)
--Hans
_______
From: Hans Moravec
Date: October 18, 2003
To: Ray Kurzweil
Subject: Singularity equation, very simple
Well, that was vastly easier than expected. I was holding the wrong
end of the stick last time.
Express an exponent a bit above 1 as (n+1)/n
The equation
dx/dt = x^((n+1)/n) with initial condition x(0) = 1
can be solved in Maple
dsolve({ diff(x(t),t)=x(t)^((n+1)/n), x(0)=1 } ,x(t));
to produce the solution
x(t) = (n/(n-t))^n
This has a singularity at t=n.
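Maple's solution can be verified without Maple; a sketch that checks
the ODE by finite differences for several n:

```python
import math

def x(t, n):
    # the closed-form solution of dx/dt = x^((n+1)/n), x(0) = 1
    return (n / (n - t)) ** n

h = 1e-6
for n in (1, 2, 5, 10):
    assert x(0.0, n) == 1.0                  # initial condition
    for t in (0.1, 0.3 * n, 0.7 * n):
        dx = (x(t + h, n) - x(t - h, n)) / (2 * h)
        rhs = x(t, n) ** ((n + 1) / n)
        assert abs(dx - rhs) < 1e-4 * rhs    # the ODE holds
    assert x(n - 1e-9, n) > 1e8              # blow-up as t -> n
```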
We discussed exponents in the range 1 to 2. An exponent of 1 resulted
in an exponential; I was surprised that merely setting it to 2 produced
a singularity.
In the solution above, exponent 2 happens when n = 1, and gives
a singularity at t=1.
The exponent approaches 1 as n is made large. The singularity recedes
farther and farther into the future, but never actually goes away.
We reach an exponent of 1 (i.e. dx/dt = x) as n approaches infinity.
In that limit (n/(n-t))^n does indeed become e^t (the limit is the
classic definition of e^t), and the singularity vanishes to infinitely
far into the future.
But for any exponent even slightly greater than 1 there is a singularity,
just far away.
This supersedes the confusing previous discussion.
So I tried a function sub-polynomially (i.e. a smidge) faster than
linear:
dx/dt = x*(1+log(x)) -> x(t) = exp(exp(t)-1)
a vastly fast double exponential, but without a singularity.
But merely square the log, or the whole log factor:
dx/dt = x*(1+log(x)^2) -> x(t) = exp(tan(t))
dx/dt = x*(1+log(x(t)))^2 -> x(t) = exp(t/(1-t))
and the singularity is back.
Generalizing:
dx/dt = x*(1+log(x(t)))^((n+1)/n) gives
x(t) = exp((n/(n-t))^n-1)
and again we have a singularity that recedes.
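All four closed forms above can be checked the same way; a sketch
(the `check_ode` helper is mine):

```python
import math

lp = lambda x: 1 + math.log(x)
h = 1e-6

def check_ode(sol, rhs, ts):
    # finite-difference check that sol(t) satisfies dx/dt = rhs(x)
    for t in ts:
        dx = (sol(t + h) - sol(t - h)) / (2 * h)
        want = rhs(sol(t))
        assert abs(dx - want) < 1e-4 * max(1.0, want)

# dx/dt = x*(1+log x)      -> exp(exp(t)-1): double exponential, no singularity
check_ode(lambda t: math.exp(math.exp(t) - 1),
          lambda x: x * lp(x), (0.0, 0.5, 1.0))
# dx/dt = x*(1+log(x)^2)   -> exp(tan t): singular at t = pi/2
check_ode(lambda t: math.exp(math.tan(t)),
          lambda x: x * (1 + math.log(x) ** 2), (0.0, 0.5, 1.0))
# dx/dt = x*(1+log x)^2    -> exp(t/(1-t)): singular at t = 1
check_ode(lambda t: math.exp(t / (1 - t)),
          lambda x: x * lp(x) ** 2, (0.0, 0.3, 0.6))
# dx/dt = x*(1+log x)^((n+1)/n) -> exp((n/(n-t))^n - 1): singular at t = n
n = 3
check_ode(lambda t: math.exp((n / (n - t)) ** n - 1),
          lambda x: x * lp(x) ** ((n + 1) / n), (0.0, 0.5, 1.0))
```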
10/19/03, Ray Kurzweil replied:
Thanks, Hans.
I appended your follow up message so that the thread will be complete.
This is what I suspected—that exponents over 1 would produce
a singularity.
Now the question is what IS the exponent. We can approach this
on either theoretical grounds, or from the data, and hopefully the
two approaches will corroborate one another.
Certain linear increments to k (knowledge) cause multiplicative
improvements in the performance of a technology. For example, an
increment to knowledge that allows us to reduce semiconductor feature
widths by a fraction, or a pipelining microprocessor architecture
that allows a certain fraction of instructions to be executed in
parallel, results in improving performance by a multiple.
HOWEVER, technology performance is not the same as knowledge. The
next question is how does an improvement in the performance of computation
(and other information processing technologies such as communication
bandwidths) affect knowledge. There are many instances in which
it requires exponential gains in information processing to achieve
linear gains in knowledge. For example, it requires exponential
gains in computation to see linearly further ahead in chess moves.
Indeed, with exponential gains in computation, we've seen linear
gains in machine chess ratings. We've also seen linear gains in
pattern recognition performance (e.g., speech recognition accuracy)
from exponential gains in processor speed, memory capacity, as well
as the size of the research databases (of speech samples to train
on). There are many examples of inherently exponential problems
for which exponential gains in the power of the information processing
technology produce linear gains in performance and/or knowledge.
So in these cases, linear increments to k produce exponential gains
in information processing which in turn produce linear gains to
k.
My sense is that it is difficult to justify on theoretical grounds
having the rate of increase of knowledge be equal to the "size"
of knowledge raised to a power greater than 1.
However, I do think it is feasible to justify separate components
of growth, some (or one) of which have an exponent of one, and some
(or one) of which have a log exponent. The analysis above points
to a log exponent. I believe we can justify that the "value"
of a typical network or of a database does not expand proportional
to its size, but to the log of its size. For example, the value
of the Internet to me does not double if the number of users doubles.
Rather, its value to me increases in a logarithmic fashion. The
exponent of 1 comes from the number of users themselves. Having
twice as many users means it is serving twice as many people. So
the overall value of the network = n * log n (n people served times
the value of the network to each person = log n). This differs from
Metcalfe's formula, by which the value of a network = n ^ 2.
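The contrast between the two value functions can be made concrete
with a toy comparison (the numbers below are illustrative only):

```python
import math

def metcalfe_value(n):
    # Metcalfe's law: network value grows as n^2
    return n ** 2

def nlogn_value(n):
    # the n*log(n) alternative proposed here
    return n * math.log(n)

n = 1_000_000
# doubling the user base quadruples the Metcalfe value...
assert metcalfe_value(2 * n) / metcalfe_value(n) == 4.0
# ...but only slightly more than doubles the n*log(n) value
ratio = nlogn_value(2 * n) / nlogn_value(n)
assert 2.0 < ratio < 2.2
# per-user value (log n) grows only logarithmically with the user base
assert abs(math.log(2 * n) / math.log(n) - 1.0502) < 0.001
```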
Having the exponent be 1 + log(x) (or rather 1 + log (k) since
we are talking about "knowledge") results in double exponential
growth. You described this as "vastly fast," but the speed
would depend entirely on the constants, which we've been leaving
out here for simplicity's sake.
What does the data show? Both your graph and mine show double exponential
growth. On the other hand, we don't have enough data points to rule
out the possibility of an exponent greater than 1, so the data does
not definitively settle the issue.
I'm going to work on this some more, including trying to determine
values for the unstated constants, fitting the available data to
these formulas. If the "greater than exponential" performance
of the data results from an exponent greater than 1, what is the
exponent and when does it blow up to a singularity?
My belief is that we can justify a double exponential. The data
shows this, and I believe that a careful consideration of the theoretical
basis for the formulas will justify this as well. That, of course,
is profound enough. Double exponential growth still results in
the conclusion that our technology will be many orders of magnitude
more powerful than our biological intelligence within a small number
of decades.
It is hard to explain how we could get infinite knowledge, or infinite
information processing, from a finite world. Then again, maybe the
world isn't finite. If every particle in our universe were another
universe, and if every particle in those tiny universes were universes,
etc., then infinite knowledge and computational capacity could be
feasible.
Ray
10/19/03 Hans Moravec replied:
Both the polynomial and log polynomial curves with large n are
essentially indistinguishable from the singularity-free limiting
cases as long as t<<n.
Only as t nears n do they begin to dramatically differ.
If the universe is really finite in resolution and extent, then
it has only a finite number of distinct states, so can never host
a truly infinite process (and must eventually return endlessly to
the same states).
On the other hand, if the storage space can be extended indefinitely,
as with a Turing machine tape, a computation could make deductive
discoveries endlessly.
But I think Vernor doesn't suggest an actual infinity, just that
when AI takes over in its own further advancement, each doubling
in the rate at which AIs think will result in a doubling in the
rate of the advancement (rather than today, where the constant rate
of human thought almost fixes the rate of advancement). Ungoverned
AI progress will resemble a singularity as far as human scope and
time scale are concerned, and result in a new regime with properties
that are not simply an extrapolation of what happened before the
"singularity".
I think trying to time the appearance of runaway human-free (or
human time-scale-free) AI may be an interesting exercise.
All the existing human-paced curves will surely become obsolete
when that happens, even if there is some acceleration today from
computer tools. It still takes hours, days and years for new ideas
to penetrate human skulls—that changes when accelerating AI
runs the show.
10/19/03, Ray Kurzweil replied:
Hans,
You write:
"Both the polynomial and log polynomial curves with
large n are essentially indistinguishable from the singularity-free
limiting cases as long as t<<n. Only as t nears n do
they begin to dramatically differ."
RAY: Yes, so we cannot distinguish between these from the historical
data. Either case is consistent with the data we can observe to
date. I believe that a theoretical derivation of a formula that
makes sense would support the rate of change of k = k ^ 1 + log
(k).
You write:
"If the universe is really finite in resolution and
extent, then it has only a finite number of distinct states,
so can never host a truly infinite process (and must eventually
return endlessly to the same states).
On the other hand, if the storage space can be extended
indefinitely, as with a Turing machine tape, a computation
could make deductive discoveries endlessly."
RAY: Agreed.
You write:
"But I think Vernor doesn't suggest an actual infinity,
just that when AI takes over in its own further advancement,
each doubling in the rate at which AIs think will result
in a doubling in the rate of the advancement (rather
than today, where the constant rate of human thought almost
fixes the rate of advancement). Ungoverned AI progress will
resemble a singularity as far as human scope and time scale
are concerned, and result in a new regime with properties
that are not simply an extrapolation of what happened
before the "singularity"."
RAY: Yes, I understand that, and I use the term "singularity"
in this sense, that is a "rupture" in the fabric of human
history caused by the cycle of strong AI (nonbiological intelligence)
having access to its own source code and improving it.
Our dialogue on the possibility of a true mathematical singularity
is somewhat of a side issue, although an interesting one. However,
I am not defining the technological "singularity" as
a mathematical singularity, but (like Vinge) using the word as a
metaphor for an event with an event horizon beyond which it is difficult
to penetrate (analogously with a black hole).
I see this eventuality as more of a continuity with biological
intelligence. In other words, I see a gradual shift from the dominance
of biological intelligence to nonbiological. Our biological intelligence
is already greatly enhanced by our tools. We are reverse engineering
the human brain, and will thereby gain access to our own source
code. These insights—combined with our traditional mode of
AI experimentation that is not dependent on emulating the principles
of operation of the human brain—will enable us to create strong
AI as well as improve our own cognition (using noninvasive, nanobot-based,
widely distributed, neural implants).
You point out that any improvement of our own cognition is a very
temporary situation since the nonbiological portion of our intelligence
will within a brief period of time greatly outstrip the biological
portion.
I agree with this, but point out further that the nonbiological
intelligence we are creating is an expression of our human civilization.
I don't consider "human" to imply "biological."
Our species is specifically that species that seeks to improve its
own condition and to extend beyond its limitations. Note that the
subtitle to my upcoming book "The Singularity is Near"
is "When Humans Transcend Biology."
You wrote:
"I think trying to time the appearance of runaway
human-free (or human time-scale-free) AI may be an interesting
exercise."
RAY: Yes, that is a major focus of this upcoming book. I believe
we will demonstrate strong AI (passing the Turing test) by 2029,
but that does not immediately mean hyper development speeds. Passing
the Turing test simply means equaling human intelligence. Human
intelligence, after all, is not necessarily all that impressive.
If you gathered 100 random people from your local shopping mall,
you have 100 entities that can presumably pass a Turing test, but
they would not be very effective at improving human intelligence
even if they did have access to their own source code.
I see these trends consolidating in the 2030s and that by the 2040s
we will see changes that cannot be followed by unenhanced human
intelligence. That is what I regard as the Singularity and I place
it in that decade.
You write:
"All the existing human-paced curves will surely become
obsolete when that happens, even if there is some acceleration
today from computer tools. It still takes hours, days and
years for new ideas to penetrate human skulls—that changes
when accelerating AI runs the show."
RAY: No, I disagree. If you look at my curve (which corresponds
closely to what you have derived) for the 2040s and beyond, you
see rates of change that are so fast that they could not possibly
be understood, let alone governed, by unenhanced human biological
intelligence. Indeed, this is a point (mistakenly) made by some critics—that
these curves cannot survive to the 2030s and 2040s because it would
be impossible for humans to make changes that fast.
So the takeover of continued accelerating technological progress
and exponential growth of knowledge by nonbiological intelligence
is necessary for this to happen. Conversely, these curves are feasible
because, and only because, of the impending dominance by nonbiological
intelligence (strong AI). So it is not the case that our current
curves become obsolete. Rather, the "singularity" is what
enables these curves to continue.
Ray
10/19/03 Hans Replied:
> a formula that makes sense would support
> the rate of change of k = k ^ 1 + log (k).
You mean dk/dt = k ^ (1 + log (k))
dsolve({ diff(x(t),t) = x(t) ^ (1+log(x(t))), x(0)=1 } ,x(t));
x(t) = exp(RootOf(erf(_Z)*Pi-2*sqrt(Pi)*t))
(RootOf means the solution for _Z of erf(_Z)*Pi - 2*sqrt(Pi)*t =
0, or erf(_Z) = 2*t/sqrt(Pi).
erf has a range between and asymptotic to -1 and 1. Thus the solution
has a singularity when 2*t/sqrt(Pi) = 1, i.e. at t = sqrt(Pi)/2.)
Thus this formula does have a nearby singularity. Not surprising,
since log(k) > 0, so the original formula has an exponent of k
greater than 1.
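The erf in Maple's answer can be traced by hand; here is a numeric
sketch of the same calculation (helper names are illustrative):

```python
import math

# Separating dk/dt = k^(1 + log k) gives t(k) = integral_1^k du / u^(1+log u).
# Substituting u = exp(s) turns the integrand into exp(-s^2), so
# t(k) = (sqrt(Pi)/2) * erf(log k), which can never exceed sqrt(Pi)/2:
# the singularity encoded in Maple's RootOf/erf answer.

def t_of_k(k, steps=200000):
    # midpoint quadrature of integral_0^{log k} exp(-s^2) ds
    b = math.log(k)
    h = b / steps
    return sum(math.exp(-((i + 0.5) * h) ** 2) for i in range(steps)) * h

for k in (2.0, 10.0, 100.0):
    assert abs(t_of_k(k) - (math.sqrt(math.pi) / 2) * math.erf(math.log(k))) < 1e-7

# t approaches, but never reaches, sqrt(Pi)/2 ~ 0.8862
assert t_of_k(2.0) < t_of_k(10.0) < math.sqrt(math.pi) / 2
```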
If you really meant dk/dt = k + log(k) then, of course, the formula
is essentially linear (dk/dt < 2k) and the solution lies between
k = exp(t) and k = exp(2*t) (the actual solution is a complicated
RootOf).
10/19/03 Ray Replied:
I don't mean: dk/dt = k ^ (1 + log (k)) since (as you note) that
is an exponent greater than 1 which as we noted will blow up.
What I meant was: dk/dt = k*(1+log(k)) = k + k * log(k). This has
the linear portion plus a linear portion accelerated by the log
of knowledge. As I noted k * log(k) is a reasonable value function
for a network of k nodes. Sorry I meant to write * instead of ^
and had implied the parentheses.
10/19/03 Hans replied:
Tickling the tail of the singularity, I see:
dsolve({diff(x(t),t)=x(t)*(1+log(x(t)))*(1+log(1+log(x(t)))),x(0)=1},x(t));
gets
x(t) = exp(exp(exp(t)-1)-1)
Raising the log log term to the (n+1)/n power brings in a singularity
at t=n as before. So instead layer on another log in the next term:
Let lp(x) = 1+log(x)
dsolve({diff(x(t),t)=x(t)*lp(x(t))*lp(lp(x(t)))*lp(lp(lp(x(t)))),x(0)=1},x(t));
-> x(t) = exp(exp(exp(exp(t)-1)-1)-1)
Seems to be a pattern here.
Makes me wonder what the limit of
x*lp(x)*lp^2(x)*lp^3(x)*lp^4(x)... might be:
a function still barely faster than linear, yet only infinitesimally
away from triggering a singularity.
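The tower-of-exponentials pattern checks out numerically; a sketch
(the `check_ode` helper is mine):

```python
import math

lp = lambda x: 1 + math.log(x)
h = 1e-6

def check_ode(sol, rhs, ts):
    # finite-difference check that sol(t) satisfies dx/dt = rhs(x)
    for t in ts:
        dx = (sol(t + h) - sol(t - h)) / (2 * h)
        want = rhs(sol(t))
        assert abs(dx - want) < 1e-4 * want

# two lp factors -> a triple exponential
check_ode(lambda t: math.exp(math.exp(math.exp(t) - 1) - 1),
          lambda x: x * lp(x) * lp(lp(x)), (0.0, 0.3, 0.6))
# three lp factors -> a quadruple exponential (the pattern continues)
check_ode(lambda t: math.exp(math.exp(math.exp(math.exp(t) - 1) - 1) - 1),
          lambda x: x * lp(x) * lp(lp(x)) * lp(lp(lp(x))), (0.0, 0.2, 0.4))
```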