Synthespianism
The elusive goal of creating a photorealistic synthespian indistinguishable from a live actor has intrigued, taunted and tormented programmers for 30 years. We're 90 percent there, but efforts so far have failed to convey convincing nuances of facial expression, micro-motion, light, and texture. The exciting areas to be explored are those where the animator becomes analogous to the actor in a medium free of the constraints of live action photography...creating characters, roles, and plots that exploit our intimate familiarity with the human form and its subtleties, but don't attempt to recreate photorealistic renderings of it.
The anthropomorphization of computer graphics has been a classic
case of exponential growth powered by technology, art, commerce
and culture. Funding for military and aerospace applications like
nuclear weapons design, weather prediction and flight simulation
paid for much of the initial heavy lifting required to build the
foundation of the computer graphics industry during the 1960s and
early 1970s.
As the sophistication of graphics software marched forward and
the cost of computing slid downward, the annual SIGGRAPH film and
video show became the crucible in which technologists, filmmakers
and artists were introduced to one another: the baton of computer
graphics design was passed from those who wrote the programs required
to create imagery to those who perceived and exploited the incredible
communicative potential of this fledgling medium.
Along the way, the natural predilection to create computer graphics
in the image of ourselves has led to a striking body of creative
endeavor, ever growing in realism and resolution, which now knocks
at the door of imperceptibility; that is, the rendering of human
performances indistinguishable from flesh and blood performances.
Some of the steps in that evolution are traced here from a personal
perspective, and some speculations on future developments are presented.
Initial attempts at simulating human body motion took several forms
at Digital Effects, a company I co-founded in 1978 along with college
associates. Hierarchical skeletons were created and keyframe-animated
to move in a bipedal fashion, but without any IK (inverse kinematics)
solutions available, the results were stilted and difficult to edit.
Rotoscoping live-action footage of a subject festooned with witness
points at the joints and digitizing the position of the points on
film allowed us to imbue our characters with more lifelike motion,
but the process yielded primarily 2-D information that could not
be used from all angles.
At that time, 3-D modeling software had been designed for architectural
applications and was not capable of modeling the human form in a
satisfactory manner. Early attempts at creating and linking facial
expressions using software solutions in computer animation were
very disappointing and unconvincing at even the most basic levels
of lip synchronization.
3-D rotoscoping at Robert Abel and Associates yielded the first
motion-captured animation in the commercial project, Sexy Robot,
under the technical direction of Frank Vitz, who used two 35mm cameras
to triangulate the 3-D positions of witness points on a live subject.
During the period of 1985-1986, a good deal of seminal research
and development in character animation was conducted at Abel as
well as Digital Productions and Omnibus Computer Graphics, all three
of which joined together and imploded under the weight of their
largesse.
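The two-camera triangulation described above can be illustrated with a minimal sketch. It assumes each camera's position and the direction of its sight line toward a witness point are already known (in practice these would come from camera calibration and the digitized film positions); the function names and the midpoint method are illustrative choices, not a description of the software actually used at Abel.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _along(origin, direction, s):
    # Point reached by travelling s units of `direction` from `origin`.
    return tuple(o + s * d for o, d in zip(origin, direction))


def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3-D marker position from two camera sight lines.

    c1, c2: camera positions (assumed known from calibration).
    d1, d2: directions from each camera toward the marker.
    Returns the midpoint of the shortest segment joining the two lines,
    which tolerates small measurement errors that keep them from meeting.
    """
    w = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("sight lines are parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # parameter along the first sight line
    t = (a * e - b * d) / denom  # parameter along the second sight line
    p1 = _along(c1, d1, s)
    p2 = _along(c2, d2, t)
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Two cameras two units apart, both sighting a marker five units away.
marker = triangulate_midpoint(
    (-1.0, 0.0, 0.0), (1.0, 0.0, 5.0),
    (1.0, 0.0, 0.0), (-1.0, 0.0, 5.0),
)
```

Repeating this for every witness point on every frame gives the 3-D joint positions that a 2-D rotoscope cannot provide.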
First Synthespian
While at Digital-Omnibus-Abel (to become known as DOA) I met Diana
Walczak, a recent college graduate who was searching for ways to
combine science and art. We formed a partnership based on our mutual
interest in developing computer-generated characters and came up
with sculpture-based solutions to the problem of modeling the human
form and creating facial animation.
Diana sculpted a human form in clay and metal armature that was
cast in hydrocal, from which individual body parts could be created
and digitized using a magnetic digitizing device called the 3Space
Digitizer by Polhemus. The body parts were lined with thin tape
to define the optimum topology in polygons and digitized by hand
using a magnetic sensor.
These body parts were then assembled digitally into a skeletal hierarchy
to form Nestor Sextone, our first Synthespian. Because software did not
yet exist that would allow for flexors at the joints, Nestor's joints
were formed by interpenetrating solids, which gave us seams similar to
those of a plastic action figure. For facial
animation, a neutral face was cast in hydrocal, which allowed Diana
to make multiple clay copies that could be sculpted into various
phonemes of speech and facial expressions.
Larry Weinberg, a programmer from Digital Effects and Omnibus who
later would write Poser, contributed software that would allow us
to link the various digitized facial expressions together by re-ordering
the polygons. With multiple faces re-ordered into exactly the same
polygonal topology, we could interpolate from one to another, enabling
us to create scripts that could simulate lip synchronization with
our soundtracks. Using keyframe animation and Larry's facial animation
software, Sextone made his screen debut at SIGGRAPH 1988 in a 30-second
film in which he campaigns for the presidency
of the Synthetic Actors Guild.
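The interpolation between re-ordered faces can be sketched in a few lines. This is only an illustration of the idea, assuming each digitized face is stored as a list of (x, y, z) vertices in the same order; the function name and the toy data are hypothetical and are not taken from Larry Weinberg's software.

```python
def interpolate_faces(face_a, face_b, t):
    """Blend two faces that share the same vertex ordering.

    t = 0.0 returns face_a, t = 1.0 returns face_b, and values in
    between move every vertex along a straight line from A to B.
    """
    if len(face_a) != len(face_b):
        raise ValueError("faces must share the same topology")
    return [
        tuple(a + (b - a) * t for a, b in zip(va, vb))
        for va, vb in zip(face_a, face_b)
    ]


# Toy data: two vertices standing in for a full digitized mask.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mouth_open = [(0.0, -2.0, 0.0), (1.0, -1.0, 0.0)]

halfway = interpolate_faces(neutral, mouth_open, 0.5)
```

A lip-sync script then amounts to a list of target faces and the frames on which each should be fully reached, with `t` swept from 0 to 1 between neighboring targets.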
Intrigued by the potential of motion capture to link natural human motion
to our synthetic characters, we created Don't Touch Me, a
music video piece that premiered at SIGGRAPH 1990 in which singer/songwriter
Perla Batalla was optically motion captured (by Motion Analysis
in Santa Rosa, CA) to drive a singing synthespian called Dozo. By
this time, we had suitable flexing software for simple joints like
elbows and knees, but the multi-axis requirements for the shoulder
joint meant a solution was still several months of development away.
Facial animation was again created by linking digitized sculptures
of various facial expressions and this technique yielded superior
results to any of the software solutions of the time that sought
to model the musculature of the face. Software solutions would require
many years of development before they would overcome the quality
and believability of the sculpture-based technique, which allowed
for the preservation of facial volume and the illusion of the preservation
of facial muscle integrity during motion.
|
This photo shows several of the masks that were sculpted
by Diana Walczak to animate the faces of Nestor Sextone and
Dozo.
Walczak first created a clay sculpture with a neutral
expression and produced a mold from which more identical clay
faces were fabricated. Sextone and Dozo each required 15 individual
clay face sculptures or masks—each formed by Walczak
to convey a unique expression or phoneme of speech.
Each mask was lined with thin tape and then the intersecting
points of the tape were digitized in order to create a CG
model from each mask. The fifteen CG models created for each
face were divided to separate the tops from the bottoms so
that they could be duplicated, mixed and matched to create
45 expressions for each character.
|
The sculpture-based technique worked because Diana's keyframe sculptures
all had appropriate muscle definition and maintained that definition
while interpolating from one to the next.
Our first stereoscopic synthespians were created for In Search of
the Obelisk, a theme park trilogy for the Luxor Hotel in Las
Vegas, designed by Doug Trumbull. Using optical motion capture of
live dancers, we created the illusion of glass synthespians dancing
on a hovering beach that floated over the audience.
Since we used ray tracing to refract the background through the
bodies of the dancers, the stereoscopic image perceived by the audience
was accurately rendered with slightly different refractions from
the point of view of the left eye as compared to the right, yielding
a very realistic illusion reminiscent of the optical properties
of a glass object when viewed stereoscopically.
For the feature film Judge Dredd, digital stunt doubles were
created to solve a technical problem: many of the shots in the climactic
chase sequence required Sylvester Stallone and Rob Schneider to
appear to ride on a flying motorcycle that weaves around other flying
vehicles and skyscrapers. The close-ups were shot on a green screen
stage with a gimbaled prop of the motorcycle and composited into
mocon (motion control) footage of the huge model of the city. Other
shots required the motorcycle to fly toward camera from a long distance
and maneuver in a complex flight path as it whizzed past camera.
These shots could not be photographed due to limitations
in the length of the green-screen camera rig as well as the reluctance
of the producers to allow a large, heavy, motion-controlled camera
rig to careen within a few feet of their lead actors. We used magnetic
motion capture (Ascension Technology's Flock of Birds) to obtain
the body dynamics of the motorcycle riders during the various changes
in attitude of the motorcycle.
|
In this image, computer-generated cycles with Synthespian
riders were composited with environments shot with a motion-control
camera. One of the principal challenges of this production was
the seamless and convincing fusion of computer-generated and
physical models. |
With the previsualization playing back on video, the subject (in this
case Diana Walczak) was fitted with magnetic trackers and wobbled
around on a gimbaled motorcycle mockup in sync with the previz playback.
The way her body moved in response to the motion of the bike was
captured and applied to photoreal synthetic versions of Stallone
and Schneider. For the faces, we used CyberScans for the first time
and the results were satisfactory since the camera never lingered
on the faces at close range.
Organic shape-shifting
A later project for the feature film X-Men involved the character
Mystique, played by Rebecca Romijn-Stamos. Mystique is a shape-shifter
who transforms from her scaly blue form into other characters and
back again using a combination of live action photography and CG animation.
Director Bryan Singer was looking for a transformation that would
stand apart from the typical morphing that had risen like a plague
through the visual arts, becoming a constant technique used in advertising
and in films to change one object into another using simultaneous
2-D shape transformation with dissolving texture vertices.
We designed a technique that would allow for a dimensional transformation
that would begin at various locations and spread across and around
the limbs in an organic, infectious fashion accented by 3-D scales
bursting through the surface and settling down like a shaking dog's
coat to form the scales on her body. In most cases we used CyberScan
data of the outgoing actor matched to the 3-D position of the actor
in the shot as a matting element to transform into an all-CG Mystique.
This technique required eighteen stages of production to create
the multilayered complex transformation and very careful matching
of CG skin, clothing and hair to the live action footage. Although
the mandate was to make the CG Mystique appear photoreal, her blue,
scaly body was very different from that of a normal person, yielding
considerable visual leeway.
In the Revolution Studios production The One, Jet Li battles his
identical doppelganger from another dimension. For many shots, a simple
split screen or a Patty Duke-style over-the-shoulder shot would suffice,
but for high-speed kung fu
battle sequences in which punches and kicks had to land and be felt
by the audience, digital face replacement was the technique of choice.
The separation of a facial performance from a physical performance
had been accomplished before in Jurassic Park in the shot
where the velociraptor leaps up from below to attack a child character.
The adult stunt double's face was replaced with that of the child
actor, but that was simply a composite of photographic elements.
In The One, the complex high-speed motion of the subjects
during the fight sequences, coupled with the requirement that the
two subjects sometimes appear to move at different camera frame
rates, required us to develop a fully CG face-replacement solution.
The stunt double, a kung fu expert with a very similar body type
to Jet Li, was outfitted with a plastic mask that was milled from
a CyberScan of Jet Li's face. The mask was equipped with retro-reflective
witness points and the camera was outfitted with a fluorescent,
circular light around the lens to ensure that the markers would
show up on film.
|
Jet Li plays a police officer pursued by his evil alter
ego from a parallel universe who seeks to kill him and become
The One. Advanced face replacement techniques allow Li
to fight his twin. Both faces are visible and fully expressive
in close-ups. |
The fight sequences were choreographed so that the force of the
impacts would impart proper reaction in the two participants. Using
the known positions of the facemask markers, we could determine
the precise orientation of the stunt double's face on each frame,
allowing us to track a CG face over the top of his mask.
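As a rough illustration of recovering orientation from known marker positions, the sketch below builds a coordinate frame from three non-collinear markers and uses it to place a point of the CG face. It assumes the markers' 3-D positions have already been recovered for the frame, and the three-marker construction and all names are hypothetical simplifications; the production tracker is not described in enough detail here to reproduce.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    if length < 1e-12:
        raise ValueError("markers are coincident or collinear")
    return tuple(x / length for x in v)


def frame_from_markers(p0, p1, p2):
    """Build an origin and three orthonormal axes from three markers.

    p0 is the origin, p1 fixes the x axis, and p2 fixes the plane
    that contains the x and y axes.
    """
    x_axis = _normalize(_sub(p1, p0))
    z_axis = _normalize(_cross(x_axis, _sub(p2, p0)))
    y_axis = _cross(z_axis, x_axis)
    return p0, (x_axis, y_axis, z_axis)


def mask_to_world(local_point, frame):
    """Place a point given in mask coordinates into the shot."""
    origin, axes = frame
    return tuple(
        origin[i] + sum(local_point[k] * axes[k][i] for k in range(3))
        for i in range(3)
    )


# Markers as they might appear on one frame: the mask has been moved
# to (1, 1, 1) and turned a quarter turn about the vertical axis.
frame = frame_from_markers((1.0, 1.0, 1.0), (1.0, 2.0, 1.0), (0.0, 1.0, 1.0))
tip_of_nose = mask_to_world((0.0, 0.0, 0.1), frame)
```

Running the same construction on every frame yields a per-frame transform that keeps the CG face locked to the moving mask.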
Using CyberScan data of Jet's face, along with high-resolution
photographs, we created and rigged a detailed 3-D Jet Li face with
blendshapes that would allow us to simulate different facial expressions
during the fight. The CG face was then animated to give us the appropriate
expression for each sequence, matted into the shot covering up the
mask and blended into the stunt double's natural color around the
face. Because it was not possible to photograph Li's face in the
proper dynamic orientation with the proper expression for a given
moment of a fight, a CG face was the only solution.
The resulting technology, which allows us to separate the physical
performance from the facial performance, has far-reaching implications
for the future of filmmaking. First of all, stunt sequences that
normally would be staged in such a way that the face of the stunt
double is never facing camera can now be staged according to the
needs of the director, and the actor's face can be inserted accurately
and believably. More broadly, the facial performance of an actor
who is incapable of the physical aspects of a performance can be
composited into the footage of a stunt double to multiply the range
of an actor's possible roles. Recent projects making use of our
technology include inserting an actor's face onto stunt doubles
who are surfing and riding motorcycles.
Animation trumps mo-cap
More interesting from our standpoint is the creation of wholly
CG characters and their application to entertainment projects. Universal
Studios came to us with the mandate to create the best theme park
attraction in the world based on the Spider-Man characters, and
we spent three years in production on The Amazing Adventures
of Spider-Man, a multimedia, stereoscopic, moving motion-simulator
attraction that was to become the flagship of their new theme park,
Islands of Adventure in Orlando, Florida.
|
The Amazing Adventures of Spider-Man was created
for Universal's billion-dollar Islands of Adventure theme
park in Orlando. It's the first ride in history to combine stereoscopic
3D film projected onto giant screens with the latest in motion-based
vehicle technology. This virtual-reality adventure immerses
riders in a comic-book battle between Spider-Man and members
of the sinister syndicate as riders move through a 1.5-acre
set environment. |
Working with our head software designer, Frank Vitz, we developed
software that would compensate for the viewing position of the moving
audience, who would sit in six-degrees-of-freedom motion simulators
traveling on a track past 13 large reflective screens. The imagery
was projected in stereoscopic eight-perf 70mm film.
A great deal of attention was paid to matching the physical sets
in the ride to the imagery projected onto the screens so that the
lines were blurred between the real world and the virtual, projected
world. In fact, many of the sets adjacent to the screens were dressed
with CG textures that originated from our virtual sets and were
scanned onto eight-foot-wide canvas murals so that imagery and sight
lines would match up and blend the two worlds into one.
From a design standpoint, our goal was to take the audience into
a comic book world that combines the hard key-lighting and saturated-color
style of comic art with enough textural detail to feel like a real
place. It was a balancing act between stylization and realism that
resulted in a unique and exciting environment in which to stage
the epic struggle between Spider-Man and a gaggle of super-villains
led by Dr. Octopus, one that swirls around the audience whom Spider-Man
must protect.
We tested and abandoned motion capture for the project based on
the fact that the superhuman performances of the Marvel characters
could be better realized by talented animators using keyframe techniques
rather than by animators trying to extend the physical range of
motion-captured athletes.
Our first totally original synthespian project was made possible
by Busch Entertainment, who gave us virtual "carte blanche"
to design a ride from the ground up for a new area at Busch Gardens
in Williamsburg, VA. With only one word of direction, "Ireland,"
from the client, we wrote a story called Corkscrew Hill that
would exploit the physical parameters available: two 60-person Reflectone
motion bases in two identical warehouse spaces.
|
The Corkscrew Hill computer-animated stereoscopic
epic ride experience takes audiences on an adventure to Old
Ireland, populated with humans and mythical creatures. In the
pre-show, the audience shrinks to fit in a magic box. Then they
enter a motion base and are strapped into their seats for the
main show: one continuous-point-of-view shot from the box as
characters carry it on a wild adventure on Corkscrew Hill.
SensAble's FreeForm System was used to sculpt character
heads. Pieces of character models were joined with Paraform
software. Maya was used for modeling, animation, and rendering.
Large-format digital projection was engineered by Electrosonic.
|
We specified very large reflective screens and an open cockpit design
for the attraction. Working with the audio-visual engineers at
Electrosonic in Burbank, CA, we came up with a digital projection
system that would give us film resolution on a large screen despite
the fact that digital projectors of the time were not up to the task.
By rotating four Barco DLP projectors 90 degrees and edge-blending
down the middle (using two projectors for each image), we could
get stereoscopic image pairs onto the 30 x 40 foot screens at 2048
horizontal by 1280 vertical resolution. Since the brain fuses the
left and right images into a single mental image, any image artifacts
from the projection were lost in the mental blending process, resulting
in excellent stereoscopic imagery. We choreographed a camera move
that takes us on an adventure through ancient Ireland, encountering
Irish townspeople, a magic flying horse, banshees, a troll, a witch
and a griffin.
This eight-minute attraction allowed us to create a completely synthetic
world and populate it with mythical creatures and characters with
a visual style akin to that of a storybook. Again, we opted for
keyframed character animation instead of motion capture, which often
seems pedestrian when applied to CG characters. When keyframing,
an animator enters into and becomes the character, breathing original
life into it that cannot be obtained through motion capture, which
is in effect the three-dimensional "xeroxing" of a physical
performance.
In the same way that a caricature of a person looks more like the
subject than would a tracing of a photograph, or a good sculpture
of a person looks more like them than a life cast, a stylized CG
character created by a talented keyframe animator looks more believable
and lifelike than one created with motion capture and CyberScans.
The limits of photorealism
Looking to the future, one must examine one's goals in creating
CG life forms. There are those who hold up photorealism as the ultimate
goal: to create a synthespian indistinguishable from a live actor.
This idea has intrigued, taunted and tormented programmers for 30
years, going back to the films Westworld in 1973 and Looker
in 1981. The broad base of development required to accomplish this
feat has been gaining momentum at an exponential rate as more applications,
competition and funding enter the arena.
There exists a trade-off between what level of realism is possible
versus how much computing time can be spent on each frame. We supplied
very efficient body databases to Ray Kurzweil's Ramona
project, which was presented at the Technology Entertainment Design
(TED) conference in February 2001. This real-time performance took
advantage of recent developments in hardware rendering that allowed
a fairly sophisticated human figure to be rendered and displayed
at 30 frames per second. Through the use of real-time motion capture
and voice synthesis, Ray was able to inhabit his female alter ego,
Ramona.
The performance was designed within the limitations of the technology,
in that the "camera" did not venture too close to Ramona's
face, where the "efficiency" of our data would become
a liability in terms of image quality. As the camera approaches
the subject, the resolution requirements skyrocket, and to render
a photorealistic close-up on film requires orders of magnitude more
calculation than can be supported by real-time rendering engines.
As Ray points out, computing speeds are increasing at exponential
rates, but current technology still gets slammed to the mat when
it is applied to creating a synthespian who appears real in every
detail. The problem is that we spend so much of our time studying
the nuances of facial expression in our colleagues, friends and
family, so we have become quite expert at spotting flaws. There
are many subtle details in a real face, including how the complex
muscle system perturbs the skin surface, how light scatters inside
the skin, and how surface pores, blemishes and other minute details
look and react to light.
A spectacular amount of money was poured into solving these problems
in the all-CG feature film Final Fantasy: the Spirits Within
and the results did not pay off at the box office. Many projects
have been proposed that would use CG characters to bring deceased
actors to the screen for a posthumous encore, but the technology
is not yet ready for this task, and many of us cringe at the prospect
of this sort of application. The recent release of S1m0ne
reiterates the basic problem these sorts of projects face: we can
get about 90 percent of the way to photorealism in CG actors, but
the last ten percent is extremely expensive and time-consuming in
comparison to photographing real actors.
|
Characters from Sony Pictures' CG feature film Final
Fantasy |
In Final Fantasy, the hair, skin, cloth dynamics and lighting
are all in that 90 percent range that just doesn't make it to photoreality,
except in stills. In motion, the illusion lacks the subtlety of
micro-motion and micro-detail of live action photography, and the
results are unsettling and distracting from the storyline.
In S1m0ne, the producers opted to use a live actor who was
digitally altered to be just slightly idealized through image processing.
Coupled with a few shots of a CyberScanned 3-D model being revealed
like an orange peel wipe-on, the processed footage told the story
adequately and carried the story point of a believable CG human
cost-effectively.
To have used a CG character throughout would have been many times more
costly and it is unlikely that the audience would believe that the
CG character could be mistaken for a real actor. Albeit a valiant
attempt, the film presumes a mythical world where Hollywood producers
and the general public have no knowledge of the history and progress
of visual effects, computer animation and digital compositing.
Animator as actor
The exciting areas to be explored are those where the animator becomes
analogous to the actor. With a robust set of tools and techniques,
animators will be able to create performances of the quality and
richness currently achieved by the finest actors, in a medium free of
the constraints of live-action photography. These will be characters,
roles, and plots that exploit our intimate familiarity with the human
form and its subtleties, but don't attempt to recreate photorealistic
renderings of it.
When painters developed the skills to recreate realistic images,
a golden era of realism followed. But when photography came along
and replaced the role of the painter as visual documentarian, painters
responded with expressionism and abstraction, modes of image-making
only possible in the era of post-realism.
In the same way, after the CG industry is able to reproduce reality
in its most intricate detail, the next step will be to build upon
that foundation a new and exciting future of non-realistic style.
But rather than being limited to the confines of a painted canvas
or a physical sculpture, the realm of imagination becomes the only
outer limit.
Beyond the capability to achieve photorealism, there is a much
more compelling goal of creating entertainment that takes place
beyond what and where we can photograph. The writer and director
are now the creative overlords, equipped with unlimited theatrical
possibilities in terms of locations, characters, storylines and
visual style. The entire world of science fiction and fantasy-based
literature can be shot "on location" without limitation.
New stories heretofore inconceivable will be created and brought
to the world of the visual arts and entertainment.
In this work, the emphasis will be on the concept. The writer and
director stand at the door of a new space that has thus far been
explored by precious few—and marvel at the possibilities.
Sextone for President Written and Directed by
Jeff Kleiser and Diana Walczak © 1988 Kleiser-Walczak
Don't Touch Me Directed by Diana Walczak and
Jeff Kleiser © 1989 Kleiser-Walczak
In Search of the Obelisk Computer Animation by
Kleiser-Walczak. Produced by The Trumbull Company for Circus Circus
Enterprises, Inc.
X-Men © 2000 Twentieth Century Fox. All
rights reserved. Image courtesy Kleiser-Walczak
The One © 2001 Revolution Studios Distribution
Company, LLC. Property of Sony Pictures Entertainment, Inc. ©
2001 Columbia Pictures Industries, Inc. All rights reserved. Image
courtesy Kleiser-Walczak.
The Amazing Adventures of Spider-Man © 1999
Universal Studios Escape. A Universal Studios/Rank Group Joint Venture.
All rights reserved. Image courtesy Kleiser-Walczak.
Corkscrew Hill Original ride film Written and
Directed by Jeff Kleiser and Diana Walczak © 2001 Busch Entertainment
Corporation. All rights reserved. Image courtesy Kleiser-Walczak.
Final Fantasy Property of Sony Pictures Entertainment,
Inc. © 2001 Columbia Pictures Industries, Inc. All rights reserved.
S1M0NE © 2002 Darren Michaels/New Line Productions