Synthespianism
Permanent link to this article: http://www.kurzweilai.net/meme/frame.html?main=/articles/art0526.html

by Jeff Kleiser

The elusive goal of creating a photorealistic synthespian indistinguishable from a live actor has intrigued, taunted and tormented programmers for 30 years. We're 90 percent there, but efforts so far have failed to convey convincing nuances of facial expression, micro-motion, light, and texture. The exciting areas to be explored are those where the animator becomes analogous to the actor in a medium free of the constraints of live action photography...creating characters, roles, and plots that exploit our intimate familiarity with the human form and its subtleties, but don't attempt to recreate photorealistic renderings of it.

The anthropomorphization of computer graphics has been a classic case of exponential growth powered by technology, art, commerce and culture. Funding for military and aerospace applications like nuclear weapons design, weather prediction and flight simulation paid for much of the initial heavy lifting required to build the foundation of the computer graphics industry during the 1960s and early 1970s.

As the sophistication of graphics software marched forward and the cost of computing slid downward, the annual SIGGRAPH film and video show became the crucible in which technologists, filmmakers and artists were introduced to one another: the baton of computer graphics design was passed from those who wrote the programs required to create imagery to those who perceived and exploited the incredible communicative potential of this fledgling medium.

Along the way, the natural predilection to create computer graphics in the image of ourselves has led to a striking body of creative endeavor, ever growing in realism and resolution, which now knocks at the door of imperceptibility; that is, the rendering of human performances indistinguishable from flesh and blood performances. Some of the steps in that evolution are traced here from a personal perspective, and some speculations on future developments are presented.

Initial attempts at simulating human body motion took several forms at Digital Effects, a company I co-founded in 1978 along with college associates. Hierarchical skeletons were created and keyframe-animated to move in a bipedal fashion, but without any IK (inverse kinematics) solutions available, the results were stilted and difficult to edit. Rotoscoping live-action footage of a subject festooned with witness points at the joints and digitizing the positions of the points on film allowed us to imbue our characters with more lifelike motion, but the process yielded primarily 2-D information that could not be used from all angles.

At that time, 3-D modeling software had been designed for architectural applications and was not capable of modeling the human form in a satisfactory manner. Early attempts at creating and linking facial expressions using software solutions in computer animation were very disappointing and unconvincing at even the most basic levels of lip synchronization.

3-D rotoscoping at Robert Abel and Associates yielded the first motion-captured animation in the commercial project, Sexy Robot, under the technical direction of Frank Vitz, who used two 35mm cameras to triangulate the 3-D positions of witness points on a live subject.
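The two-camera triangulation described above can be sketched in a few lines. This is only an illustration of the general idea, not the method used at Abel: it assumes each camera's position and the direction of the ray from that camera toward a witness point are already known, and the function name `triangulate_midpoint` is hypothetical. Because two measured rays rarely intersect exactly, the sketch returns the midpoint of their closest approach.

```python
def triangulate_midpoint(o1, d1, o2, d2):
    """Estimate a 3-D marker position from two camera rays.

    o1, o2: camera centers as [x, y, z] (assumed known from calibration).
    d1, d2: directions of the rays from each camera toward the marker.
    Returns the point midway between the closest points on the two rays.
    """
    def sub(a, b):
        return [x - y for x, y in zip(a, b)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = sub(o1, o2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique closest point")

    # Parameters of the closest points along each ray.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + t * k for o, k in zip(o2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]


# Two cameras two units apart, both sighting a marker five units away.
marker = triangulate_midpoint([-1, 0, 0], [1, 0, 5], [1, 0, 0], [-1, 0, 5])
```

Repeating this for every witness point on every frame would give the kind of 3-D point trajectories that a 2-D rotoscope cannot provide.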

During the period of 1985-1986, a good deal of seminal research and development in character animation was conducted at Abel as well as Digital Productions and Omnibus Computer Graphics, all three of which joined together and imploded under the weight of their largesse.

First Synthespian

While at Digital-Omnibus-Abel (to become known as DOA) I met Diana Walczak, a recent college graduate who was searching for ways to combine science and art. We formed a partnership based on our mutual interest in developing computer-generated characters and came up with sculpture-based solutions to the problem of modeling the human form and creating facial animation.

Diana sculpted a human form in clay and metal armature that was cast in hydrocal, from which individual body parts could be created and digitized using a magnetic digitizing device called the 3Space Digitizer by Polhemus. The body parts were lined with thin tape to define the optimum topology in polygons and digitized by hand using a magnetic sensor.

These body parts were then assembled digitally into a skeletal hierarchy to form Nestor Sextone, our first Synthespian. Nestor's joints were formed by interpenetrating solids, because software did not yet exist that would allow for flexors at the joints, which gave us seams similar to those of a plastic action figure. For facial animation, a neutral face was cast in hydrocal, which allowed Diana to make multiple clay copies that could be sculpted into various phonemes of speech and facial expressions.

Larry Weinberg, a programmer from Digital Effects and Omnibus who later would write Poser, contributed software that would allow us to link the various digitized facial expressions together by re-ordering the polygons. With multiple faces re-ordered into exactly the same polygonal topology, we could interpolate from one to another, enabling us to create scripts that could simulate lip synchronization with our soundtracks. Using keyframe animation and Larry's facial animation software, Sextone made his screen debut for SIGGRAPH 1988 in a film of 30 seconds duration in which he campaigns for the presidency of the Synthetic Actors Guild.
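Once every digitized face shares the same vertex order, interpolating between two expressions reduces to blending matching vertices. The following is a minimal sketch of that idea, assuming simple linear interpolation and faces stored as lists of (x, y, z) tuples; it is not Larry Weinberg's software, and the name `blend_faces` and the sample data are hypothetical.

```python
def blend_faces(face_a, face_b, t):
    """Linearly interpolate between two faces with identical topology.

    face_a, face_b: lists of (x, y, z) vertices in the same order.
    t: blend factor, 0.0 gives face_a and 1.0 gives face_b.
    """
    if len(face_a) != len(face_b):
        raise ValueError("faces must share the same vertex count and order")
    return [
        tuple(a + t * (b - a) for a, b in zip(va, vb))
        for va, vb in zip(face_a, face_b)
    ]


# Two-vertex stand-ins for a neutral mask and an open-mouth mask.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
open_mouth = [(0.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
halfway = blend_faces(neutral, open_mouth, 0.5)
```

A lip-sync script in this scheme would amount to a timed sequence of target faces and blend factors, which is why matching topology across all the masks matters: without it there is no vertex-to-vertex correspondence to interpolate.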

Intrigued by the potential of motion capture to link natural human motion to our synthetic characters, we created Don't Touch Me, a music video piece that premiered at SIGGRAPH 1990 in which singer/songwriter Perla Batalla was optically motion captured (by Motion Analysis in Santa Rosa, CA) to drive a singing synthespian called Dozo. By this time, we had suitable flexing software for simple joints like elbows and knees, but the multi-axis requirements for the shoulder joint meant a solution was still several months of development away.

Facial animation was again created by linking digitized sculptures of various facial expressions and this technique yielded superior results to any of the software solutions of the time that sought to model the musculature of the face. Software solutions would require many years of development before they would overcome the quality and believability of the sculpture-based technique, which allowed for the preservation of facial volume and the illusion of the preservation of facial muscle integrity during motion.

This photo shows several of the masks that were sculpted by Diana Walczak to animate the faces of Nestor Sextone and Dozo.

Walczak first created a clay sculpture with a neutral expression and produced a mold from which more identical clay faces were fabricated. Sextone and Dozo each required 15 individual clay face sculptures or masks—each formed by Walczak to convey a unique expression or phoneme of speech.

Each mask was lined with thin tape and then the intersecting points of the tape were digitized in order to create a CG model from each mask. The fifteen CG models created for each face were divided to separate the tops from the bottoms so that they could be duplicated, mixed and matched to create 45 expressions for each character.

The sculpture-based technique worked because Diana's keyframe sculptures all had appropriate muscle definition and maintained that definition while interpolating from one to the next.

Our first stereoscopic synthespians were created for In Search of the Obelisk, a theme park trilogy for the Luxor Hotel in Las Vegas, designed by Doug Trumbull. Using optical motion capture of live dancers, we created the illusion of glass synthespians dancing on a hovering beach that floated over the audience.

Since we used ray tracing to refract the background through the bodies of the dancers, the stereoscopic image perceived by the audience was accurately rendered with slightly different refractions from the point of view of the left eye as compared to the right, yielding a very realistic illusion reminiscent of the optical properties of a glass object when viewed stereoscopically.

For the feature film Judge Dredd, digital stunt doubles were created to solve a technical problem: many of the shots in the climactic chase sequence required Sylvester Stallone and Rob Schneider to appear to ride on a flying motorcycle that weaves around other flying vehicles and skyscrapers. The close-ups were shot on a green screen stage with a gimbaled prop of the motorcycle and composited into mocon (motion control) footage of the huge model of the city. Other shots required the motorcycle to fly toward camera from a long distance and maneuver in a complex flight path as it whizzed past camera.

These shots were not able to be photographed due to limitations in the length of the green-screen camera rig as well as the reluctance of the producers to allow a large, heavy, motion-controlled camera rig to careen within a few feet of their lead actors. We used magnetic motion capture (Ascension Technologies' Flock of Birds) to obtain the body dynamics of the motorcycle riders during the various changes in attitude of the motorcycle.

In this image, computer-generated cycles with Synthespian riders were composited with environments shot with a motion-control camera. One of the principal challenges of this production was the seamless and convincing fusion of computer-generated and physical models.

Playing back the previsualization on video, the subject (in this case Diana Walczak) was affixed with magnetic trackers and was wobbled around on a gimbaled motorcycle mockup in sync with the previz playback. The way her body moved in response to the motion of the bike was captured and applied to photoreal synthetic versions of Stallone and Schneider. For the faces, we used CyberScans for the first time and the results were satisfactory since the camera never lingered on the faces at close range.

Organic shape-shifting

A later project for the feature film X-Men involved the character Mystique, played by Rebecca Romijn-Stamos. Mystique is a shape-shifter who transforms from her scaly blue form into other characters and back again using a combination of live action photography and CG animation. Director Bryan Singer was looking for a transformation that would stand apart from the typical morphing that had risen like a plague through the visual arts, becoming a constant technique used in advertising and in films to change one object into another using simultaneous 2D shape transformation with dissolving texture vertices.

We designed a technique that would allow for a dimensional transformation that would begin at various locations and spread across and around the limbs in an organic, infectious fashion accented by 3-D scales bursting through the surface and settling down like a shaking dog's coat to form the scales on her body. In most cases we used CyberScan data of the outgoing actor matched to the 3-D position of the actor in the shot as a matting element to transform into an all-CG Mystique.

This technique required eighteen stages of production to create the multilayered complex transformation and very careful matching of CG skin, clothing and hair to the live action footage. Although the mandate was to make the CG Mystique appear photoreal, her blue, scaly body was very different from that of a normal person, yielding considerable visual leeway.

For the Revolution Films production of the Jet Li film, The One, Jet Li battles his identical doppelganger from another dimension. For many shots, a simple split screen or a Patty Duke-style over-the-shoulder shot would suffice, but for high-speed kung fu battle sequences in which punches and kicks had to land and be felt by the audience, digital face replacement was the technique of choice.

The separation of a facial performance from a physical performance had been accomplished before in Jurassic Park in the shot where the velociraptor leaps up from below to attack a child character.

The adult stunt double's face was replaced with that of the child actor, but that was simply a composite of photographic elements. In The One, the complex high-speed motion of the subjects during the fight sequences, coupled with the requirement that the two subjects sometimes appear to move at different camera frame rates, required us to develop a fully CG face-replacement solution.

The stunt double, a kung fu expert with a very similar body type to Jet Li, was outfitted with a plastic mask that was milled from a CyberScan of Jet Li's face. The mask was equipped with retro-reflective witness points and the camera was outfitted with a fluorescent, circular light around the lens to ensure that the markers would show up on film.

Jet Li plays a police officer pursued by his evil alter ego from a parallel universe who seeks to kill him and become The One. Advanced face replacement techniques allow Li to fight his twin. Both faces are visible and fully expressive in close-ups.

The fight sequences were choreographed so that the force of the impacts would impart proper reaction in the two participants. Using the known positions of the facemask markers, we could determine the precise orientation of the stunt double's face on each frame, allowing us to track a CG face over top of his mask.
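One simple way to turn marker positions into a head orientation is to build an orthonormal frame from three non-collinear markers. The sketch below assumes the 3-D positions of three hypothetical markers (left temple, right temple, forehead) have already been recovered for a frame; the production tracking pipeline is not described in this article and was certainly more involved, so treat the names and the three-marker setup as illustrative assumptions.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def _normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    if length < 1e-12:
        raise ValueError("markers are coincident or collinear")
    return [x / length for x in v]


def face_frame(left, right, forehead):
    """Return (origin, x_axis, y_axis, z_axis) for a tracked mask.

    x_axis runs from the left marker to the right marker, z_axis is
    perpendicular to the plane of the three markers, and y_axis
    completes a right-handed frame pointing toward the forehead.
    """
    x_axis = _normalize(_sub(right, left))
    z_axis = _normalize(_cross(x_axis, _sub(forehead, left)))
    y_axis = _cross(z_axis, x_axis)
    origin = [(a + b + c) / 3 for a, b, c in zip(left, right, forehead)]
    return origin, x_axis, y_axis, z_axis


# A mask facing straight down the +z axis.
origin, x_axis, y_axis, z_axis = face_frame([-1, 0, 0], [1, 0, 0], [0, 1, 0])
```

Placing a CG face at `origin` and aligning its local axes with the three returned vectors on each frame would keep it locked to the mask as the performer moves; with more than three markers, a least-squares fit would be more robust to noise.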

Using CyberScan data of Jet's face, along with high-resolution photographs, we created and rigged a detailed 3-D Jet Li face with blendshapes that would allow us to simulate different facial expressions during the fight. The CG face was then animated to give us the appropriate expression for each sequence, matted into the shot covering up the mask and blended into the stunt double's natural color around the face. Because it was not possible to photograph Li's face in the proper dynamic orientation with the proper expression for a given moment of a fight, a CG face was the only solution.

The resulting technology, which allows us to separate the physical performance from the facial performance, has far-reaching implications for the future of filmmaking. First of all, stunt sequences that normally would be staged in such a way that the face of the stunt double is never facing camera can now be staged according to the needs of the director, and the actor's face can be inserted accurately and believably. More broadly, the facial performance of an actor who is incapable of the physical aspects of a performance can be composited into the footage of a stunt double to multiply the range of an actor's possible roles. Recent projects making use of our technology include inserting an actor's face onto stunt doubles who are surfing and riding motorcycles.

Animation trumps mo-cap

More interesting from our standpoint is the creation of wholly CG characters and their application to entertainment projects. Universal Studios came to us with the mandate to create the best theme park attraction in the world based on the Spider-Man characters, and we spent three years in production on The Amazing Adventures of Spider-Man, a multimedia, stereoscopic, moving motion-simulator attraction that was to become the flagship of their new theme park, Islands of Adventure in Orlando, Florida.

The Amazing Adventures of Spider-Man was created for Universal's billion-dollar Islands of Adventure theme park in Orlando. It's the first ride in history to combine stereoscopic 3D film projected onto giant screens with the latest in motion-based vehicle technology. This virtual-reality adventure immerses riders in a comic-book battle between Spider-Man and members of the sinister syndicate as riders move through a 1.5-acre set environment.

Working with our head software designer, Frank Vitz, we developed software that would compensate for the viewing position of the moving audience, who would sit in six degrees-of-freedom motion simulators traveling on a track past 13 large reflective screens. The imagery was projected in stereoscopic eight-perf 70mm film.

A great deal of attention was paid to matching the physical sets in the ride to the imagery projected onto the screens so that the lines were blurred between the real world and the virtual, projected world. In fact, many of the sets adjacent to the screens were dressed with CG textures that originated from our virtual sets and were scanned onto eight-foot-wide canvas murals so that imagery and sight lines would match up and blend the two worlds into one.

From a design standpoint, our goal was to take the audience into a comic book world that combines the hard key-lighting and saturated-color style of comic art with enough textural detail to feel like a real place. It was a balancing act between stylization and realism that resulted in a unique and exciting environment in which to stage the epic struggle between Spider-Man and a gaggle of super-villains led by Dr. Octopus, one that swirls around the audience whom Spider-Man must protect.

We tested and abandoned motion capture for the project based on the fact that the superhuman performances of the Marvel characters could be better realized by talented animators using keyframe techniques rather than by animators trying to extend the physical range of motion-captured athletes.

Our first totally original synthespian project was made possible by Busch Entertainment, who gave us virtual "carte blanche" to design a ride from the ground up for a new area at Busch Gardens in Williamsburg, VA. With only one word of direction, "Ireland," from the client, we wrote a story called Corkscrew Hill that would exploit the physical parameters available: two 60-person Reflectone motion bases in two identical warehouse spaces.

The Corkscrew Hill computer-animated stereoscopic epic ride experience takes audiences on an adventure to Old Ireland, populated with humans and mythical creatures. In the pre-show, the audience shrinks to fit in a magic box. Then they enter a motion base and are strapped into their seats for the main show: one continuous-point-of-view shot from the box as characters carry it on a wild adventure on Corkscrew Hill.

SensAble's FreeForm System was used to sculpt character heads. Pieces of character models were joined with Paraform software. Maya was used for modeling, animation, and rendering. Large-format digital projection was engineered by Electrosonic.

We specified very large reflective screens and an open cockpit design for the attraction and—working with the audio-visual engineers at Electrosonic in Burbank, CA—we came up with a digital projection system that would give us film resolution on a large screen despite the fact that digital projectors were currently not up to the task.

By rotating four Barco DLP projectors 90 degrees and edge-blending down the middle (using two projectors for each image), we could get stereoscopic image pairs onto the 30 x 40 foot screens at 2048 horizontal by 1280 vertical resolution. Since the brain fuses the left and right images into a single mental image, any image artifacts from the projection were lost in the mental blending process, resulting in excellent stereoscopic imagery. We choreographed a camera move that takes us on an adventure through ancient Ireland, encountering Irish townspeople, a magic flying horse, banshees, a troll, a witch and a griffin.

This eight-minute attraction allowed us to create a completely synthetic world and populate it with mythical creatures and characters with a visual style akin to that of a storybook. Again, we opted for keyframed character animation instead of motion capture, which often seems pedestrian when applied to CG characters. When keyframing, an animator enters into and becomes the character, breathing original life into it that cannot be obtained through motion capture, which is in effect the three-dimensional "xeroxing" of a physical performance.

In the same way that a caricature of a person looks more like the subject than would a tracing off a photograph, or a good sculpture of a person looks more like them than a life cast, a stylized CG character created by a talented keyframe animator looks more believable and lifelike than one created with motion capture and CyberScans.

The limits of photorealism

Looking to the future, one must examine one's goals in creating CG life forms. There are those who hold up photorealism as the ultimate goal: to create a synthespian indistinguishable from a live actor. This idea has intrigued, taunted and tormented programmers for 30 years, going back to the films Westworld in 1973 and Looker in 1981. The broad base of development required to accomplish this feat has been gaining momentum at an exponential rate as more applications, competition and funding enter the arena.

There exists a trade-off between what level of realism is possible versus how much computing time can be spent on each frame. We supplied very efficient body databases to Ray Kurzweil's Ramona project, which was presented at the Technology Entertainment Design (TED) conference in February 2001. This real-time performance took advantage of recent developments in hardware rendering that allowed a fairly sophisticated human figure to be rendered and displayed at 30 frames per second. Through the use of real-time motion capture and voice synthesis, Ray was able to inhabit his female alter ego, Ramona.

Ray Kurzweil's Ramona project made use of real-time motion capture and voice synthesis technologies.

The performance was designed within the limitations of the technology, in that the "camera" did not venture too close to Ramona's face, where the "efficiency" of our data would become a liability in terms of image quality. As the camera approaches the subject, the resolution requirements skyrocket, and to render a photorealistic close-up on film requires orders of magnitude more calculation than can be supported by real-time rendering engines.

As Ray points out, computing speeds are increasing at exponential rates, but current technology still gets slammed to the mat when it is applied to creating a synthespian who appears real in every detail. The problem is that we spend so much of our time studying the nuances of facial expression in our colleagues, friends and family, so we have become quite expert at spotting flaws. There are many subtle details in a real face, including how the complex muscle system perturbs the skin surface, how light scatters inside the skin, and how surface pores, blemishes and other minute details look and react to light.

A spectacular amount of money was poured into solving these problems in the all-CG feature film Final Fantasy: the Spirits Within and the results did not pay off at the box office. Many projects have been proposed that would use CG characters to bring deceased actors to the screen for a posthumous encore, but the technology is not yet ready for this task, and many of us cringe at the prospect of this sort of application. The recent release of S1m0ne reiterates the basic problem these sorts of projects face: we can get about 90 percent of the way to photorealism in CG actors, but the last ten percent is extremely expensive and time-consuming in comparison to photographing real actors.

Characters from Sony Pictures' CG feature film Final Fantasy

In Final Fantasy, the hair, skin, cloth dynamics and lighting are all in that 95% range that just doesn't make it to photoreality, except in stills. In motion, the illusion lacks the subtlety of micro-motion and micro-detail of live action photography, and the results are unsettling and distracting from the storyline.

In S1m0ne, the producers opted to use a live actor who was digitally altered to be just slightly idealized through image processing. Coupled with a few shots of a CyberScanned 3-D model being revealed like an orange peel wipe-on, the processed footage told the story adequately and carried the story point of a believable CG human cost-effectively.

To have used a CG character throughout would have been many times more costly and it is unlikely that the audience would believe that the CG character could be mistaken for a real actor. Albeit a valiant attempt, the film presumes a mythical world where Hollywood producers and the general public have no knowledge of the history and progress of visual effects, computer animation and digital compositing.

Animator as actor

The exciting areas to be explored are those where the animator becomes analogous to the actor. By animators using a robust set of tools and techniques, performances of the quality and richness currently created by the finest actors will be made possible in a medium free of the constraints of live-action photography. These will be characters, roles, and plots that exploit our intimate familiarity with the human form and its subtleties, but don't attempt to recreate photorealistic renderings of it.

When painters developed the skills to recreate realistic images, a golden era of realism followed. But when photography came along and replaced the role of the painter as visual documentarian, painters responded with expressionism and abstraction, modes of image-making only possible in the era of post-realism.

In the same way, after the CG industry is able to reproduce reality in its most intricate detail, the next step will be to build upon that foundation a new and exciting future of non-realistic style. But rather than being limited to the confines of a painted canvas or a physical sculpture, the realm of imagination becomes the only outer limit.

Beyond the capability to achieve photorealism, there is a much more compelling goal of creating entertainment that takes place beyond what and where we can photograph. The writer and director are now the creative overlords, equipped with unlimited theatrical possibilities in terms of locations, characters, storylines and visual style. The entire world of science fiction and fantasy-based literature can be shot "on location" without limitation. New stories heretofore inconceivable will be created and brought to the world of the visual arts and entertainment.

In this work, the emphasis will be on the concept. The writer and director stand at the door of a new space that has thus far been explored by precious few—and marvel at the possibilities.

Sextone for President Written and Directed by Jeff Kleiser and Diana Walczak © 1988 Kleiser-Walczak

Don't Touch Me Directed by Diana Walczak and Jeff Kleiser © 1989 Kleiser-Walczak

In Search of the Obelisk Computer Animation by Kleiser-Walczak. Produced by The Trumbull Company for Circus Circus Enterprises, Inc.

X-Men © 2000 Twentieth Century Fox. All rights reserved. Image courtesy Kleiser-Walczak

The One © 2001 Revolution Studios Distribution Company, LLC. Property of Sony Pictures Entertainment, Inc. © 2001 Columbia Pictures Industries, Inc. All rights reserved. Image courtesy Kleiser-Walczak.

The Amazing Adventures of Spider-Man © 1999 Universal Studios Escape. A Universal Studios/Rank Group Joint Venture. All rights reserved. Image courtesy Kleiser-Walczak.

Corkscrew Hill Original ride film Written and Directed by Jeff Kleiser and Diana Walczak © 2001 Busch Entertainment Corporation. All rights reserved. Image courtesy Kleiser-Walczak.

Final Fantasy Property of Sony Pictures Entertainment, Inc. © 2001 Columbia Pictures Industries, Inc. All rights reserved.

S1M0NE © 2002 Darren Michaels/New Line Productions


Mind·X Discussion About This Article:

Mo-Cap Vs Hand Animation?
posted on 10/20/2002 2:25 AM by J. Jarbles


Certainly 3D animation and modeling have come a long way since the days of Luxo Jr. I agree completely that the only thing holding back the first true photorealistic cg thespian is quality of animation.

The characters in Final Fantasy, albeit very real looking in stills, seemed too stiff as they moved around. If you would like to see a cg artist who takes the photorealistic still to the next level please look at the following link:

I was curious what would be a proper way to combat the issue of badly articulating CG models? MO-Cap obviously doesn't get the job done correctly even though it is taken directly from life. What about hand animation? If Pixar were to animate a photorealistic model would they be able to correctly animate the subtle nuances?

Thank You,
J. Jarbles

Re: Mo-Cap Vs Hand Animation?
posted on 09/07/2006 7:46 PM by struthersc


Aye, I agree there as far as the animation is concerned. About the only thing that was really wrong there was, well, the "Hair" enigma lol. None of the models' hair moved, which really showed me how utterly we take some things for granted. The simple matter of the hair not moving really threw the rest of the character off. Solve that and you're off to the races. Another thing to consider before we integrate humans into this environment is the problem of "simulator sickness"; keep in mind that there is a very real physiological state to deal with.

Re: Synthespianism
posted on 09/12/2006 4:30 PM by mindx back-on-track
