The goal of realistic human facial representation has remained elusive for several reasons.
First, the mechanisms that underlie the appearance and motion of the face are extremely complicated.
The appearance of the face is determined by how light bounces between multiple layers of skin, resulting in subsurface scattering.
Furthermore, the face is deformed by the combined actions of ten different muscle groups.
Moreover, much of the difficulty in creating realistic digital faces comes from our well-honed ability to observe and interpret
the faces and expressions of people around us. This ability makes us very proficient at noticing the slightest deviation
from reality when observing digital images of the human face.
Traditional attempts at creating realistic faces have involved either a great deal of effort by talented artists or
detailed mathematical simulations. The artistic approach is inherently limited by the amount of effort it takes to recreate,
by hand, the realistic three-dimensional appearance of the human face. Although generations of talented artists have created
very believable portraits and sculptures of human faces, creating a realistic facial model that reflects light and moves in a
believable way remains a challenging task. The mathematical approach has its limitations too. Although there are realistic
mathematical models of the surface of the skin and of the face muscles, these simulations are unable to render the idiosyncrasies
that are part of the identity of a person. The mathematical simulations provide generic motions and surfaces, but do not provide
mechanisms by which to model variations across individuals.
These approaches have yet to produce results that could trick us into thinking we are looking at a real person's face and
not a computer-generated image.
We are working on a new approach to put this "holy grail" of computer animation within reach: recording the appearance
and motion of a real person to create a digital replica. While a photograph or a video capture of the appearance of a subject
may be realistic, it does not give the freedom to change the viewpoint or the light falling on the person's face, a freedom
that is expected of a digital actor existing in a synthetic environment. We have begun to address this issue by recovering the
three-dimensional surface of a person's face from a set of images, its motion during a performance, and its appearance under
different lighting conditions.
Our goal is to develop technologies that would let us capture a performer and digitally reanimate him or her in
an arbitrary scenario.
Following this data-driven philosophy we have built two systems: one that animates the face according to a recorded
speech input and another that animates gaze according to an input video.
Visual Speech
This system takes a recorded utterance as input and translates it into
a facial animation. The result is a believable facial animation that
corresponds to the recorded audio. Our main contribution is not only to
lip-synch the animation to the speech but also to render the expressive
content of the speech signal appropriately. To achieve this goal we
process the input speech to extract both phonemic (lip-synching) and
prosodic (expressiveness) features.
Here are a few examples of synthesized animations:
Example1 (4.8M), Example2 (3.6M), Example3 (2.7M)
Example4 (1M), Example5 (1.5M), Example6 (1.7M)
We also present a new method for editing speech-related facial motions. Our
method uses an unsupervised learning technique, Independent Component
Analysis (ICA), to extract a set of meaningful parameters without any
annotation of the data. With ICA, we are able to solve a blind source
separation problem and describe the original data as a linear combination
of two sources. One source captures content (speech) and the other captures
style (emotion). By manipulating the independent components we can edit the
motions in intuitive ways.
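The editing step can be illustrated with a minimal sketch. It assumes the ICA decomposition has already been computed and does not estimate it: the 2x2 mixing matrix `MIXING`, the two-dimensional "frame", and the helper names are all hypothetical, and real facial motion data has far more dimensions. The sketch only shows how a frame can be unmixed into a content and a style component, how the style component can be scaled, and how the result is remixed.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(m, v):
    """Multiply a matrix (nested lists) by a vector (list)."""
    return [sum(row[i] * v[i] for i in range(len(v))) for row in m]


def edit_style(frame, mixing, style_gain):
    """Unmix a frame, scale its style component, and remix it."""
    content, style = mat_vec(invert_2x2(mixing), frame)
    return mat_vec(mixing, [content, style * style_gain])


# Hypothetical mixing matrix; in practice ICA would estimate it from data.
MIXING = [[1.0, 0.5],
          [0.2, 1.0]]

# Observed frame = MIXING * [content, style] (assumed linear model).
frame = mat_vec(MIXING, [2.0, 1.0])

neutral = edit_style(frame, MIXING, 0.0)      # style removed
exaggerated = edit_style(frame, MIXING, 2.0)  # style doubled
```

A gain of 1.0 returns the original frame, so the edit stays a pure rescaling of one independent component while the content component is left untouched.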
We have deployed this animation system within a leadership training
tool to embody the face of a synthetic mentor.
This work is done in collaboration with Yong Cao and Petros Faloutsos.
Gaze Animation

The purpose of this system is to generate gaze animation for a
synthetic head. We used a neurobiological model developed by Laurent
Itti to model human visual attention. We applied this model to a video
representing the scene viewed by the synthetic character and extracted
a sequence of feature points that represents the focus of attention of
the character. Using this information along with a model of eye and
head motion we can animate the gaze of a synthetic character.
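The pipeline from attention to gaze can be sketched roughly as follows. This is not the neurobiological attention model itself: the sketch assumes a per-frame saliency map is already available as a 2D grid, takes its maximum as the focus of attention, converts that point to gaze angles under an assumed pinhole camera, and splits the horizontal shift between eyes and head with a crude clamp. The field of view, the 15-degree eye limit, and all function names are hypothetical.

```python
import math


def focus_of_attention(saliency):
    """Return (row, col) of the most salient cell in a 2D map."""
    cells = ((value, r, c)
             for r, row in enumerate(saliency)
             for c, value in enumerate(row))
    _, r, c = max(cells, key=lambda cell: cell[0])
    return r, c


def gaze_angles(point, shape, fov_deg=(60.0, 40.0)):
    """Map a cell to (yaw, pitch) in degrees for an assumed pinhole view."""
    row, col = point
    rows, cols = shape
    # Offsets of the cell center from the image center, in [-1, 1].
    x = (col + 0.5) / cols * 2 - 1
    y = 1 - (row + 0.5) / rows * 2
    yaw = math.degrees(math.atan(x * math.tan(math.radians(fov_deg[0] / 2))))
    pitch = math.degrees(math.atan(y * math.tan(math.radians(fov_deg[1] / 2))))
    return yaw, pitch


def split_eye_head(angle, eye_limit_deg=15.0):
    """Eyes cover the shift up to a limit; the head supplies the rest."""
    eye = max(-eye_limit_deg, min(eye_limit_deg, angle))
    return eye, angle - eye


# Toy saliency map with its peak in the top-right corner.
saliency = [[0.1, 0.2, 0.9],
            [0.1, 0.3, 0.2],
            [0.0, 0.1, 0.1]]

point = focus_of_attention(saliency)
yaw, pitch = gaze_angles(point, (3, 3))
eye_yaw, head_yaw = split_eye_head(yaw)
```

Running this per video frame yields a sequence of gaze targets; a fuller model would also smooth the sequence over time and account for the dynamics of eye and head movement.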
This work is done in collaboration with Laurent Itti and Nitin Dhavale.
These videos illustrate the estimation of the focus of attention:
Here are a few examples of synthesized gaze animation:
These animations are best viewed with the DIVX codec installed.
Blend Shape Animation
We built on previous work in blend shape animation and proposed an automatic,
physically-motivated segmentation that learns the controls and parameters directly
from the detailed input expressions. In addition, we provided rendering techniques
which improve the visual realism of our blend shape model. Our system can be used
in both motion-capture and keyframe animations.
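For readers unfamiliar with blend shapes, the standard linear formulation with per-region weights can be sketched as below. The region labels here are assigned by hand purely for illustration; the sketch does not implement the automatic, physically motivated segmentation described above, and the function and variable names are hypothetical.

```python
def blend_regions(neutral, targets, regions, weights):
    """Blend target shapes into a neutral mesh, one weight set per region.

    neutral: list of (x, y, z) vertex positions
    targets: dict name -> list of (x, y, z), same vertex order as neutral
    regions: list giving the region label of each vertex
    weights: dict region -> dict target name -> blend weight
    """
    result = []
    for i, base in enumerate(neutral):
        pos = list(base)
        # Each vertex only responds to the weights of its own region.
        for name, w in weights.get(regions[i], {}).items():
            target = targets[name][i]
            for axis in range(3):
                pos[axis] += w * (target[axis] - base[axis])
        result.append(tuple(pos))
    return result


# Two-vertex toy mesh: one vertex in each region.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets = {"smile": [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]}
regions = ["upper", "lower"]

# Apply half of the "smile" shape to the lower region only.
posed = blend_regions(neutral, targets, regions, {"lower": {"smile": 0.5}})
```

With a single region covering the whole face this reduces to ordinary global blend shape interpolation; finer regions give more localized control at the cost of more weights to set per frame.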
This work is done in collaboration with Pushkar Joshi, Mathieu Desbrun, and Wen Tien.
Here is an example of using BlendShape models for motion-capture animation:
Here are some examples of using BlendShape models for keyframe animation at various region granularities:
These animations are best viewed with the DIVX codec installed.