
Workshop on Transactional Emotions
October 26-28, 2007
Agenda
Friday, October 26
(Unless noted, activities in ICT 6th Floor Conference Room)
· 8:30- 9:00 Breakfast/Registration (1st Floor)
· 9:00- 9:30
· 9:30-10:20
· 10:20-10:50 Break (30min)
· 10:50-11:40 Craig Smith - On the Sociality of Emotion-Eliciting Appraisals (reading1)
· 11:40-12:30
· 12:30- 2:00 Lunch (at ICT, provided, 1st Floor)
· 2:00- 2:50 Antonio Damasio – Affective Neuroscience
· 2:50- 3:40 Jim Blascovich - Affect, Motivation and Virtual Reality (presentation)
· 3:40- 4:10 General Discussion
· 4:10- 4:30 Break (20min)
· 4:30- 4:50 Virtual Human Demonstration (VR Theater)
· 4:50- 6:10 Software Demonstrations: (Room 222)
· Automatic FACS coding (Movellan)
· Dyadic Motion capture data (Narayanan)
· Gesture Tracking (Morency)
· Rapport Agent (Gratch & Wang)
· Virtual Patient (
· SmartBody (Marsella)
· Continuous Measurement System (Messinger)
· 7:00- 9:00 Dinner (Abode Restaurant and Lounge)
Saturday, October 27
· 8:30- 9:00 Breakfast at ICT (1st Floor)
· 9:00- 9:50 Peter Carnevale - Emotion in Negotiation (suggested reading) (presentation)
· 9:50-10:40 David Pynadath – Computer models of Theory of Mind (suggested reading)
· 10:40-11:10 Break (30min)
· 11:10-12:00 Dan Messinger - Development of emotional communication (suggested reading)
· 12:00- 1:30 Lunch (at ICT, provided, 1st floor)
· 1:30- 2:20
· 2:20- 3:10
· 3:10- 3:40 Break (30min)
· 3:40- 4:30 Justine Cassell - Verbal and nonverbal implementations of rapport (suggested reading) (presentation)
· 5:00- Transfer to
· 6:30- 8:30 Dinner at Getty Center
Sunday, October 28
· 8:30- 9:00 Breakfast at ICT (1st Floor)
· 9:00- 9:50 (suggested reading1) (suggested reading1)
· 9:50-10:40 Javier Movellan – Developing practical expression recognition systems
· 10:40-11:10 Break (30min)
· 11:10-12:10 General Discussion
· 12:10- 1:40 Lunch (at ICT, provided, 1st Floor)
Neurophysiology of emotion, VR/Social psych
Emotion and negotiation
This presentation focuses
on a framework for the study of emotion in negotiation and related social
contexts. It will review results from a program of research that investigates
variables such as mood on information processing and expectations in
negotiation, and the impact of emotion on what is valued in negotiation.
Justine Cassell, Northwestern University
The phenomenon of rapport
has many parts - the feeling of instant connection with another person, the
growing familiarity between two people, and the sense of knowing somebody
well that comes after a long friendship. As Cappella has pointed out,
however, many studies of rapport do not differentiate among these different
phenomena. In this talk I report results from a study that looked at
the effects of familiarity and friendship (the latter two of those phenomena)
on verbal and nonverbal behavior, and the subsequent computational model that
is based on those results. Data come from a study of friends and strangers,
who either could or could not see one another, and who were asked to give
directions to one another three successive times. Analysis focused on
differences in the use of dialogue acts and non-verbal behaviors, as well as
co-occurrences of dialogue acts, eye gaze and head nods. Our results
demonstrate a pattern of verbal and nonverbal behavior that differentiates
the dialogue of friends from that of strangers, and differentiates early
acquaintances from those who have worked together before, in such a way that
we can begin to see how a computational system might interact differently
with a user over time.
Affective neuroscience
The relationship of gesture and speech prosody to affect/arousal in interactive discourse
Language production makes
use of speakers’ ability to shape, direct, and locate their hands and
bodies in space and in relation to interlocutors and to objects in the
immediate environment. This ability contributes to a cognitive and
social-interactional function: the elaboration, via coverbal gesturing, of
“material carriers” (McNeill & Duncan 2000) of linguistic
conceptualizations. Here we examine the tight integration of coverbal gesture
and speech prosodic emphasis in natural interactive discourse. We consider
the effect, on communication, of factors that expand versus constrict these dimensions of expression. Data from
videotaped narrative discourses of healthy individuals and of individuals
with Parkinson’s Disease (a neurodegenerative disorder affecting
sensorimotor function, affect, and cognition) suggest the following: (i)
speech prosody and coverbal gesture jointly highlight discourse focal
information, (ii) prosody and gesture assist communicating partners in
multiple ways to coordinate their models of an ongoing discourse, (iii)
variations in affect/emotion or, in general, of degree of arousal, during
discourse, have an impact that spans the vocal and visuo-spatial dimensions
of communication, suggesting that these are a unified dimension of
communicative behavior. These observations are discussed as evidence in
support of theories that hold language use to be an embodied process,
fundamental characteristics of which vary in relation to the social
relationship of interlocutors and their affective-emotional states.
Computational appraisal theory, theory of mind
Daniel Messinger, Development of emotional communication
Early emotional development is a nonverbal
interactive process. Different patterns of infant smiling and gazing emerge within
dyadic interactions in the first 10 months of life. They are characterized by
an increasing ability to engage in highly positive emotional engagement with
a partner, the ability to disengage from the partner, and the ability to
intentionally signal to a partner. These abilities may be contingent on the
capacity of the infant and partner to mutually respond to one another. The
emergence of these capacities provides a developmental perspective on the
capacity of human beings and computational systems to engage in such early
transactions. We have used various techniques to understand the
necessary constituents of these developments in dyadic interaction. A
statistical simulation technique highlights the patterning of discrete acts
in time. Computer vision techniques provide models of the movement of the
facial features of interacting dyads. Continuous Measurement Software uses a
joystick interface to record the real-time reactions of 'naive' observers to
ongoing behavior. These techniques may be of general utility in modeling
'transactional affective phenomena.'
Understanding gestures in context
During face-to-face
conversation, people use visual feedback (e.g., head and eye gesture) to
communicate relevant information and to synchronize rhythm between
participants. When recognizing visual feedback, people often rely on more
than their visual perception. For instance, knowledge about the current topic
and from previous utterances helps guide the recognition of nonverbal cues. In
this talk we will describe how contextual information can be used to predict
visual feedback and improve recognition of head gestures in human-computer
interfaces. Lexical, prosodic, timing, and gesture features can be used to
predict a user’s visual feedback during conversational dialog with a
robotic or virtual agent. Using a discriminative approach to contextual
prediction and multi-modal integration, performance of head gesture detection
was improved with context features even when the topic of the test set was
significantly different than the training set.
Javier Movellan, UCSD
Developing Practical Expression Recognition Systems
Facial expression is one of
the most powerful and immediate means for humans to communicate their
emotions, cognitive states, intentions, and opinions to each other. Given the
importance of facial expressions, it is not unreasonable to expect that the
development of machines that can recognize such expressions may have a
revolutionary effect in everyday life. Potential applications include
tutoring systems that are sensitive to the expression of their students,
computer assisted detection of deceit, diagnosis and monitoring of clinical
disorders, evaluation of behavioral and pharmacological treatments, new
interfaces for entertainment systems, smart digital cameras, and social
robots. However, there is currently
a gap in automatic expression recognition between the levels of performance
reported in the literature and the actual performance in real life
conditions. A troublesome aspect of this gap is that the algorithms that
perform well on the standard datasets and in laboratory demonstrations could
be leading research in the wrong direction. I will present our experience
developing a smile detector for real world applications. The detector became
the basis for commercial systems already appearing on some digital cameras.
We explore the required characteristics of the training dataset, image
representation, and machine learning algorithms. I will also describe how we diagnosed the
performance of the system using techniques from the psychophysics
literature. Results suggest that
human-level smile detection accuracy in real-life applications is achievable
with current technology and is ready for practical applications. Generalization
to comprehensive expression recognition systems is underway and likely
achievable within the next few years.
Machine Recognition and Synthesis of Emotional Speech
The human speech signal is unique in the sense that
it carries crucial information about not only communication intent and
speaker identity but also underlying expressions and emotions. It results
from a complex orchestration of cognitive, physiological, physical and social
processes. Automatically processing and decoding speech and spoken language
hence is a vastly challenging and inherently interdisciplinary endeavor.
This chapter will focus on some of the challenges, and advances, in creating
algorithms for machine processing of emotional human speech communication. Challenges to emotion recognition include the
selection of appropriate representation, discerning the corresponding signal
features, designing the appropriate pattern classification models and algorithms,
and evaluating the effectiveness of all the above. One special challenge is
the correspondence between human and machine emotion recognition. Another challenge comes from the fact that the cues
carrying linguistic and affective content co-occur, and reside at multiple
time scales and levels of linguistic abstraction. We will describe cues that
can be extracted at the phonemic, prosodic, lexical and discourse levels,
including measures to relate lexical, non-lexical and discourse information
to the emotional state of the speaker. Additionally, these can be combined with
gestural communication information such as facial expressions, hand gestures,
head and body postures. The chapter will use examples from recent and ongoing
research at USC to highlight some of the methods and outcomes of recognizing
and synthesizing expressive speech.
Processes of emotional meaning: An overview
At what stage in the
emotion process do people apprehend the relational meaning of the current
encounter with the practical or social environment? For many appraisal theorists, meaning
(usually or always) comes first, shaping the activation of functional
response modes by top-down influence.
For dynamic systems theorists, meaning emerges bottom-up in parallel
with the real-time consolidation of the response syndrome. For self-attribution theorists, meaning is
applied to emotional episodes after the fact, rather than being an intrinsic
part of any generative mechanisms.
This talk attempts to integrate the insights offered by these
apparently contradictory views and to sketch out a view of emotions as
functional modes of engagement whose operation is transformed by the
imposition of societal prescriptions and descriptions. In this view, relational meaning is often
implicated in the causes, content, and consequences of emotion but its roles
in these phases of the transaction do not always coincide.
David Pynadath, USC
Theory of mind, intention recognition
Agent-based modeling of
human social behavior is an increasingly important research area. A key
factor in human social interaction is our beliefs about others, a theory of
mind. Whether we believe a message depends not only on its content but also
on our model of the communicator. How we act depends not only on the
immediate effect but also on how we believe others will react. In this talk,
we discuss PsychSim, an implemented multiagent-based simulation tool for
modeling interactions and influence. While typical approaches to such
modeling have used first-order logic, PsychSim agents have their own
decision-theoretic model of the world, including beliefs about their
environment and recursive models of other agents. Using these quantitative
models of uncertainty and preferences, we have translated existing
psychological theories into a decision-theoretic semantics that allows the
agents to reason about degrees of believability in a novel way. We discuss
PsychSim’s underlying architecture and describe its application to
emotional appraisal and communication.
Craig Smith, On the Sociality of Emotion-Eliciting Appraisals: Two aspects
In this two-part talk, I
consider two aspects of the sociality of emotion-eliciting appraisals. In the first part of the talk, I will
consider the degree to which persons' appraisals of their circumstances are
encoded in the facial muscle actions they produce as part of their emotional
expressions. I will review evidence
suggesting that appraisals of motivational incongruence (or the perception of
goal-obstacles) are encoded in the actions of the corrugator supercilii
muscle (to produce the eyebrow frown), and will review hypotheses that have
been advanced to link other facial actions to other facets of how persons are
appraising their circumstances. The
second part of the talk will be more agenda-setting. I will start with the observation that, as
currently cast, Appraisal Theory is surprisingly asocial. I will then consider a number of ways in
which appraisal theory could, and should, be further developed to allow it to
better account for emotions in interpersonal settings.
Larissa Tiedens, The Social Emotional Fluency of Dominance Complementarity
In this
talk, I’ll cover an array of work on Dominance Complementarity.
Dominance complementarity refers to instances in which, within a dyad, one
individual is dominant and the other is submissive. I’ll provide
evidence that this social state is emotionally fluent in the sense that
people experience positive affect and comfort in response to it, that they
seek out this state, and that they are able to process information about
these kinds of relationships more easily than other information.
I’ll discuss implications for how we think about synchronicity in
social relations and the role of emotions in emergent social hierarchies.
Demonstrations
Rapport Agent
Effective face-to-face
conversations are highly interactive. Participants respond to each other,
engaging in nonconscious behavioral mimicry and backchanneling feedback. Such
behaviors produce a subjective sense of rapport and are correlated with
effective communication, greater liking and trust, and greater influence
between participants. Creating rapport requires a tight sense-act loop that
has been traditionally lacking in embodied conversational agents. The Rapport
Agent is designed to create a sense of rapport between a human speaker and
virtual human listener by using machine vision and speech processing to
rapidly provide positive nonverbal feedback. Empirical studies have
demonstrated such feedback increases speaker fluency and engagement.
Continuous Measurement System
The Continuous Measurement System (CMS) is joystick-operable
software for obtaining continuous reactions to videotaped behavior (http://www.psy.miami.edu/faculty/dmessinger/dv/index.html).
The CMS is a digital ‘affect dial’ that offers a
‘readout’ of non-experts’ continuous reactions to stimuli.
One use of the CMS is obtaining efficient, transparent, replicable
measurement of social behavior or other stimuli from multiple non-experts
whose ratings are typically aggregated to increase their precision and
generalizability.
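The aggregation step mentioned above can be illustrated with a minimal sketch. It is not the CMS software itself: the data layout (one list of joystick readings per rater, sampled at the same time steps) and the function name `aggregate_ratings` are assumptions made for illustration. The sketch averages the raters sample by sample and reports the standard error of that average, which shrinks as more raters are added.

```python
from math import sqrt
from statistics import mean, stdev


def aggregate_ratings(ratings_by_rater):
    """Average several raters' time series sample by sample.

    ratings_by_rater: list of equal-length lists, one per rater
    (hypothetical layout: one joystick reading per time step).
    Returns (mean_series, standard_error_series).
    """
    n_raters = len(ratings_by_rater)
    if n_raters < 2:
        raise ValueError("need at least two raters to estimate spread")
    if len({len(series) for series in ratings_by_rater}) != 1:
        raise ValueError("all raters must cover the same time steps")

    mean_series, se_series = [], []
    # zip(*...) groups the readings of all raters at each time step.
    for samples in zip(*ratings_by_rater):
        mean_series.append(mean(samples))
        se_series.append(stdev(samples) / sqrt(n_raters))
    return mean_series, se_series


# Made-up ratings from three raters over four time steps.
ratings = [
    [0.0, 0.5, 1.0, 0.5],
    [0.2, 0.7, 0.8, 0.3],
    [-0.2, 0.3, 1.2, 0.7],
]
consensus, standard_error = aggregate_ratings(ratings)
```

Averaging is only one possible aggregation rule; a median or a rater-weighted mean would fit the same interface.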
SmartBody, part of the VHuman project, is an
advanced virtual human behavior generation and character animation system for
conversational simulations and training systems. Given a list of
communicative functions (illustration, emphasis, turn-taking, etc.), and/or
behavioral requests (gaze, gesture, speech, etc.), SmartBody is capable of
generating a cohesive animated performance. SmartBody is the leading
implementation of SAIBA's BML and FML proposed languages for interfacing with
virtual humans.
WATSON: Real-time Head Tracking and Gesture Recognition
Watson can track rigid
objects in real-time with 6 degrees of freedom using a tracking framework called
Adaptive View-Based Appearance Model.
The tracking library can estimate the pose of the object for a long
period of time with bounded drift. Our
main application is head pose estimation and gesture recognition using a USB
camera or a stereo camera. Our approach combines an Adaptive View-based
Appearance Model (AVAM) with a robust 3D view registration algorithm. AVAM is
a compact and flexible representation of the object that can be used during
the tracking to reduce the drift in the pose estimates. The model is acquired
online during the tracking and can be adjusted according to the new pose
estimates. Relative poses between
frames are computed using a hybrid registration technique which combines the
robustness of ICP (Iterative Closest Point) for large movement and the
precision of the normal flow constraint.
The complete system runs at 25Hz on a Pentium 4 3.2GHz.
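For readers unfamiliar with ICP, the following is a minimal textbook sketch of the basic iteration on 2D point sets. It is not Watson's registration algorithm: it omits the normal flow constraint, the 3D view registration, and the appearance model, and all function names are hypothetical. Each iteration pairs every source point with its nearest target point, then solves for the least-squares rotation and translation in closed form.

```python
import math


def nearest(point, candidates):
    """Return the candidate closest to point (brute-force search)."""
    return min(candidates, key=lambda c: math.dist(point, c))


def best_rigid_transform(src, dst):
    """Least-squares rotation angle and translation mapping paired 2D points src onto dst."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy  # centered source point
        bx, by = qx - dx, qy - dy  # centered target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation moves the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def apply_transform(points, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp(source, target, iterations=10):
    """Align source to target by alternating matching and rigid fitting."""
    current = list(source)
    for _ in range(iterations):
        matches = [nearest(p, target) for p in current]
        theta, tx, ty = best_rigid_transform(current, matches)
        current = apply_transform(current, theta, tx, ty)
    return current


def mean_error(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


# Made-up example: the target is the source after a small rigid motion.
source = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0), (-1.0, 1.0)]
target = apply_transform(source, 0.1, 0.2, -0.1)
aligned = icp(source, target)
```

Nearest-neighbour matching is only reliable when the motion between frames is small relative to the spacing of the points, which is one reason practical trackers combine ICP with other constraints.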
Virtual Patient
Virtual humans offer an exciting and powerful
potential for rich interactive experiences. The Virtual Patient is an
application of virtual human technology to help develop the interviewing and diagnostics
skills of developing clinicians. The system allows novice mental health clinicians to conduct an interview with a
virtual character that emulates an adolescent male with conduct.