Workshop on Transactional Emotions
Friday, October 26 (Unless noted, activities in ICT 6th Floor Conference Room)
· 8:30- 9:00 Breakfast/Registration (1st Floor)
· 9:00- 9:30
· 10:20-10:50 Break (30min)
· 12:30- 2:00 Lunch (at ICT, provided, 1st Floor)
· 2:00- 2:50 Antonio Damasio – Affective Neuroscience
· 3:40- 4:10 General Discussion
· 4:10- 4:30 Break (20min)
· 4:30- 4:50 Virtual Human Demonstration (VR Theater)
· 4:50- 6:10 Software Demonstrations: (Room 222)
· Automatic FACS coding (Movellan)
· Dyadic Motion capture data (Narayanan)
· Gesture Tracking (Morency)
· Rapport Agent (Gratch & Wang)
· Virtual Patient (
· SmartBody (Marsella)
· Continuous Measurement System (Messinger)
· 7:00- 9:00 Dinner (Abode Restaurant and Lounge)
Saturday, October 27
· 8:30- 9:00 Breakfast at ICT (1st Floor)
· 9:50-10:40 David Pynadath – Computer models of Theory of Mind (suggested reading)
· 10:40-11:10 Break (30min)
· 12:00- 1:30 Lunch (at ICT, provided, 1st floor)
· 1:30- 2:20
· 3:10- 3:40 Break (30min)
· 3:40- 4:30 Justine Cassell – Verbal and nonverbal implementations of rapport
· 6:30- 8:30 Dinner at Getty Center
Sunday, October 28
· 8:30- 9:00 Breakfast at ICT (1st Floor)
· 9:00- 9:50
· 9:50-10:40 Javier Movellan – Developing practical expression recognition systems
· 10:40-11:10 Break (30min)
· 11:10-12:10 General Discussion
· 12:10- 1:40 Lunch (at ICT, provided, 1st Floor)
Neurophysiology of emotion, VR/Social psych
Emotion and negotiation
This presentation focuses on a framework for the study of emotion in negotiation and related social contexts. It will review results from a program of research that investigates variables such as mood on information processing and expectations in negotiation, and the impact of emotion on what is valued in negotiation.
The phenomenon of rapport has many parts - the feeling of instant connection with another person, the growing familiarity between two people, and the sense of knowing somebody well that comes after a long friendship. As Cappella has pointed out, however, many studies of rapport do not differentiate among these different phenomena. In this talk I report results from a study that looked at the effects of familiarity and friendship (the latter two of those phenomena) on verbal and nonverbal behavior, and the subsequent computational model that is based on those results. Data come from a study of friends and strangers, who either could or could not see one another, and who were asked to give directions to one another three successive times. Analysis focused on differences in the use of dialogue acts and nonverbal behaviors, as well as co-occurrences of dialogue acts, eye gaze and head nods. Our results demonstrate a pattern of verbal and nonverbal behavior that differentiates the dialogue of friends from that of strangers, and differentiates early acquaintances from those who have worked together before, in such a way that we can begin to see how a computational system might interact differently with a user over time.
The relationship of gesture and speech prosody to affect/arousal in interactive discourse
Language production makes use of speakers’ ability to shape, direct, and locate their hands and bodies in space and in relation to interlocutors and to objects in the immediate environment. This ability contributes to a cognitive and social-interactional function: the elaboration, via coverbal gesturing, of “material carriers” (McNeill & Duncan 2000) of linguistic conceptualizations. Here we examine the tight integration of coverbal gesture and speech prosodic emphasis in natural interactive discourse. We consider the effect, on communication, of factors that expand versus constrict these dimensions of expression. Data from videotaped narrative discourses of healthy individuals and of individuals with Parkinson’s Disease (a neurodegenerative disorder affecting sensorimotor function, affect, and cognition) suggest the following: (i) speech prosody and coverbal gesture jointly highlight discourse focal information, (ii) prosody and gesture assist communicating partners in multiple ways to coordinate their models of an ongoing discourse, (iii) variations in affect/emotion or, in general, of degree of arousal, during discourse, have an impact that spans the vocal and visuo-spatial dimensions of communication, suggesting that these are a unified dimension of communicative behavior. These observations are discussed as evidence in support of theories that hold language use to be an embodied process, fundamental characteristics of which vary in relation to the social relationship of interlocutors and their affective-emotional states.
Computational appraisal theory, theory of mind
Development of emotional communication
Early emotional development is a nonverbal interactive process. Different patterns of infant smiling and gazing emerge within dyadic interactions in the first 10 months of life. They are characterized by an increasing ability to engage in highly positive emotional engagement with a partner, the ability to disengage from the partner, and the ability to intentionally signal to a partner. These abilities may be contingent on the capacity of the infant and partner to mutually respond to one another. The emergence of these capacities provides a developmental perspective on the capacity of human beings and computational systems to engage in such early transactions.
We have used various techniques to understand the necessary constituents of these developments in dyadic interaction. A statistical simulation technique highlights the patterning of discrete acts in time. Computer vision techniques provide models of the movement of the facial features of interacting dyads. Continuous Measurement Software uses a joystick interface to record the real-time reactions of 'naive' observers to ongoing behavior. These techniques may be of general utility in modeling 'transactional affective phenomena.'
Understanding gestures in context
During face-to-face conversation, people use visual feedback (e.g., head and eye gestures) to communicate relevant information and to synchronize rhythm between participants. When recognizing visual feedback, people often rely on more than their visual perception. For instance, knowledge about the current topic and from previous utterances helps guide the recognition of nonverbal cues. In this talk we will describe how contextual information can be used to predict visual feedback and improve recognition of head gestures in human-computer interfaces. Lexical, prosodic, timing, and gesture features can be used to predict a user’s visual feedback during conversational dialog with a robotic or virtual agent. Using a discriminative approach to contextual prediction and multi-modal integration, performance of head gesture detection was improved with context features even when the topic of the test set was significantly different from that of the training set.
Javier Movellan, UCSD
Developing Practical Expression Recognition Systems
Facial expression is one of the most powerful and immediate means for humans to communicate their emotions, cognitive states, intentions, and opinions to each other. Given the importance of facial expressions, it is not unreasonable to expect that the development of machines that can recognize such expressions may have a revolutionary effect in everyday life. Potential applications include tutoring systems that are sensitive to the expression of their students, computer assisted detection of deceit, diagnosis and monitoring of clinical disorders, evaluation of behavioral and pharmacological treatments, new interfaces for entertainment systems, smart digital cameras, and social robots.
However, there is currently a gap in automatic expression recognition between the levels of performance reported in the literature and the actual performance in real-life conditions. A troublesome aspect of this gap is that the algorithms that perform well on the standard datasets and in laboratory demonstrations could be leading research in the wrong direction. I will present our experience developing a smile detector for real-world applications. The detector became the basis for commercial systems already appearing on some digital cameras. We explore the required characteristics of the training dataset, image representation, and machine learning algorithms. I will also describe how we diagnosed the performance of the system using techniques from the psychophysics literature. Results suggest that human-level smile detection accuracy in real-life applications is achievable with current technology and is ready for practical applications. Generalization to comprehensive expression recognition systems is underway and likely achievable within the next few years.
Machine Recognition and Synthesis of Emotional Speech
The human speech signal is unique in the sense that it carries crucial information about not only communication intent and speaker identity but also underlying expressions and emotions. It results from a complex orchestration of cognitive, physiological, physical and social processes. Automatically processing and decoding speech and spoken language is hence a vastly challenging and inherently interdisciplinary endeavor. This chapter will focus on some of the challenges, and advances, in creating algorithms for machine processing of emotional human speech communication.
Challenges to emotion recognition include the selection of appropriate representation, discerning the corresponding signal features, designing the appropriate pattern classification models and algorithms, and evaluating the effectiveness of all the above. One special challenge is the correspondence between human and machine emotion recognition.
Another challenge comes from the fact that the cues carrying linguistic and affective content co-occur, and reside at multiple time scales and levels of linguistic abstraction. We will describe cues that can be extracted at the phonemic, prosodic, lexical and discourse levels, including measures to relate lexical, non-lexical and discourse information to the emotional state of the speaker. Additionally, these can be combined with gestural communication information such as facial expressions, hand gestures, and head and body postures. The chapter will use examples from recent and ongoing research at USC to highlight some of the methods and outcomes of recognizing and synthesizing expressive speech.
Processes of emotional meaning: An overview
At what stage in the emotion process do people apprehend the relational meaning of the current encounter with the practical or social environment? For many appraisal theorists, meaning (usually or always) comes first, shaping the activation of functional response modes by top-down influence. For dynamic systems theorists, meaning emerges bottom-up in parallel with the real-time consolidation of the response syndrome. For self-attribution theorists, meaning is applied to emotional episodes after the fact, rather than being an intrinsic part of any generative mechanisms. This talk attempts to integrate the insights offered by these apparently contradictory views and to sketch out a view of emotions as functional modes of engagement whose operation is transformed by the imposition of societal prescriptions and descriptions. In this view, relational meaning is often implicated in the causes, content, and consequences of emotion but its roles in these phases of the transaction do not always coincide.
David Pynadath, USC
Theory of mind, intention recognition
Agent-based modeling of human social behavior is an increasingly important research area. A key factor in human social interaction is our beliefs about others, a theory of mind. Whether we believe a message depends not only on its content but also on our model of the communicator. How we act depends not only on the immediate effect but also on how we believe others will react. In this talk, we discuss PsychSim, an implemented multiagent-based simulation tool for modeling interactions and influence. While typical approaches to such modeling have used first-order logic, PsychSim agents have their own decision-theoretic model of the world, including beliefs about their environment and recursive models of other agents. Using these quantitative models of uncertainty and preferences, we have translated existing psychological theories into a decision-theoretic semantics that allows the agents to reason about degrees of believability in a novel way. We discuss PsychSim’s underlying architecture and describe its application to emotional appraisal and communication.
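The idea that believing a message depends on a model of the communicator can be illustrated with a small decision-theoretic sketch. This is not PsychSim code or its API; it is a generic Bayesian update followed by an expected-utility choice, and every name, probability, and payoff below is a hypothetical chosen for illustration.

```python
def posterior_given_message(prior_true, p_assert_if_true, p_assert_if_false):
    """Belief that a claim is true after hearing a sender assert it.

    The two likelihoods are the receiver's (assumed) model of the sender:
    how often the sender would assert the claim when it is true vs. false.
    """
    evidence = prior_true * p_assert_if_true + (1 - prior_true) * p_assert_if_false
    return prior_true * p_assert_if_true / evidence


def best_action(belief, payoffs):
    """Pick the action with the highest expected utility.

    payoffs maps action -> (utility if claim true, utility if claim false).
    """
    def expected_utility(action):
        u_true, u_false = payoffs[action]
        return belief * u_true + (1 - belief) * u_false

    return max(payoffs, key=expected_utility)


# Hypothetical payoffs: complying pays off only if the claim is true.
payoffs = {"comply": (1.0, -2.0), "ignore": (0.0, 0.0)}

# Same message, same prior, two different models of the communicator.
belief_trusted = posterior_given_message(0.5, 0.9, 0.1)
belief_untrusted = posterior_given_message(0.5, 0.5, 0.5)

action_trusted = best_action(belief_trusted, payoffs)
action_untrusted = best_action(belief_untrusted, payoffs)
```

With identical message content, the receiver's belief and chosen action differ only because its model of the sender differs, which is the point the abstract makes about believability.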
On the Sociality of Emotion-Eliciting Appraisals: Two aspects
In this two-part talk, I consider two aspects of the sociality of emotion-eliciting appraisals. In the first part of the talk, I will consider the degree to which persons' appraisals of their circumstances are encoded in the facial muscle actions they produce as part of their emotional expressions. I will review evidence suggesting that appraisals of motivational incongruence (or the perception of goal-obstacles) are encoded in the actions of the corrugator supercilii muscle (to produce the eyebrow frown), and will review hypotheses that have been advanced to link other facial actions to other facets of how persons are appraising their circumstances. The second part of the talk will be more agenda-setting. I will start with the observation that, as currently cast, Appraisal Theory is surprisingly asocial. I will then consider a number of ways in which appraisal theory could, and should, be further developed to allow it to better account for emotions in interpersonal settings.
The Social Emotional Fluency of Dominance Complementarity
In this talk, I’ll cover an array of work on Dominance Complementarity. Dominance complementarity refers to instances in which within a dyad one individual is dominant and the other is submissive. I’ll provide evidence that this social state is emotionally fluent in the sense that people experience positive affect and comfort in response to it, that they seek out this state, and that they are able to process information about these kinds of relationships more easily than other information. I’ll discuss implications for how we think about synchronicity in social relations and the role of emotions in emergent social hierarchies.
Effective face-to-face conversations are highly interactive. Participants respond to each other, engaging in nonconscious behavioral mimicry and backchanneling feedback. Such behaviors produce a subjective sense of rapport and are correlated with effective communication, greater liking and trust, and greater influence between participants. Creating rapport requires a tight sense-act loop that has been traditionally lacking in embodied conversational agents. The Rapport Agent is designed to create a sense of rapport between a human speaker and virtual human listener by using machine vision and speech processing to rapidly provide positive nonverbal feedback. Empirical studies have demonstrated that such feedback increases speaker fluency and engagement.
Continuous Measurement System
The Continuous Measurement System (CMS) is joystick-operable software for obtaining continuous reactions to videotaped behavior (http://www.psy.miami.edu/faculty/dmessinger/dv/index.html). The CMS is a digital ‘affect dial’ that offers a ‘readout’ of non-experts’ continuous reactions to stimuli. One use of the CMS is obtaining efficient, transparent, replicable measurement of social behavior or other stimuli from multiple non-experts, whose ratings are typically aggregated to increase their precision and generalizability.
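Aggregating ratings across non-experts can be sketched as a sample-by-sample average of time-aligned rating series. This is only an illustration of the aggregation step, not the CMS software or its file format; the rater names, the rating scale, and the sample values are hypothetical.

```python
from statistics import mean


def aggregate_ratings(ratings_by_rater):
    """Average time-aligned rating series across raters, one value per sample.

    ratings_by_rater maps a rater id to a list of joystick readings taken
    at the same time points for every rater.
    """
    lengths = {len(series) for series in ratings_by_rater.values()}
    if len(lengths) != 1:
        raise ValueError("all raters must have the same number of samples")
    # zip(*...) groups the readings of all raters at each time point.
    return [mean(readings) for readings in zip(*ratings_by_rater.values())]


# Hypothetical readings from three raters at four time points.
ratings = {
    "rater_a": [0.0, 0.5, 1.0, 0.5],
    "rater_b": [0.2, 0.4, 0.8, 0.6],
    "rater_c": [0.1, 0.6, 0.9, 0.4],
}
consensus = aggregate_ratings(ratings)
```

The resulting `consensus` series has one averaged value per time point; idiosyncratic deviations of individual raters tend to cancel out as more raters are added.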
SmartBody, part of the VHuman project, is an advanced virtual human behavior generation and character animation system for conversational simulations and training systems. Given a list of communicative functions (illustration, emphasis, turn-taking, etc.), and/or behavioral requests (gaze, gesture, speech, etc.), SmartBody is capable of generating a cohesive animated performance. SmartBody is the leading implementation of SAIBA's proposed BML and FML languages for interfacing with virtual humans.
WATSON: Real-time Head Tracking and Gesture Recognition
Watson can track rigid objects in real-time with 6 degrees of freedom using a tracking framework called Adaptive View-Based Appearance Model. The tracking library can estimate the pose of the object for a long period of time with bounded drift. Our main application is head pose estimation and gesture recognition using a USB camera or a stereo camera. Our approach combines an Adaptive View-based Appearance Model (AVAM) with a robust 3D view registration algorithm. AVAM is a compact and flexible representation of the object that can be used during the tracking to reduce the drift in the pose estimates. The model is acquired online during the tracking and can be adjusted according to the new pose estimates. Relative poses between frames are computed using a hybrid registration technique which combines the robustness of ICP (Iterative Closest Point) for large movement and the precision of the normal flow constraint. The complete system runs at 25Hz on a Pentium 4 3.2GHz.
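Why registering against stored views bounds drift can be seen in a one-dimensional toy. This is not the AVAM algorithm or Watson's API: it tracks a single yaw angle, uses the true yaw as a stand-in for image appearance, and models registration error as a fixed hypothetical bias per registration. All names and numbers are assumptions made for the illustration.

```python
import math


def track(true_yaws, bias=0.1, use_keyframes=True, radius=5.0):
    """Return per-frame yaw estimates in degrees for a toy 1-D tracker."""
    estimate = true_yaws[0]  # assume the initial pose is known
    # Each keyframe stores (appearance stand-in, pose estimate when acquired).
    keyframes = [(true_yaws[0], estimate)]
    estimates = [estimate]
    for prev, cur in zip(true_yaws, true_yaws[1:]):
        # Frame-to-frame registration: every step adds a small error.
        estimate = estimate + (cur - prev) + bias
        if use_keyframes:
            nearest = min(keyframes, key=lambda k: abs(k[0] - cur))
            if abs(nearest[0] - cur) <= radius:
                # Register against the stored view instead: the error is that
                # of one registration plus the keyframe's own stored error.
                estimate = nearest[1] + (cur - nearest[0]) + bias
            else:
                # Unseen pose: acquire a new view online.
                keyframes.append((cur, estimate))
        estimates.append(estimate)
    return estimates


# Head oscillating between -30 and +30 degrees for 500 frames.
true_yaws = [30.0 * math.sin(2 * math.pi * t / 50) for t in range(500)]

drifting = track(true_yaws, use_keyframes=False)
anchored = track(true_yaws, use_keyframes=True)

max_error_drifting = max(abs(e - t) for e, t in zip(drifting, true_yaws))
max_error_anchored = max(abs(e - t) for e, t in zip(anchored, true_yaws))
```

In this toy, pure frame-to-frame integration accumulates error in proportion to the number of frames, while the keyframe variant stops accumulating once the stored views cover the range of poses the head revisits.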
Virtual humans offer an exciting and powerful potential for rich interactive experiences. The Virtual Patient is an application of virtual human technology to help develop the interviewing and diagnostics skills of developing clinicians. The system allows novice mental health clinicians to conduct an interview with a virtual character that emulates an adolescent male with conduct.