Detecting Driver Frustration from Audio and Video (IJCAI 2016)
This video accompanies our paper presented at IJCAI, the International Joint Conference 00:00:05.720 |
on Artificial Intelligence, where we propose a system for detecting driver frustration 00:00:12.120 |
from the fusion of two data streams: first, the audio of the driver's voice, and second, the video of the driver's face. 00:00:25.000 |
These are video snapshots of two drivers using the in-car voice-based navigation system. 00:00:31.840 |
Which one of them looks more frustrated with the interaction? 00:00:36.760 |
To help answer that question, let's take a look at an example interaction involving one of these two drivers. 00:00:43.900 |
Our proposed approach uses the audio of the driver's voice when the human is speaking 00:00:50.120 |
and the video of the driver's face when he is listening to the machine speak. 00:00:57.640 |
What you are seeing and hearing is the driver attempting to instruct the car's voice-based 00:01:01.760 |
navigation system to navigate to 177 Massachusetts Ave, Cambridge, Massachusetts. 00:01:09.800 |
177 Massachusetts Ave, Cambridge, Massachusetts. 00:01:31.840 |
177 Massachusetts Ave, Cambridge, Massachusetts. 00:02:22.680 |
On a scale of 1 to 10, with 1 being completely satisfied and 10 being completely frustrated, 00:02:28.680 |
the smiling driver reported his frustration level with this interaction to be a 9. 00:02:35.540 |
We use the self-reported level of frustration as the ground truth for the binary classification task. 00:02:45.320 |
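As a hedged aside, here is a minimal sketch of how a 1-to-10 self-report could be turned into binary labels for such a task; the midpoint threshold and the example scores are illustrative assumptions, not values from the paper:

```python
# Illustrative only: binarizing 1-10 self-reported frustration scores.
# The threshold of 5 is an assumption, not the paper's actual split.
self_reports = {"driver_a": 2, "driver_b": 9}

labels = {driver: int(score > 5)  # 1 = frustrated, 0 = satisfied
          for driver, score in self_reports.items()}

print(labels)  # {'driver_a': 0, 'driver_b': 1}
```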
When the driver is speaking, we extract the Geneva Minimalistic Acoustic Parameter Set 00:02:50.600 |
(GeMAPS) features from their voice, which measure basic physiological changes in voice production. 00:02:58.520 |
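As a hedged illustration of this step (not necessarily the authors' toolchain), GeMAPS functionals can be extracted with the open-source openSMILE Python wrapper; the audio file name below is a placeholder:

```python
# Sketch: extracting GeMAPS acoustic functionals from one speech segment
# using the openSMILE Python wrapper. "driver_utterance.wav" is a
# placeholder path, not data from the paper.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.GeMAPSv01b,       # Geneva Minimalistic Acoustic Parameter Set
    feature_level=opensmile.FeatureLevel.Functionals,  # one summary feature vector per segment
)

features = smile.process_file("driver_utterance.wav")
print(features.shape)  # one row of GeMAPS functionals for the segment
```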
When the driver is listening, we extract 14 facial actions using the AFFDEX system from Affectiva. 00:03:06.840 |
The classifier decisions are fused together to produce an accuracy of 88.5% on an on-road dataset. 00:03:18.080 |
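A minimal sketch of this kind of decision-level fusion is shown below, assuming generic linear SVM classifiers and made-up feature arrays; the paper's actual classifiers, feature dimensions, and fusion rule may differ:

```python
# Sketch of decision-level fusion of an audio-based and a video-based
# classifier. The placeholder data, linear SVMs, and probability-averaging
# fusion rule are illustrative assumptions, not the paper's exact setup.
import numpy as np
from sklearn.svm import SVC

# Placeholder training data: GeMAPS functionals per speaking epoch and
# 14 facial-action features per listening epoch, with binary labels.
X_audio_train = np.random.rand(40, 62)
X_video_train = np.random.rand(40, 14)
y_train = np.array([0, 1] * 20)  # 1 = frustrated, 0 = satisfied

audio_clf = SVC(kernel="linear", probability=True).fit(X_audio_train, y_train)
video_clf = SVC(kernel="linear", probability=True).fit(X_video_train, y_train)

def fuse(audio_features, video_features):
    """Average the two classifiers' frustration probabilities and threshold."""
    p_audio = audio_clf.predict_proba(audio_features)[:, 1]
    p_video = video_clf.predict_proba(video_features)[:, 1]
    return ((p_audio + p_video) / 2 > 0.5).astype(int)

print(fuse(np.random.rand(1, 62), np.random.rand(1, 14)))
```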
There are two takeaways from this work that may go beyond just detecting driver frustration. 00:03:23.320 |
First, the self-reported emotional state may be very different from the one assigned by a group 00:03:29.480 |
of external annotators, so we have to be careful when using such annotations as the ground 00:03:35.160 |
truth for other affective computing experiments. 00:03:39.680 |
Second, detection of emotion may require considering not just facial actions or voice acoustics, 00:03:46.720 |
but also the context of the interaction and the target of the affective communication. 00:03:54.040 |
For more information or to contact the authors, please visit the following website.