This talk is about technology that goes under various names: emotional robots, affective computing, human-centered computing. But behind all these is actually the technology for automatic understanding of human behavior, and more specifically human facial behavior.

The human face is simply fascinating. It serves as our primary means to identify other members of our species. It also serves to judge other people's age, gender, beauty, or even personality. But more important is that the face is a constant flow of facial expressions. We react and emote to external stimuli all the time. And it is exactly this flow of expressions that is the observable window to our inner self. Our emotions, our intentions, attitudes, moods.

Why is this important? Because we can use it in a very wide variety of applications. Everybody wants to know who a person is and what his or her expression means, and to use that in various applications. When it comes to the analysis of faces in static images, that is, identification of faces, this problem is actually considered solved. Similarly, we can say that facial expression analysis in frontal-view videos is more or less solved.

Four video clips playing simultaneously with markers indicating facial detection and captions on some of the panes with evaluation of the person's expression

Clip playing on-screen at ~1:37–1:49

As you can see from these videos, we can accurately track faces in frontal views, and even judge expressions such as frowns or smiles, and high-level behaviors like intensity of joy or intensity of interest, even in outdoor environments. However, when it comes to completely unconstrained environments, where we have large changes in head pose and occurrences of large occlusions, then we are facing a challenge. We call this problem automatic face and facial expression analysis in videos uploaded [to] social media like YouTube and Facebook.

We have to collect a lot of data in the wild, annotate it in terms of where the face is and where the parts of the face are, and then build multi-view models that can actually handle these large changes in head pose. We also need to take into account the context in which a facial expression is shown, in order to be able to deal with subtle facial behavior, or with occlusions of facial expressions. So, context-sensitive machine learning models are the future.

Screenshot of an application showing a young woman's face, with various detected facial features traced by dots

Clip playing on-screen at ~2:48–2:56

Nonetheless, the technology as it stands is still very much applicable to a wide variety of applications. A good example is market analysis, where we can use people's reactions to products in adverts in order to judge how successful those products and adverts are. The software is commercially available [from] Realeyes, and we are working with this company to include verbal feedback about products and adverts as well, and to build another tool for skill enhancement, such as conflict resolution skills.

Screenshot of an application showing a woman's face with detected facial features traced by cross-shaped markers, and a readout below evaluating pain intensity

Clip playing on-screen at ~3:28–3:44

Another very important field in which we work quite a lot is the medical field, and we currently have the technology available for automatic analysis of pain, and of pain intensity, from facial expressions. We use this in physiotherapeutic environments, but we could also use it in intensive care.

Two video clips showing young boys with their detected facial features outlined and bar charts below showing levels of various expressions: neutral, smile, surprise, disgust, and scream

Clip playing on-screen at ~3:50–4:07

Another important project for us is a European Commission project on working with autistic children, where we would like to help the children understand their own facial expressivity and that of others by using social robots with which they interact. These robots will have a camera which will watch the children, and software that will interpret their expressions and give them feedback.
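The loop described above, where a classifier scores expressions per frame (the bar charts on screen show neutral, smile, surprise, disgust, and scream) and the robot turns the scores into feedback, can be sketched roughly as follows. This is an illustrative toy only, not the actual system: the function names, threshold, and feedback wording are all hypothetical.

```python
# Hypothetical sketch of the feedback step: map per-frame expression
# scores (as shown in the on-screen bar charts) to a spoken prompt.
# The expression set comes from the video captions; everything else
# (names, threshold, message wording) is assumed for illustration.

EXPRESSIONS = ["neutral", "smile", "surprise", "disgust", "scream"]

def dominant_expression(scores, threshold=0.5):
    """Return the strongest expression label, or None if no score is confident."""
    label, value = max(scores.items(), key=lambda item: item[1])
    return label if value >= threshold else None

def feedback_message(scores):
    """Turn one frame's classifier scores into a simple prompt for the robot."""
    label = dominant_expression(scores)
    if label is None:
        return "I'm not sure what you are feeling."
    return f"It looks like you are showing a {label} expression."
```

For example, a frame scored as `{"smile": 0.8, "neutral": 0.1, "surprise": 0.05, "disgust": 0.03, "scream": 0.02}` would yield a "smile" prompt, while an ambiguous frame with no score above the threshold would yield the fallback message.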

In any case, once this technology really matures and we can truly do face and facial expression analysis in the wild, we will be able to build a lot of applications: for example, a system for analysis of negotiation styles or management styles, or simply for measuring stress in job interviews, in car environments, or in entertainment environments, and then increasing the stress if people find it entertaining, or decreasing it in order to increase the safety of the driver or the patient. That's just to mention a few examples.

Thank you very much for your attention.

Further Reference

Maja Pantic profiles at Imperial College London, and Realeyes.

iBUG, the Intelligent Behavior Understanding Group at Imperial College London
