When they say an AI machine can read your face and detect your emotions, be very sceptical indeed

When it comes to hi-tech, A/UK has always presumed that ordinary citizens and everyday communities should take as active a role as possible - rather than be flattened by an inevitable steamroller of tech-mogul-defined “progress”.

A more human-centred politics of the future will be alive to the way that ever-more-subtle computation and design make us “dividuals” rather than “individuals”, as Gilles Deleuze once put it. That is, we can be categorised and calibrated so precisely that our behaviours can be anticipated and shaped.

Usually the issue is who gets access to the patterns of data collected about our behaviours. But on occasion, we have enough expert knowledge to challenge the very functioning of the technology itself.

Emotion recognition scanners, and their supposed ability to reliably detect the inner emotions and motivations of humans, are open to exactly this challenge. There is a very sharp debate in cognitive science about the reliability of these scanners - which should be a lot more widely known (for an overview, see this Nature piece).

This post is triggered by news from ArsTechnica about new AI software - called Headroom - which aims to make online meetings run more efficiently. One way it does this is by having the AI try to measure the emotions in the faces (and voices) in the room:

Headroom’s software uses emotion recognition to take the temperature of the room periodically and to gauge how much attention participants are paying to whoever’s speaking. Those metrics are displayed in a window on-screen, designed mostly to give the speaker real-time feedback that can sometimes disappear in the virtual context.

“If five minutes ago everyone was super into what I'm saying and now they're not, maybe I should think about shutting up,” says Green.

Emotion recognition is still a nascent field of AI. “The goal is to basically try to map the facial expressions as captured by facial landmarks: the rise of the eyebrow, the shape of the mouth, the opening of the pupils,” says Rabinovich.

Each of these facial movements can be represented as data, which in theory can then be translated into an emotion: happy, sad, bored, confused.

In practice, the process is rarely so straightforward. Emotion recognition software has a history of mislabeling people of colour. One program, used by airport security, overestimated how often Black men showed negative emotions, like “anger.”

Affective computing also fails to take cultural cues into account, like whether someone is averting their eyes out of respect, shame, or shyness.

For Headroom’s purposes, Rabinovich argues that these inaccuracies aren’t as important. “We care less if you're happy or super happy, so long that we're able to tell if you're involved,” says Rabinovich.

But Alice Xiang, the head of fairness, transparency, and accountability research at the Partnership on AI, says even basic facial recognition still has problems—like failing to detect when Asian individuals have their eyes open—because it is often trained on white faces.

“If you have smaller eyes, or hooded eyes, it might be the case that the facial recognition concludes you are constantly looking down or closing your eyes when you’re not,” says Xiang. These sorts of disparities can have real-world consequences as facial recognition software gains more widespread use in the workplace.

Headroom is not the first to bring such software into the office. HireVue, a recruiting technology firm, recently introduced emotion recognition software that suggests a job candidate's "employability," based on factors like facial movements and speaking voice.

Constance Hadley, a researcher at Boston University’s Questrom School of Business, says that gathering data on people’s behaviour during meetings can reveal what is and isn’t working within that setup, which could be useful for employers and employees alike. But when people know their behaviour is being monitored, it can change how they act in unintended ways.

“If the monitoring is used to understand patterns as they exist, that’s great,” says Hadley. “But if it’s used to incentivize certain types of behaviour, then it can end up triggering dysfunctional behaviour.”

In Hadley’s classes, when students know that 25 per cent of the grade is participation, students raise their hands more often, but they don’t necessarily say more interesting things.

When Green and Rabinovich demonstrated their software to me, I found myself raising my eyebrows, widening my eyes, and grinning maniacally to change my levels of perceived emotion.

More here. The performance loops that individuals - or are they now dividuals? - get caught up in have a nightmarish, dystopian-SF quality… never mind the racial bias that gets reinforced by the software’s crudities.
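To see how thin the landmark-to-emotion mapping described in the excerpt really is, here is a deliberately crude sketch in Python. Every feature name and threshold below is invented for illustration - real products fit these mappings statistically rather than hand-coding them - but the underlying move is the same: a fixed facial configuration is read off as a fixed inner state.

```python
from dataclasses import dataclass

# A deliberately crude, hypothetical sketch of the landmark-to-emotion
# pipeline described in the excerpt. Feature names and thresholds are
# invented for illustration; real systems fit these mappings from
# training data rather than hand-coding them.

@dataclass
class FaceLandmarks:
    eyebrow_raise: float    # 0.0 (neutral) .. 1.0 (fully raised)
    mouth_curvature: float  # -1.0 (downturned) .. 1.0 (upturned)
    eye_openness: float     # 0.0 (closed) .. 1.0 (wide open)

def label_emotion(face: FaceLandmarks) -> str:
    """Map a fixed facial configuration onto a fixed emotion label.

    This is the assumption in miniature: the face is treated as a
    readout of an inner state, with no reference to context, culture,
    or the individual person.
    """
    if face.mouth_curvature > 0.4:
        return "happy"
    if face.mouth_curvature < -0.4 and face.eyebrow_raise < 0.2:
        return "angry"    # ...or concentrating, confused, or just told a bad joke
    if face.eye_openness < 0.3:
        return "bored"    # ...or simply has smaller or hooded eyes
    return "neutral"

# A broad grin is read as "happy", whatever it actually means in the room.
print(label_emotion(FaceLandmarks(eyebrow_raise=0.8, mouth_curvature=0.9, eye_openness=0.9)))
```

Context, culture and individual difference - everything the rest of this post turns on - have nowhere to go in a function like this.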

But as the excerpt above hints, there are real problems in the science of reading emotions on faces. For some experts, the facial markers for emotions accepted by psychological orthodoxy are inaccurate - not just in terms of cultural difference (facial expressions meaning different things in different parts of the world), but also in terms of fundamental neuroscience.

On this view, our emotions aren’t deep forces within us that erupt into inescapable smiles or grimaces - they are constructed anew every time, and capable of great subtlety and variation in their expression. Certainly nothing so reliable that a machine could pick it up.

This is the position of Lisa Feldman Barrett, an iconoclastic mind scientist who pushes for a “constructionist” theory of emotions, as opposed to an “essentialist” one. She makes strong claims that emotion detection machines, if they run on essentialist principles, will make some very bad decisions.

In this interview for MIT Tech Review, Barrett says:

There is no technology that I know of that can read emotions in people's faces or voices or anything else… The best technology that's available, let's say for faces, can under ideal laboratory conditions do really well at detecting facial movements - but not necessarily what those movements mean in a psychological way, and not necessarily what the person will do next, or what they're good at in their job, or how honest they are, or any of those things.

[The percentage is not] high enough that you would ever want your outcomes or your children's outcomes decided by an algorithm that had 30% reliability. You just wouldn't. Right?

And also people scowl when they're not angry, quite frequently. They scowl when they're thinking really hard and concentrating... scowl when they're confused, they scowl when you tell them a bad joke, they scowl when they have gas. 

The traditional questions asked have been: do people move their faces in universal ways when they're angry or afraid or happy? And do they recognize certain facial configurations as expressions of emotion in a universal way?

We read studies about adults in large urban cultures. We read studies about adults in remote small-scale cultures. We read studies about infants, about fetuses, about young children. We looked at virtual agents - how virtual agents are programmed to portray emotion, and how emotion is perceived in these agents, to allow for cooperation or competition and so on.

We actually also looked at research on expressions in people who are congenitally blind and congenitally deaf. And we started looking at expressions in people who were struggling with mental illness. The thing that's important to point out is that the findings were really consistent, across the different literatures. We basically kept discovering the same thing, the same pattern kind of over and over and over.

It turns out our brains are guessing at the meaning of facial movements in context. So exactly the same smile can mean something very, very different depending on the context.

We assume our brains read emotions from prototypes - a frown means you’re sad and a smile means you’re happy. And this simplistic view of how emotion recognition works? It’s basically wrong.  

And it's actually in our language. We talk about reading each other and reading body language and everything we know in science tells us this is not how brains work. Your brain is just guessing. It's guessing, guessing, guessing, guessing, guessing.

And it's bringing all of its experience to bear, making a guess about, well, what does a curl of the lip or a raise of an eyebrow mean in this particular situation?

What all of this means is that much about the way we currently approach emotion AI needs to change.

Because really what brains are doing is constructing categories on the fly - they're not detecting categories. They're actually making them, constantly asking: how is what I'm seeing, hearing, tasting, smelling, you know, similar to other things in my past?

If we want to build technology that “reads” - or just infers really well - what a physical movement means, then we have to be studying things really differently than we are right now.

Because right now we study one signal at a time, or maybe two - or, if we're really, really complicated, we do three: like maybe we do the voice and the body and the face, like wow, right? Or maybe we get heart rate, and the body and the face, or whatever.

More here. Barrett and a range of other scholars are gathered together in this 2020 Nature article, Why faces don’t always tell the truth about feelings (subtitle: “Although AI companies market software for recognising emotions in faces, psychologists debate whether expressions can be read so easily”).
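To make Barrett’s critique concrete, compare the single-signal sketch above with what a multi-signal, context-sensitive pipeline would minimally have to look like. The sketch below is purely illustrative - every field, weight and category is hypothetical, and it reflects nobody’s actual system - but it shows the shift she argues for: several streams of evidence, weighed against context, yielding a spread of guesses rather than one confident label.

```python
from dataclasses import dataclass

# A hypothetical illustration of a multi-signal, context-sensitive guess.
# All fields, weights and categories are invented; this reflects no
# vendor's actual system.

@dataclass
class Observation:
    face_scowl: float        # 0..1, from a facial-landmark model
    voice_tension: float     # 0..1, from a prosody model
    heart_rate_delta: float  # beats per minute above this person's own baseline
    context: str             # e.g. "debugging", "argument", "bad_joke"

def infer_state(obs: Observation) -> dict:
    """Return a distribution over candidate states, not a single label."""
    # Start from context: the same scowl can mean concentration,
    # anger, confusion - or indigestion - depending on the situation.
    scores = {
        "concentrating": 0.5 if obs.context == "debugging" else 0.2,
        "angry":         0.5 if obs.context == "argument" else 0.2,
        "amused":        0.5 if obs.context == "bad_joke" else 0.1,
    }
    # Other signals nudge, but never settle, the guess.
    if obs.heart_rate_delta > 15 and obs.voice_tension > 0.6:
        scores["angry"] *= 1.5
    if obs.face_scowl > 0.7 and obs.voice_tension < 0.3:
        scores["concentrating"] *= 1.5
    total = sum(scores.values())
    return {state: round(score / total, 2) for state, score in scores.items()}

# A scowling face with a calm voice during debugging: probably concentration, not anger.
print(infer_state(Observation(face_scowl=0.8, voice_tension=0.2,
                              heart_rate_delta=5, context="debugging")))
```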

So: beware AIs claiming to be able to read the mirror of your soul…