The jury's out: language attitudes and forensic speech analysis

Image by succo from Pixabay

The study in a sentence

We are very good at recognising voices when they are familiar to us, but we're not so good at recognising voices that are unfamiliar. 

Understanding how we recognise unfamiliar voices is important, as people on a jury are often involved in the process of deciding whether a voice from a crime (e.g. in a threatening phone call) is the same as that of a suspect. Moreover, jury members' attitudes towards specific accents may also bias them towards thinking that a suspect is more or less likely to be guilty of a crime.

This study shows that the task of judging whether two samples of speech come from the same speaker or from different speakers is affected by the context in which the listener performs the task.

Image by Bruno from Pixabay 

The question

Forensic speech analysis is used in court to provide evidence about how likely it is that a speaker was present at the scene of a crime. Experts compare recordings from the crime scene (e.g. a threatening phone call or a smartphone recording made by a bystander) with recordings of the suspect's voice, and decide how likely it is that the two samples come from the same speaker.

There are two main ways of performing this forensic speaker comparison task:

Neither of these methods is perfect. This study tries to understand the strengths and weaknesses of humans versus machines in the speaker recognition process.

Key concepts

Participants in the study had to make decisions about:

While these concepts are very similar to one another, in a courtroom setting there is a very important difference. Two voices might be very similar, in that they both belong to older male speakers from North-East England, and both might have accents that are highly typical of that region, but this doesn't mean that we can confidently say that they belong to the same person.

In a legal context it is very important to be able to distinguish clearly between similarity, typicality and sameness.

Forensic speech analysts need to know which features are likely to be similar across different voices from the same region, and which features make it possible to decide whether the two samples are likely to have come from the same person.

To what extent are human speaker comparison judgments affected by the contextual information that occurs in forensic cases?

Image from game footage of Professor Ellis (the forensic speech analyst) 

Methods: gamification in research

The researchers designed an immersive jury-based game to elicit responses about participants' language attitudes that would otherwise be difficult to access. Language attitudes are very complex and often affected by a multitude of factors – the listener's own experience as well as the speaker's accent.

Another problem with some research studies is that collecting data takes a long time, so taking part can be boring or tiring. Using a game-based design makes it fun and engaging to take part. If participants are not rushing to finish or getting too tired, they will likely provide more reliable responses.

Importantly, the researchers looked at how responses changed once new evidence was brought into the case: were participants' decisions more accurate, or less accurate, after they were told that there was more evidence? What do you think the results will show?

Figure 1: Participants' assessments of whether two voices came from the same speaker or from different speakers. Bars on the left show results when the recordings were actually of different voices, and bars on the right show results when the recordings were of the same voice. 'Training' bars in blue are results from the first level of the game; the other bars show decisions made in the immersive jury context, then with additional evidence, and then with expert witness guidance.

The answer

The results showed that machines made more accurate decisions than humans:

Moreover, the context in which participants heard the voices affected their judgements:

Non-expert human judgements of speaker sameness can be influenced simply by the fact of being asked to judge this in a courtroom context. 

Classroom activities

Lead in task

Can you identify which two speech samples are from the same celebrity speakers?

Extension task

Salient features, the courtroom context & keeping research real

In more detail

A longer explanation of the research study

Talk Recording [37 mins including interactive task]


Slides from the Workshop Talk

Meet the authors

Carmen Llamas & Vince Hughes

Carmen is a Professor of Linguistics in the Department of Language and Linguistic Science. She teaches on modules in sociolinguistics such as Methods in Language Variation and Change. Vince is a Senior Lecturer in Forensic Speech Science in the Department of Language and Linguistic Science. He teaches modules in Forensic Phonetics and Linguistics as Data Science.

Read about the research

Hughes, V. & Llamas, C. (2021-2023). Novel Methods for Assessing Speaker Recognition Performance. AHRC, grant AH/T012978/1. Project website