The jury's out: language attitudes and forensic speech analysis
The study in a sentence
We are very good at recognising voices when they are familiar to us, but we're not so good at recognising voices that are unfamiliar.
Understanding how we recognise unfamiliar voices is important, as people on a jury are often involved in the process of deciding whether a voice from a crime (e.g. in a threatening phone call) is the same as that of a suspect. Moreover, jury members' attitudes towards specific accents may also bias them towards thinking that a suspect is more or less likely to be guilty of a crime.
This study shows that the task of judging whether two samples of speech come from the same speaker or from different speakers is affected by the context in which the listener performs the task.
The question
Forensic speech analysis is used in court to provide evidence about the likelihood that a speaker was present at the scene of a crime. Experts analyse recordings from the crime scene (e.g. a threatening phone call or a smartphone recording from a bystander), compare them with recordings of the suspect's voice, and decide how likely it is that the two samples are from the same speaker.
There are two main ways of performing this forensic speaker comparison task:
Linguistic-phonetic methods, where analysts use auditory and acoustic methods to analyse the pitch of the speaker's voice, their voice quality, and the details of the consonants and vowels in their speech.
Automatic speaker recognition methods, where specially trained computer software analyses the speech automatically.
Neither of these methods is perfect. This study tries to understand the strengths and weaknesses of humans versus machines in the speaker recognition process.
Key concepts
Participants in the study had to make decisions about:
speaker similarity – how similar two voices are
speaker typicality – how typical of the accent the voice sounded
speaker sameness – the likelihood that two voices are the same speaker
While these concepts are very similar to one another, in a courtroom setting there is a very important difference. Two voices might be very similar, in that they are both older male speakers from North-East England, and they might both have highly typical accents of that region, but this doesn't mean that we can confidently say that they are the same person.
In a legal context it is very important to be able to make a clear distinction between similarity, typicality and sameness.
Forensic speech analysts need to know which features are likely to be similar across different voices from the same region, and which features make it possible to decide whether the two samples are likely to have come from the same person.
To what extent are human speaker comparison judgments affected by the contextual information that occurs in forensic cases?
Methods: gamification in research
The researchers designed an immersive jury-based game to elicit responses about participants' language attitudes that would otherwise be difficult to access. Language attitudes are very complex and often affected by a multitude of factors – the listener's own experience as well as the speaker's accent.
It is also difficult to design experiments that are high in ecological validity (that is, experiments which reflect real-life language use). By replicating a courtroom context in a game, this study realistically tested the kinds of questions that a jury would be asked.
Another problem with some research studies is that it takes a long time to collect data, and so it can be very boring or tiring to participate in the research. Using a game-based design makes it fun and engaging to take part. If participants are not rushing to finish or getting too tired, they will likely provide more reliable responses.
In this game, participants had to decide how similar two pre-recorded voices were, then judge the likelihood that they were from the same speaker.
Sometimes the two recordings were from the same speaker and sometimes from different speakers.
Importantly, the researchers looked at how responses changed once new evidence was brought into the case: were participants' decisions more accurate or less accurate after they were told that there was more evidence? What do you think the results will show?
The answer
The results showed that machines were better at making accurate decisions than humans:
Humans achieved a result of 76.5% accuracy; listeners found pairs of voices hardest to judge when both recordings belonged to the same speaker.
Specialist software achieved a result of 89.1% accuracy.
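To see how accuracy figures like these can be worked out, here is a minimal Python sketch. It assumes each trial is recorded as a pair of values: whether the two recordings really were the same speaker, and whether the listener said they were. The trial data and the function name accuracy_by_pair_type are invented for illustration; they are not the study's data or the researchers' scoring method.

```python
from collections import defaultdict

# Hypothetical trial records: (truly_same_speaker, listener_said_same).
# These values are invented for illustration; they are NOT the study's data.
trials = [
    (True, True),
    (True, False),
    (True, False),
    (False, False),
    (False, False),
    (False, True),
    (False, False),
    (True, True),
]


def accuracy_by_pair_type(trials):
    """Return accuracy for same-speaker pairs, different-speaker pairs, and overall."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truly_same, said_same in trials:
        key = "same" if truly_same else "different"
        total[key] += 1
        # A response is correct when the listener's answer matches the truth.
        if truly_same == said_same:
            correct[key] += 1
    result = {key: correct[key] / total[key] for key in total}
    result["overall"] = sum(correct.values()) / sum(total.values())
    return result


scores = accuracy_by_pair_type(trials)
for name, value in scores.items():
    print(f"{name}: {value:.1%}")
```

Splitting accuracy by pair type is what makes it possible to say that one kind of pair (here, same-speaker pairs) is harder than the other, which a single overall percentage would hide.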
Moreover, the context in which participants heard the voices affected their judgements:
Listeners heard all pairs of voices as being more similar in the jury context compared with an earlier training level.
Once additional evidence was brought to the participants' attention (jury perspective, DNA, footprint or fingerprint evidence, or expert evidence), they were more likely to judge the voices as being the same, both when the recordings were from the same speaker and when they were from different speakers.
Non-expert human judgements of speaker sameness can be influenced simply by the fact of being asked to judge this in a courtroom context.
Classroom activities
In more detail
A longer explanation of the research study
Talk Recording [37 mins including interactive task]
Slides from the Workshop Talk