Deepfake audio can trick people even when they know they might be hearing an AI-generated voice
AI-powered detectors may need to step up to help people distinguish deepfakes from authentic human speech
By Jeremy Hsu
2 August 2023
Could you tell if you were listening to an AI-generated voice? (Image: Shutterstock/fizkes)
Even when people know they may be listening to AI-generated speech, both English and Mandarin speakers struggle to reliably detect a deepfake voice. That means billions of people who speak the world’s two most spoken languages are potentially at risk from deepfake scams or misinformation.
Kimberly Mai at University College London and her colleagues challenged more than 500 people to identify speech deepfakes among multiple audio clips. Some clips contained the authentic voice of a female speaker reading generic sentences in either English or Mandarin, while others were deepfakes created by generative AIs trained on female voices.
The study participants were randomly assigned to one of two experimental setups. One group listened to 20 voice samples in their native language and had to decide whether each clip was real or fake.
Participants correctly classified the deepfakes and the authentic voices about 70 per cent of the time for both the English and Mandarin samples. That suggests human detection of deepfakes in real life is likely to be even worse, since most people would not know in advance that they might be hearing AI-generated speech.
A second group was given 20 randomly chosen pairs of audio clips. Each pair featured the same sentence spoken once by a human and once by a deepfake, and participants were asked to flag the fake. This side-by-side comparison boosted detection accuracy to more than 85 per cent, although the team acknowledged that the scenario gave listeners an unrealistic advantage.