"Grandpa, I need money!": the new fraud of cloning voices to steal from older relatives


Oliver Thansan
11 March 2023 Saturday 21:54

Ruth Card and her husband Greg Grace received an unexpected call a few days ago: their grandson Brandon was in jail without a wallet or phone. He needed cash to post bail and go free. His grandparents did not hesitate and rushed to several banks to withdraw the amount requested. At the first bank, they took out $2,207. At the second, they were warned that another customer had received a similar call, and they began to suspect that it was a scam.

The couple had a hard time believing it was a hoax, because the voice was identical to their grandson's. Nothing had made them suspicious. However, the warning at the second bank prompted them to look into it. Their suspicion proved right.

They were victims of one of the many impersonation scams taking place in the country, powered by artificial intelligence (AI).

Their story has been picked up by The Washington Post, which points to a worrying uptick in these crimes. According to data from the Federal Trade Commission (FTC), consumers lost $8.8 billion to fraud of some kind in 2022, an increase of more than 30% over the previous year. The second-highest losses came precisely from impostor scams, with $2.6 billion reported in this period, compared to $2.4 billion in 2021.

Cyberattacks and cybercrime are increasing not only in the US but also throughout Europe, and are becoming more sophisticated, says the European Council (EC). This trend is expected to worsen, as 41 billion devices worldwide are expected to be connected to the Internet of Things by 2025.

The advance and democratization of AI are making everything easier for cybercriminals and more complicated for users. The technology makes it easier to mimic voices and convince people, often the elderly, that their loved ones are in danger. "We thought we were talking to Brandon," the couple told the newspaper.

Today there are plenty of cheap online tools that can turn an audio file into a very convincing replica of a voice. This technology, based on generative artificial intelligence, needs only an audio sample of a few sentences. From there, the tools pick up the tone and can reproduce any phrase the scammer needs.

Gone are old tricks such as vishing, a type of fraud based on social engineering and identity theft. It is carried out through phone calls in which the attacker impersonates a company, an organization or even a trusted person in order to obtain personal information from victims. But it was never as accurate as these apps capable of cloning voices.

More and more companies are developing voice-generation software, which analyzes a voice's characteristic features: age, gender, accent or tone. "It's kind of a perfect storm... with all the ingredients you need to create chaos," Hany Farid, a professor of digital forensics at the University of California at Berkeley, told The Washington Post.

The specialist says that it is very easy to clone voices using AI. It only requires a small audio sample of the person in question, taken from YouTube, podcasts, commercials, or TikTok, Instagram or Facebook videos.

New companies are emerging in this sector. One of them is ElevenLabs, which specializes in natural-sounding speech synthesis and text-to-speech software, using AI and deep learning. "The most realistic and versatile AI voice software ever. Eleven brings the most compelling, rich and realistic voices to creators and publishers looking for the best storytelling tools," the company says. It can be used for free or in a paid version.

Microsoft has also jumped on the generative AI bandwagon. In January, it announced that it was developing VALL-E, a Text-to-Speech (TTS) language model capable of learning and imitating any voice based on a three-second recording. Not only does it imitate, but its developers claim that it is capable of "preserving the emotion of the speaker and the acoustic environment of the message".