The other day, I posted a blog written by an artificial intelligence system, in which it generated, quite brilliantly, a series of paragraphs detailing the industries where AI is being deployed either to solve difficult, time-intensive tasks or to produce insights from data that humans have not yet been able to achieve.

It is safe to say that AI has some outstanding capabilities that are only just being realised, and I have no doubt in my mind that AI will become more and more pervasive in everyday life, becoming the norm rather than the exception.

Typically, when people say these things, they add the caveat “but not in my lifetime”. Well, with AI it will most certainly be in my lifetime – AI is influencing things in our everyday lives right now. We might not fully realise what effects it is having, but it is there: driving research, improving travel times, even providing tips to make our emails more productive and accurate.

Like most things, new technologies are generally built for good, wholesome, useful purposes, but someone somewhere quickly turns them to more nefarious ends, and AI is no exception.

The growth of deepfakes

For a while now, we have seen the use of deepfakes.

For those unaware of what deepfakes are, a deepfake is a video of a person in which their face or body has been digitally altered so that they appear to be someone else, typically used maliciously or to spread false information.

In order to alter the subtle movements of a person’s face whilst they speak in a video, an AI system tracks the muscle movements and re-draws the video pixels, reshaping the mouth movements, eye twitches, and other facial expressions to give the appearance that the person in the video is saying something they never uttered.
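To make the tracking stage a little more concrete, here is a minimal sketch using the open-source MediaPipe library. Note that this covers only the landmark-tracking half of the process; in a real deepfake system the re-drawing is done by a learned generative model, which is not shown here, and the video file name is a placeholder.

```python
# Sketch of the landmark-tracking stage of a deepfake pipeline using
# MediaPipe Face Mesh. The generative re-drawing stage is omitted.
import cv2
import mediapipe as mp

cap = cv2.VideoCapture("input.mp4")  # placeholder source video
with mp.solutions.face_mesh.FaceMesh(static_image_mode=False,
                                     max_num_faces=1) as face_mesh:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV decodes frames as BGR
        results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_face_landmarks:
            # 468 normalised (x, y, z) points covering the mouth, eyes
            # and the rest of the face; a deepfake system feeds these
            # per-frame positions to a generator that reshapes the
            # mouth movements and expressions
            landmarks = results.multi_face_landmarks[0].landmark
            print(f"Tracked {len(landmarks)} facial landmarks")
cap.release()
```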

The YouTube video below is from a 2018 BBC News editorial on deepfakes, and looks at the work done in the AI space by the University of Washington.

Deepfake Obama

With such convincing video imagery, I think most people can see how deepfakes like this can be, and have been, used to discredit a celebrity or a politician, and how dangerous they can be.

In June of last year, the FBI issued a warning that it had seen a rise in reported cases of criminals using deepfakes in video interviews. An unsuspecting company would hire what it believed was a real person, only to discover later that the candidate was fake but had by then already been set up with user accounts, granting the criminals access to proprietary information or sensitive data.

The world of deepfakes is now set to become even bigger, and probably more dangerous.

Welcome VALL-E

Microsoft have revealed this week that they have built an AI voice simulator capable of accurately imitating a person’s voice after listening to them speak for just three seconds.

The VALL-E language model was trained on 60,000 hours of English speech from 7,000 different speakers in order to synthesise “high-quality personalised speech” from any unseen speaker.

Once the artificial intelligence system has a person’s voice recording, it is able to make it sound like that person is saying anything. It is even able to imitate the original speaker’s emotional tone and acoustic environment.

A paper describing the system states: “Experiment results show that VALL-E significantly outperforms the state-of-the-art zero-shot text-to-speech (TTS) system in terms of speech naturalness and speaker similarity.”

“In addition, we find VALL-E could preserve the speaker’s emotion and acoustic environment of the acoustic prompt in synthesis.”
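VALL-E itself has not been released, so its API cannot be shown, but the zero-shot idea it embodies can be illustrated with an openly available alternative. The sketch below uses the open-source Coqui TTS library and its YourTTS model, which likewise conditions its output on a short reference recording of the target speaker; the file names are placeholders, and this illustrates the general technique rather than Microsoft’s system.

```python
# Illustration of zero-shot voice cloning with the open-source Coqui
# TTS library (not VALL-E, which is unreleased). YourTTS conditions
# its output on a short reference recording of the target speaker.
from TTS.api import TTS

# Load a pre-trained multi-speaker, multilingual model
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")

# "reference.wav" is a placeholder for a short clip of the target
# speaker; the model clones that voice for the supplied text
tts.tts_to_file(
    text="This sentence was never spoken by the target speaker.",
    speaker_wav="reference.wav",
    language="en",
    file_path="cloned_output.wav",
)
```

The point is not the particular library, but how little reference audio such systems need; VALL-E pushes that requirement down to roughly three seconds.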

If you want to hear how good VALL-E is, there is a demo of the speech capabilities described in the above-mentioned paper here.

The VALL-E software used to generate the fake speech is currently not available for public use, with Microsoft citing “potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker”.

And that’s where the criminals will seek to gain their next advantage.

What’s to say that a criminal couldn’t now capture three seconds of any person’s audio in any number of ways, synthesise their voice via a system similar to VALL-E, augment it with the natural language flow of a system such as ChatGPT, and gain a very convincing ability to make a phone call to someone?

For many years, we have seen companies tricked into making payments to fake bank accounts via an “urgent” email sent to an unsuspecting office junior, purportedly from the Chief Financial Officer.

Well, now criminals have the ability to leverage AI to place live calls to an office from the fake CFO.

People fear the weaponisation of AI in the area of armed conflict, and whilst that will most certainly happen at some point, to me the nearer threat is the weaponisation of AI by criminal gangs to extort money from unsuspecting companies and individuals.

What checks do companies now have to put into place to avoid being deepfaked out of millions of pounds?

AI is a wonderful space to be working in, and it offers some amazing possibilities, but we also have to understand the dangers of such technologies when they become weaponised.