Phone calls can be recorded by a variety of “spyware” applications, and home gadgets like Amazon’s Echo can also capture everyday conversation. A new technique known as Neural Voice Camouflage offers protection: as you speak, it generates bespoke background noise that confuses the artificial intelligence (AI) transcribing the recorded speech.
The new technology uses an adversarial attack, in which a machine-learning algorithm that finds patterns in data alters sounds just enough that another AI, but not a human listener, mistakes them for something different. In essence, one AI is used to deceive another.
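To make the idea concrete, here is a minimal toy sketch of an adversarial attack, not the authors’ method: a small perturbation is added to a feature vector until a simple linear classifier flips its label, while each individual step stays tiny. The classifier, the feature vector, and the step size are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)   # hypothetical classifier weights
x = rng.normal(size=16)   # hypothetical audio-feature vector

def predict(features):
    """Toy linear classifier: positive score -> label 1, else 0."""
    return int(features @ w > 0)

original_label = predict(x)
target = 1 - original_label   # the label we want the AI to output
step = 0.05                   # small per-step perturbation (assumed)

# Nudge every feature slightly in the direction that pushes the
# classifier's score toward the opposite label, until it flips.
x_adv = x.copy()
while predict(x_adv) != target:
    x_adv = x_adv + step * np.sign(w) * (1.0 if target == 1 else -1.0)
```

A real attack on a speech recognizer follows the same logic but uses the gradient of a deep network’s loss instead of the sign of linear weights, and constrains the perturbation so humans barely notice it.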
Running such an attack in real time is hard, because the algorithm ordinarily needs the entire sound clip before it can work out how to adjust it. In a recent study, researchers trained a brain-inspired neural network to predict efficiently instead: trained on hours of recorded speech, it continually analyzes 2-second audio snippets and generates noise to conceal whatever is likely to be said next.
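The streaming structure described above can be sketched as follows. This is a simplification, not the paper’s network: a stand-in predictor looks at the most recent 2 seconds of audio and emits a noise burst to play over the next chunk, since audio that has already been heard cannot be altered. The sample rate, chunk length, and the energy-based predictor are all assumptions for illustration.

```python
import numpy as np

SAMPLE_RATE = 16_000          # assumed sample rate (Hz)
CONTEXT = 2 * SAMPLE_RATE     # 2-second context window, as in the article
CHUNK = SAMPLE_RATE // 2      # 0.5-second output chunks (assumed)

def predict_mask(context, rng):
    # Stand-in for the trained neural network: emit low-amplitude
    # noise scaled by the recent signal energy, so louder speech
    # gets a louder mask over the upcoming chunk.
    level = 0.1 * np.sqrt(np.mean(context[-CHUNK:] ** 2))
    return level * rng.standard_normal(CHUNK)

rng = np.random.default_rng(1)
stream = rng.standard_normal(4 * SAMPLE_RATE)   # fake 4 s of speech
masks = []
# Slide over the stream: each mask is computed from past audio only
# and covers the chunk that has not been spoken yet.
for start in range(CONTEXT, len(stream) - CHUNK, CHUNK):
    context = stream[start - CONTEXT:start]
    masks.append(predict_mask(context, rng))
```

The key constraint the loop encodes is causality: every mask is a function of samples strictly before the interval it covers, which is why the real system must forecast rather than react.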
When somebody says something like “enjoy the big feast,” no system can predict exactly what will follow. But by considering what has just been said, along with the characteristics of the speaker’s voice, the network generates noise that disrupts a variety of words that plausibly come next. In one example, that included what the speaker actually said afterward: “that’s being prepared.” Crucially, humans and machines perceive the camouflage differently: to people it sounds like ordinary background noise, but it causes machines to misidentify the words in the hidden speech.
Masked by Neural Voice Camouflage, speech-recognition systems misidentified words at a rate of 52.5 percent, a higher error rate than white noise or a competing adversarial attack produced. Short words like “the” remained the hardest to disrupt, but they are also the least informative parts of dialogue.
In the study, computer scientists at Columbia University also tested the technology in the real world, playing a speech recording paired with the camouflage over a set of speakers in the same room as a microphone. It still worked.
Chiquier says the program’s predictive component has potential for other applications that need real-time processing, such as driverless cars. Brains, too, operate by anticipation; you feel surprise when your brain guesses wrong. “We’re replicating the way people do things,” Chiquier says.