
Also known as “synthetic speech” or “text-to-speech” technology, speech synthesis mimics real human voices and deploys them across a range of interfaces. With enough data and training, a speech synthesis system can learn the spectral characteristics of a person’s voice and reproduce it as a digital voiceprint. Just three minutes of Andy Warhol’s voice recordings were enough for Resemble AI to build the training data set for the voiceover in Netflix’s “The Andy Warhol Diaries.” The technology will be especially useful for movies with wide international releases: actors’ facial expressions and mouths can be reformatted so that dubbed dialogue in local languages is correctly synchronized. MIT’s Center for Advanced Virtuality and Ukrainian startup Respeecher used speech synthesis in an “interactive documentary,” generating Richard Nixon’s voice reciting a never-before-heard speech prepared in case the Apollo 11 mission failed. The resulting short film, “In Event of Moon Disaster,” won an Emmy in 2021. Another company, Synthesia, uses the technology to dub speakers through automated facial reanimation. The same capabilities can also serve nefarious ends, such as impersonating a trusted figure in an audio conversation to extract sensitive data or make a fraudulent request. In 2021, researchers at the University of Chicago’s Security, Algorithms, Networking, and Data Lab tested two publicly available, open-source speech synthesis algorithms and duped voice recognition systems from Microsoft, WeChat, and Amazon into granting access to users’ devices by artificially re-creating their voices.
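The idea of a spectral “voiceprint” can be sketched in a few lines. Real systems (including the ones named above) use trained neural speaker embeddings; the toy example below is only an assumed, simplified stand-in that averages the log-magnitude spectrum of a waveform into a fixed-length vector and compares vectors by cosine similarity. All function names and parameters here are illustrative, not from any of the products mentioned.

```python
import numpy as np

def voiceprint(samples: np.ndarray, frame_len: int = 512) -> np.ndarray:
    """Crude spectral 'voiceprint': the average log-magnitude spectrum
    over fixed-length windowed frames of the waveform."""
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    window = np.hanning(frame_len)
    spectra = np.abs(np.fft.rfft(frames * window, axis=1))
    return np.log1p(spectra).mean(axis=0)

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two voiceprints (1.0 = identical shape)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy demo: two synthetic 'voices' with different dominant frequencies.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0  # one second of audio at 16 kHz
voice_a = np.sin(2 * np.pi * 220 * t) + 0.05 * rng.standard_normal(t.size)
voice_b = np.sin(2 * np.pi * 440 * t) + 0.05 * rng.standard_normal(t.size)

same = similarity(voiceprint(voice_a), voiceprint(voice_a))
diff = similarity(voiceprint(voice_a), voiceprint(voice_b))
print(same > diff)  # a voice matches itself more closely than another voice
```

This also hints at why the University of Chicago attack worked: if a synthesizer can reproduce a target’s spectral profile closely enough, a recognizer relying on such features will score the fake as a match.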

Speech synthesis
