
Similar to fingerprints, specific vocal characteristics are unique to each person. They could be used to create an algorithmic model of the voice, known as a voiceprint, which could be used for biometric identification and authentication purposes.

Like fingerprints or the retina, vocal characteristics are unique to every person. Features such as the length of the vocal tract, the shape of the nasal passage, pitch, accent, and even an individual's behavioral characteristics could be used to create an algorithmic model known as a voiceprint. Voiceprints offer a novel method of security and authentication, leading to a more personalized strategy for biometric identification.
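As a rough illustration of how a voiceprint might be used for authentication, the minimal Python sketch below treats a voiceprint as the average of several numeric feature vectors from one speaker and accepts a new sample when its cosine similarity to that voiceprint exceeds a threshold. The three-number vectors, the function names, and the 0.9 threshold are all assumptions made for illustration. In a real system the features would be extracted from audio by a trained model, and the threshold would be tuned on real data.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(samples):
    """Average several feature vectors from one speaker into a voiceprint."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def verify(voiceprint, candidate, threshold=0.9):
    """Accept the candidate if it is close enough to the stored voiceprint.

    The threshold is a hypothetical value chosen for this toy example.
    """
    return cosine_similarity(voiceprint, candidate) >= threshold


# Toy feature vectors (hypothetical; real ones would come from audio).
enrolled = enroll([[0.9, 0.1, 0.4], [1.1, 0.0, 0.5], [1.0, 0.2, 0.3]])
same_speaker = [0.95, 0.12, 0.42]
other_speaker = [0.1, 0.9, 0.2]

print(verify(enrolled, same_speaker))   # accepted
print(verify(enrolled, other_speaker))  # rejected
```

Raising the threshold makes the check stricter, which rejects more impostors but also rejects more genuine users. A lower threshold does the opposite.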

Although a variety of machine-learning techniques are used to process and store voiceprints, current state-of-the-art models are based on deep learning, which has been shown to outperform other approaches on a broad range of tasks involving noisy sensory data. Deep-learning models can already reconstruct a speaker's face merely by analyzing a voice recording and, using a technique called deep clustering, disentangle multiple conversations happening at the same time in a crowded room. These capabilities could also reduce the security threats posed by voice imitation, or by illness that changes the quality of the user's voice.

Using the human voice for biometric identification has the advantage that the required infrastructure already exists in communication channels such as phone systems. Extra hardware, such as retina scanners or fingerprint readers, is unnecessary, which makes implementation straightforward and cost-effective.

Such technology could be a useful resource for those who struggle with written communication, giving illiterate and low-literate people a new way to carry out financial transactions. This accessibility makes the technology more democratic and favors its widespread use at every level of society.

Future Perspectives
With the rising trend of using spoken commands to control a variety of devices, it is possible to foresee future classrooms in which students no longer need to copy down the content presented in a lecture. Such hands-free conversational software might build a more responsive economic environment, where market connections, financial dealings, and many other transactions are carried out by voice rather than in writing.

Yet authoritarian governments could use voiceprint databases to monitor their citizens and threaten free speech. New developments in speech imitation and synthesis could also recreate a voice from short speech recordings. The synthesized voice could then be used to trick a system into identifying an attacker as an authorized user, which could lead to an arms race between security technology and malicious third parties.

Voice Recognition

KEY TRENDS

Artificial Intelligence (AI)