Innovative Approaches to Fight Spoofing of Voice Authentication

As part of our ongoing coverage of Intelligent Authentication, Opus Research recently published “Voice Biometrics, What Could Go Wrong?” In it, we showed how high-grade voice biometrics solutions overcome “spoofing” (aka “Presentation Attacks”) through Playback Detection and Liveness Detection schemes (see figure below). These schemes have been the mainstay of Presentation Attack detection for many years. This year brought the greatest improvement in performance and accuracy, thanks to improved algorithms, machine learning and deep neural networks applied to extremely large data sets, along with the promise of assisted and automated tuning and optimization of voice biometric systems.

This holiday season, so-called smart speakers and other “Voice First” devices from Amazon, Google, Sonos, Samsung and, eventually, Apple are at the top of many people’s gift lists. Their high profile has attracted the attention of security experts with a keen eye on how to prevent spoofing or mitigate presentation attacks across an ever-growing population of endpoints. Existing techniques have been effective in defending against most forms of presentation attack, yet press coverage creates a sense that comprehensive security systems face a real threat.

However, fraudsters are continually enhancing their methods, aided by easier access to sophisticated audio processing tools. Two such tools that have received a lot of media attention are Lyrebird and Adobe Voco, neither of which is commercially available yet. It is only a matter of time before these or similar tools become available to even the least sophisticated hackers. Thankfully, the global innovation engine continues to steam ahead, with a few very exciting concepts and prototypes that promise to inject an entirely new dimension into defending against spoofing attacks on voice biometrics.

Figure: Presentation Attack Vectors and Mitigation Techniques

Here’s a sampling of innovative solutions:

VoiceGesture: Designed by researchers at Florida State University, VoiceGesture leverages the advanced audio capabilities of modern smartphones, such as the Galaxy S5/6 and iPhone 5/6 and upwards, to create a “Doppler radar.” The phone emits a high-frequency sound, inaudible to the human ear, from its built-in speaker. As the user speaks, the microphone picks up reflections from the various voice articulators (e.g. lips, jaw, tongue), and these reflections are used to create a “mouthprint.”
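To give a feel for the scale of the effect a Doppler-based approach has to measure, here is a minimal sketch of the standard two-way Doppler approximation for a slow-moving reflector. The 20 kHz carrier and the 5 cm/s articulator speed are illustrative assumptions, not figures taken from the VoiceGesture researchers, and the function name is hypothetical.

```python
# Illustrative only: approximate Doppler shift of a tone reflected by a
# slowly moving surface (e.g. a lip). Values are assumptions, not
# parameters published for VoiceGesture.

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C


def doppler_shift_hz(carrier_hz: float, reflector_speed_m_s: float) -> float:
    """Two-way Doppler shift for a reflector moving toward the phone.

    Uses the small-velocity approximation: shift = 2 * v * f0 / c.
    A negative speed (moving away) gives a negative shift.
    """
    return 2.0 * reflector_speed_m_s * carrier_hz / SPEED_OF_SOUND_M_S


if __name__ == "__main__":
    # Assumed 20 kHz inaudible tone, lips moving at an assumed 5 cm/s.
    shift = doppler_shift_hz(20_000.0, 0.05)
    print(f"Approximate shift: {shift:.2f} Hz")
```

Under these assumptions the shift is only a few hertz, which suggests why a recording played through a loudspeaker, with no moving articulators in front of the phone, would look different to such a system.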

VAuth (pronounced “vee-auth”): Developed at the University of Michigan, VAuth can take the form of any wearable device, such as a necklace, earbuds or a small attachment to eyeglasses. By continuously monitoring speech-induced vibrations on the user’s body, VAuth pairs these signals with the sound of that person’s voice to enhance the voiceprint.
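The idea of pairing a body-vibration signal with the audio signal can be sketched as a simple similarity check. The article does not describe VAuth's actual matching method, so the version below is a deliberately simplified stand-in: it assumes the two signals are already time-aligned and sampled at the same rate, and it uses a plain Pearson correlation with a hypothetical threshold of 0.8.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length signals (0.0 if either is flat)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("signals must have the same length (at least 2 samples)")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def signals_match(vibration: Sequence[float], audio: Sequence[float],
                  threshold: float = 0.8) -> bool:
    """Accept only if the wearable's vibration tracks the microphone audio.

    The threshold is a hypothetical value chosen for illustration.
    """
    return pearson(vibration, audio) >= threshold


if __name__ == "__main__":
    # Synthetic stand-ins: the wearer's speech shows up in both channels,
    # while a replayed voice reaches the microphone without matching vibration.
    vibration = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
    live_audio = [0.5 * v for v in vibration]
    replayed_audio = [math.cos(2 * math.pi * 5 * t / 100) for t in range(100)]
    print(signals_match(vibration, live_audio))
    print(signals_match(vibration, replayed_audio))
```

A real system would need to handle noise, delay between the two sensors and differing frequency responses, none of which this sketch attempts.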

Lip Motion Password: Researchers at Hong Kong Baptist University posit that lip movement is unique to an individual. Using the front camera on smartphones, tablets, notebook PCs and similar devices, their solution examines the visual features of a person’s lips, including shape, texture and movement, to create a unique lip motion template. It can be thought of as a very tightly focussed version of facial recognition with liveness detection.

VocalZoom: An Israel-based firm that uses multi-function sensors to enable voice control and voice authentication. The technology converts a speaker’s facial skin vibrations into a unique, noise-free voiceprint, with the biometric acquisition and template matching performed inside the optical sensor solution. With liveness detection included, the company offers a range of products for applications in smartphones, PCs, ATMs and connected cars.

As with all authentication, there are pros and cons to these innovations. One of the key tenets of Intelligent Authentication (IAuth) is that it fuse multiple factors, both for security and for convenience, and that those factors be layered according to rules that take into account the risk associated with the individual user or the transaction being carried out. No single solution provides 100% prevention. With the exception of VocalZoom, the above solutions are not commercially available yet and, arguably, are still quite a way from full commercialization, yet they have the potential to significantly enhance IAuth. With additional modalities, factors and security layers for voice biometrics, a new wave of solutions is fusing both biometric and behavioral user characteristics to deliver secure, seamless authentication.
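The layering principle described above, in which multiple factors are fused and the bar is raised with transaction risk, can be sketched as a weighted score checked against a risk-dependent threshold. Everything in this example is hypothetical: the factor names, weights, thresholds and the "step_up" outcome are illustrative assumptions, not a description of any vendor's product.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FactorScore:
    name: str      # e.g. "voiceprint", "liveness" (hypothetical labels)
    score: float   # match confidence in the range 0.0 to 1.0
    weight: float  # relative importance assigned by policy


# Assumed policy: riskier transactions demand a higher fused score.
RISK_THRESHOLDS = {"low": 0.60, "medium": 0.75, "high": 0.90}


def fused_score(factors: Sequence[FactorScore]) -> float:
    """Weighted average of the individual factor scores."""
    total_weight = sum(f.weight for f in factors)
    if total_weight <= 0:
        raise ValueError("at least one factor with positive weight is required")
    return sum(f.score * f.weight for f in factors) / total_weight


def decide(factors: Sequence[FactorScore], risk: str) -> str:
    """Return "accept", or "step_up" to request an additional factor."""
    return "accept" if fused_score(factors) >= RISK_THRESHOLDS[risk] else "step_up"


if __name__ == "__main__":
    factors = [
        FactorScore("voiceprint", 0.80, 2.0),
        FactorScore("liveness", 0.90, 1.0),
    ]
    print(decide(factors, "low"))   # balance inquiry, for example
    print(decide(factors, "high"))  # large funds transfer, for example
```

The same caller with the same scores is accepted for a low-risk request but asked for another factor on a high-risk one, which is the convenience-versus-security trade-off that risk-based layering is meant to manage.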


