Reflections on SpeechTek: Natural User Interfaces Go Far Beyond Speech Processing

To kick off the SpeechTek 2014 Conference, I participated in an exceptionally interesting panel organized by Jibo’s Roberto Pieraccini to discuss “New Technologies Coming Soon.” In retrospect, the term “coming soon” was a misnomer. Two of the panelists, Bernard Brafman of Sensory, Inc., and Brent Barbara of 3si (Surgical Safety Solutions), used their time to show solutions that are up and running in the here and now (or “hear and now”). In the case of Sensory, the featured product is TrulySecure™, technology that can be embedded in a smartphone so that people can use their voice, combined with facial recognition, to activate the device or gain secure access to mobile services.

3si’s offering is a little different. VP of Marketing Brent Barbara demonstrated how voice recognition can be incorporated into the workflows of the modern hospital operating room. The company’s core product, the 3si HUB, is a control panel or dashboard, of sorts, for hospital operating rooms (ORs). It provides both visual and voice-based status indicators during complex surgeries, including real-time roster management (making sure the anesthesiologist, scrub nurses and other assistants are present and doing the right chores) and the continuous maintenance and review of a checklist of pre-operative and intra-operative activities. Natural Language Processing and accurate transcription of spoken words are integrated into the solution. But the point is, both TrulySecure and the 3si HUB are available in the here and now.

The third panelist was Dr. David Nahamoo, IBM Fellow and CTO of Big Blue’s Speech Practice. After I opined that “there have been no real breakthroughs in speech recognition in the past 8 or so years,” Dr. Nahamoo was quick to correct me in his opening remarks. To the contrary, he said, automated speech recognition has made a great leap in accuracy over the past year or so, thanks to the judicious application of Neural Networks and renewed attention to Natural Language Processing, Machine Learning and Deep Belief Networks. True to the “coming soon” label on our session, he noted that deployment of these technologies in the lab signals the start of research and development efforts that will be years in the making. The fruit of IBM’s labor will be the creation of, in Nahamoo’s words, “a machine that is more like a human” in its ability to understand spoken input, gestures and, ultimately, the intent of individuals.

“We have broken the barriers to accuracy,” Nahamoo said, citing neural networks and machine learning that benefit from Deep Belief Networks. But improvements in speech recognition and transcription can only take the community of application developers so far. Natural Language Understanding (NLU) is the next big thing. Advancements in this domain will not come from improvements in “core” speech processing; instead, they will arise, as IBM Watson did, from applying raw computing power to understanding questions and answers based on analytics of the voluminous amounts of textual data on the Internet and World Wide Web.

Providing the best answers to individual questions is one thing, something that today’s Intelligent Assistants already do pretty well while constantly improving. “Coming soon” will be the ability to carry on a dialog, a quality that is very important for lengthy engagements and conversational commerce. Taking turns, maintaining context and even providing editorial comments and guidance are bound to follow, especially in support of natural interfaces for self-service and assisted self-service. It may be a bitter pill for speech processing professionals to swallow, but the future of natural, comfortable user interfaces for such applications is not going to be shaped by improvements in the accuracy of automated speech recognition resources (though the ability to detect speech or individual speakers in a noisy environment is another “coming soon” capability that will be vital). Instead, it will be the collective advancements in NLP, dialogue management and machine learning that, when combined with the speech processing technologies available today, will revolutionize conversational commerce and intelligent assistance.

Join us at the Intelligent Assistants Conference-SF (Sept. 16, 2014) to learn how far we expect intelligent assistants to evolve in the coming years.



Categories: Conversational Intelligence, Intelligent Assistants, Intelligent Authentication, Articles