
Watson’s Jeopardy Appearance Showcases Human-like Speech Synthesis

By Dan Miller on February 14, 2011

I’m really looking forward to watching the TV game show Jeopardy tonight, when “Watson,” a computer that’s been fine-tuned by the scientists at IBM’s R&D lab, takes on two of the best players in the history of the game. The impending stunt – in the sense of the word that means “a difficult or unusual or dangerous feat; usually done to gain attention,” rather than “to check the growth or development of” something – has triggered thoughtful discussions about the ability of a computer to “think” or display other human-like qualities. For instance, this discussion on Quora seeks to answer the question, “Will human consciousness ever be transferrable to a computer?”

In this article in PC Magazine, Ray Kurzweil calls it, “one small step for IBM, one giant leap for computerkind.” In the article, one of the most telling quotes is, “I’ve always felt that once a computer masters a human’s level of pattern recognition and language understanding, it would inherently be far superior to a human because of this combination.” And I think, deep down inside, many people (perhaps the TV viewing population) would believe Kurzweil’s assertion.

I may be corrected, but I’m pretty sure that Watson is not using automated speech recognition to “hear” the answers (or clues). But there is a significant amount of “natural language understanding” brought to bear as the computer deals with the text input and identifies the topic, determines the meaning of the words in context and starts formulating a response from its database.

Watson’s success (even if it doesn’t “win” the Jeopardy competition) furthers the cause of “Virtual Assistants,” exemplified by Siri, Vlingo and the culmination of Google’s speech-enabled initiatives. This article, which appeared in Engadget in mid-January, has embedded video that shows how Watson answers questions during the practice round. It also gives significant background on the data processing, pattern recognition, language “understanding,” statistics (to determine confidence in an answer) and even the mechanism for pressing the button.

I believe that the life-like synthesized voice of Watson is the quality that will have the lasting impact. It sounds like a person. It is able to detect irony and puns. It even appears to have a sense of humor (although having a topic like “Chicks Dig Me” seems like the product of a bad straight man). The TV audience is more ready to accept the reality of a talking computer, just like Mr. Ed, the talking horse from TV’s “golden years,” or KITT, the talking car from “Knight Rider.” It is no longer the product of willing suspension of disbelief. Rather, it is the product of wishful thinking as the general public comes to expect to be able to converse with computers or computing resources over mobile phones or microphones installed in their PCs.

