NYTimes Column Highlights Google Voice’s Transcription Shortcomings

This article by David Gallagher in the NYTimes’ “Personal Technology” section is a taste of things to come as phone-based transcription goes mainstream. Using the voicemail transcription function from Google Voice, Gallagher has provided a very visible venue for people to do what they like to do best: “break the system.” We’ve seen this before, with people reading Lewis Carroll’s Jabberwocky into a “natural language” speech recognizer just to see what comes out the other end.

In his introduction, Gallagher acknowledges that Google calls the service “experimental” and explains that the purpose of the column is to see how far Google has “pushed the limits” of the technology in order to provide accurate renderings of voicemail. His conclusion, stated up front, is that his callers didn’t have to push the limits very far “before it broke.”

Google’s approach uses different values on a gray scale to illustrate the system’s level of confidence in its recognizer’s output. Bolded characters show high levels of confidence; light gray indicates that the system is not so sure of the quality of its rendering. That fact is not captured in Gallagher’s column. All the same, I’m not sure what value the different gradations bring, except that they might establish a new convention for displaying results (perhaps including search results) according to an algorithm that illustrates the level of “trust” a recipient ought to place in them.
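To make the gray-scale idea concrete, here is a minimal sketch of how a per-word confidence score could be mapped to a shade of gray. It is not Google's actual algorithm, which is not described in the column; the function name `confidence_to_gray`, the 0-to-1 confidence scale and the choice of `#cccccc` as the lightest shade are all assumptions made for illustration.

```python
def confidence_to_gray(confidence, lightest=204):
    """Map a confidence score in [0, 1] to a hex gray color.

    Hypothetical mapping: 1.0 renders as black (most trusted),
    0.0 renders as the lightest gray (least trusted).
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Lower confidence means a higher (lighter) gray level.
    level = round((1.0 - confidence) * lightest)
    return "#{0:02x}{0:02x}{0:02x}".format(level)


def shade_transcript(scored_words):
    """Pair each word with the gray shade implied by its score."""
    return [(word, confidence_to_gray(score)) for word, score in scored_words]
```

Under these assumptions, a word the recognizer is sure of comes out black and a doubtful one comes out light gray, which is the visual cue described above.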

If there is good news to be found in the public exposure of automated transcription’s shortcomings, it is the role such columns play in establishing end-user expectations. Acceptance of voice transcription technology (or lack thereof) is following a well-established pattern. Although it is not following a hype curve or crossing a chasm, the path is predictable nonetheless. The first step involves users trying to break the new technology. Once they’ve established their mastery, the next step is to figure out how to “game” the system, meaning that they will discover and define how to make the system work for them. In the long run, that’s how they make it their own personalized service.

From a product marketing point of view, the mission is to manage end-user expectations. Let’s be honest. Speech recognition and transcription that is 100% accurate is a pipe dream. As a matter of fact, accuracy in understanding a spoken utterance is incalculable. Failure to recognize a single word in a voicemail (a proper name, perhaps) can distort the meaning of the entire message. Statistically, a rendering may be 99% correct on a word-by-word basis and still be a 100% failure at conveying the overall meaning.
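The gap between word-level accuracy and meaning can be illustrated with a small sketch. The code below computes a word-by-word accuracy figure using a standard word-level edit distance; the function names and the sample message are invented for illustration and do not come from any of the services discussed here.

```python
def word_errors(reference, hypothesis):
    """Count word-level edits (substitute, insert, delete) between transcripts."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Classic Levenshtein distance, computed over words instead of characters.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Fraction of reference words rendered correctly (1 minus word error rate)."""
    total = len(reference.split())
    if total == 0:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_errors(reference, hypothesis) / total


# Hypothetical message: only the caller's name is misrecognized.
spoken = "please call doctor nguyen back before five today"
transcribed = "please call doctor when back before five today"
print(word_accuracy(spoken, transcribed))
```

Seven of the eight words are right, so the score looks respectable, yet the recipient no longer knows whom to call. In a longer message the same single error would push the score toward 99% without making the transcript any more useful.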

The problems with accuracy are not the exclusive domain of Google Voice and its “fully-automated” approach. Other, human-assisted, services – from SpinVox, PhoneTag, Nuance and others – should play up the game-like aspects, rather than the accuracy of their services.

Most importantly, voicemail-to-text introduces a level of convenience for recipients. It enables them to read (in silence) the system’s “first pass” at deriving the meaning of an inbound message. They can save, forward, respond or perform any other function they might do on an email or text message, including editing. That’s a key concept and I certainly hope that service providers market the service based on the promise of convenience (and even entertainment value), rather than accuracy.


