
New Attention to Mobile Voice Control (Thanks Apple and Om!)

By Dan Miller on July 8, 2009

Having been an analyst in the automated speech domain for almost 20 years, I’m all for the fresh interest that a new crop of knowledgeable proponents is bringing to speech-enabled or “voice command” services. It was no mean feat for Apple to include Voice Control as one of the top features for the iPhone 3GS. It made voice input for command/control and dictation a peer of improved video, cut-and-paste, global search and other “most wanted” services.

Now GigaOm features the opinion of guest columnist Phil Hendrix, Founder and Director of the Institute for Mobile Markets Research. Somewhere behind the new GigaOm Pro’s paywall is a report from immr that professes to explain “How Speech Technologies Will Transform Mobile Use.” It’s a topic that’s near and dear to my heart; but at this point I might want to frame it as “How Mobile Use Will Transform Speech Technologies.”

1) Make the option for voice input ubiquitous – We’re getting there. The iPhone 3GS is a high-visibility device, but preloaded or downloaded apps that support speech input are already on hundreds of millions of handsets around the world. In addition, dial-up services for voice-enabled search (like GOOG411 or Microsoft/Tellme’s Bing411) have the potential to be a speed-dial button away when they add new territories and languages.

2) Do a better job of managing user expectations of accuracy – speech rec will never be 100% accurate. Think of all the times that you don’t understand what a person is saying to you on the first try. Why should wireless users expect machines to be any better? It’s important to make clear, upfront and through multiple channels (demonstrations, video tutorials, Peer2Peer discussions), that systems make best efforts to capture what is being said but may not always “understand” meaning. It may seem like a hard logic to follow, but it is vital for maintaining the technology’s potential and the caller’s low blood pressure.

3) Showcase and support new use cases and tutorials – I think that we’ve only begun to present wireless subscribers with the sorts of options that put the technology in its best light. People love to break these things. It was in this post that I pointed to the NYTimes technology blog in which David Gallagher logged the many transcription mistakes boldly displayed by Google Voice’s voicemail-to-text transcription.

4) Never make spoken input the only option – Note that I said make the “option” ubiquitous. Under no circumstances should an application require spoken input. Google understands this. So does Nuance, often regarded as a roll-up of speech processing technologies, but really a purveyor of platforms that accept (and in many cases predict) user input in the form of text, spoken words and short codes.

My problem with the NYTimes column and other vehicles for criticism is this: they are largely “descriptive” and not “prescriptive.” Instead of showing how easily broken the new interface is, how about demonstrating what it does quite well? It captures utterances and, like a marginally good stenographer, presents a “rough draft” of what was said, along with some notation of where the input might be “iffy.” Speakers should never think of this as a finished product. At best it should provide a form of talking triage – showing the speaker or recipient where the transcription may need a bit of work or sculpting.

At base it’s an opportunity to game the system. If people like to make a game of breaking the application (by reading Jabberwocky to Google Voice, for instance), it makes more sense to cast yourself in the role of hero and fix what’s broken in a poorly transcribed message. If voice control is to be “the sound of things to come” for mobile subscribers, it requires a much more concerted effort than the shallow integration with the iPhone or the one-and-done guessing game that Google Voice plays. Hands-free dialing and text input through a Bluetooth headset or earbud would be a nice start. Well-prompted and choreographed messaging – be it addressing and dictation of SMS or friendly prompts to help originators or recipients triage transcribed voice mail – would be a cool next step.

Mobile voice is a big area of opportunity. IBM had been working on it for years. Nuance, Google, Vlingo, Microsoft/Tellme, Spinvox, PhoneTag, Yap and a few others are fielding some pretty impressive products and services. Adoption will be no accident and, even though it’s been years in the making, it is still early days in product and market development. In other words: The Customer Has Not Spoken.
