The Challenge Ahead for Mobile Speech

By Dan Miller on September 23, 2009

I read this article by Vlingo’s CEO Dave Grannan with great interest. In it, Grannan urges us to take stock of all the great, innovative mobile apps that have made it to mobile subscribers, thanks to the existence of an “open ecosystem” and a “friction-free” marketplace. He notes that, in the ’90s, Europe and the Pacific Rim were the places where Wireless 1.0 took shape, thanks to the seamless footprint of GSM and greater levels of competition among wireless carriers.

Nowhere does he mention “Net Neutrality” by name, but he suggests that wireless service providers would do well to move beyond their closed, “walled garden” models for managing the services that they deliver to subscribers. He points to the virtuous situation created when a posture of openness encourages innovation among providers of applications and services. He contrasts the online environment that existed in the mid-1990s (dominated by AOL, Compuserve and Prodigy) with today’s World Wide Web and the abundance of services and applications that are available to users through Web browsers.

Openness and innovation will go hand in hand. In the spirit of Recombinant Telephony, they will also encourage the creative assembly of code and communications capacity that results in truly useful services. As Grannan notes in his article, “Personally I think about the combination of Vlingo, Skyhook and a search engine that would make it possible for a consumer using any device on any network to speak the words ‘movie listings’ into his phone and automatically get show times for the theater nearest him [or her].”

This was in no way an advertisement for Vlingo. Rather, it was a use case demonstrating that automated speech processing on mobile devices is, in most instances, just one element of a total solution, albeit a very important one in a mobile or in-vehicle context. As I note in the “Key Findings” section of our “Mobile Speech” report, wireless subscribers measure their satisfaction with any given wireless application by the success they experience with task completion.

The accuracy of a speech recognition engine or the system’s ability to render text as spoken words is not as important as, in the case of the movie finder, the ability of the service to locate me; identify what’s nearby; take my interests, preferences and (perhaps) past purchase behavior into account to make rational recommendations and selections. One of the most disheartening elements of research to prepare the “Mobile Speech” report was to download and test the many speech-enabled applications for the iPhone. Of thirty-or-so applications supporting voice dialing, search, navigation, planning and reminders, messaging and the like, only three or four were what I would call “successful” at accomplishing their assigned tasks. The rest were pretty much trash.

Many so-called Voice Dialers recognized names, but did not dial a number. Rather, they stopped at displaying an entry in the phone’s contact list that corresponded to the utterance, as if that were enough. A business locator application prompted me to enter a city name, state name and business category, but never the actual name of a business. A message dictation application displayed a rendering of what it recognized me saying, but gave me no mechanism for edits, such as correcting spelling, capitalizing proper nouns or inserting punctuation. In each case, the speech recognition resources worked within spec (better than 90%!), but failed to help me finish what I was trying to do.

My point is that “openness”, such as it is, has attracted a slew of innovators, but it is not always a good thing. I welcome fresh blood into the mobile application ecosystem. I encourage them to do more with spoken words as an element of the user interface. But I encourage everyone to put themselves in the shoes of the end-user and make sure that the product of innovation, openness and cooperation is the ability for wireless subscribers to achieve task completion. All else amounts to “stupid phone tricks”.

From a financial perspective, today’s approach could be damning. We have a forecasting model for mobile speech applications that takes into account licensing (for downloads), subscription (for related services) and “activity-based” (e.g., per message or percentage of purchases) revenues. By our estimate, revenues for 2009 will come in at less than $50 million for all three categories combined. Given uncertainties around packaging, pricing and revenue splits, the future is wide open, but we forecast a “most likely” scenario of a less-than-stellar $250 million.

Future performance will far exceed this modest sum when speech is well integrated into multimodal applications that span messaging, navigation and commerce. For instance, in-vehicle opportunities are only beginning to be exploited, amid the rightful safety concerns surrounding texting. Add voice-based commerce, news reading and other directory services and the billion-dollar opportunity begins to make itself manifest.

We, at Opus Research, encourage your comments on our posts.

Categories: Articles