Mobility Driving Recombinant Communications Application Development and Adoption

With the likes of Google and AT&T Labs in the driver’s seat, efforts to assemble mobile solutions that incorporate speech into multimodal interactions are gaining both visibility and momentum. That was one of the major takeaways from two intense days at the Mobile Voice Conference, organized by Bill Meisel’s TMA Associates in conjunction with AVIOS (the Applied Voice Input/Output Society).

I was pleasantly surprised by the approach to incorporating speech into multimodal and mobile applications that appears to be taking hold among the category leaders (like Google, AT&T, Microsoft and Nuance) as well as specialists like Novauris, Ditech, Voxeo, Siri and IfbyPhone. If there is a single takeaway from the Mobile Voice Conference, it is that long-time specialists in building the ideal voice user interface (VUI) have put a lot of thought and investment into promoting a results-oriented user experience that takes into account multiple devices, modalities and media.

As Mike Cohen of Google discussed in his opening keynote, Google wants to make it clear that “whenever that keyboard pops up on a mobile device, users should know that they can also use their voice for input.” Voice, however, is only one of many alternatives. A number of spokespeople from Nuance reinforced the message, making it clear that, although the company is widely regarded as the developer or acquirer of a multiplicity of speech recognition and text-to-speech resources, it has built a number of solutions that use “predictive technologies” and visual output to speed up the processes involved in helping mobile subscribers carry out a number of activities successfully, regardless of handset configuration or network used. As Amy Livingstone, Sr. Director of Enterprise Marketing, explained, the company is positioning for 4G (and even 5G), which will entail “ubiquity, high speed, real-time video, co-browsing and mobile Web applications,” not just a voice user interface.

AT&T’s Jay Wilpon showcased another very important aspect of a strategy to accelerate development of multimodal, frequently used apps. Last September, his company bought a firm called Plusmo to bring in-house a software platform that will enable its community of developers to use high-level languages to create multimodal applications that work across a number of mobile OSes and “mobile platforms.” It also established the first of many planned “innovation labs” in Atlanta and has launched a formal program to encourage thousands of third-party developers to take advantage of its resources and reach a wide variety of mobile users. Wilpon explained that “mobile devices are the white space for speech,” yet “nobody has made a penny of profit on speech engines”; rather, “it’s the applications!”

It’s clear that AT&T will not be alone in encouraging participation from a broader spectrum of application developers to provide solutions that include speech. What I find so encouraging is that the new generation of solution developers is comfortable building applications that include speech *where appropriate* for input, output or both. But these developers are by no means IVR script writers or old-guard telephony experts. They know Web services and standards, and they enjoy “gaming the IP-telephony cloud.” It’s their collective energy, imagination and expertise that are making the coming months and years so rich with new applications and possibilities.


