CATScan XVIII: Voice Application Workflow — What a Concept!

Targeting general availability in the fourth quarter of this year, Microsoft formally rolled out a beta version of Microsoft Speech Server 2007 on May 9. Like a gaudy IPO during the Internet bubble, the roster of beta participants was grossly oversubscribed. To keep the process under control, Microsoft targeted a mix of 500 firms around the world. And although some 1,500 companies or individuals signed up, the Colossus of Redmond will stick to its initial plans of 500, according to Clint Patterson, director of marketing for the MSS product line.

In April, Microsoft sent shockwaves around the speech community by announcing that both VoiceXML and SIP — the vaunted Session Initiation Protocol for Voice over IP that makes it easier than the original NetMeeting for applications to (ahem) initiate phone calls and voice teleconferences — will be “native” to MSS 2007.

My colleague, Avery Glasser, will be issuing a commentary on Microsoft’s approach to VoiceXML clarifying Opus Research’s perspective on why Microsoft — to paraphrase Stanley Kubrick’s second title to the black-and-white classic movie “Dr. Strangelove” — stopped worrying and learned to love VoiceXML. Anybody remember the words to Kumbaya?

Someone’s Coding, Lord…
And that’s the whole point. As April gave way to May, Microsoft was on the horn to analysts and journalists to highlight the most salient features of the tooling for MSS 2007. Like the rest of its cohort in speech application development, Microsoft has known for some time that the key to success for speech applications is the creation of a truly excellent user experience. And like the best of the cadre of application development specialists, Microsoft recognizes that the user’s judgment is based on results. Those results do not hinge on the accuracy of core recognition engines (though accuracy helps); they hinge on the success rate in accomplishing specific tasks.

In other words, the success of speech-based applications is not a function of core speech-recognition technologies. Instead, it’s a function of the quality of the end-to-end application framework that makes for a truly gratifying and successful conversation, whether it’s person-to-person, person-to-machine or machine-to-machine. That’s why, when it comes to creation tools for speech applications, Microsoft’s foundation speech server product is not a “dialog designer” but rather a “dialog workflow designer” – small change grammatically, big difference thematically.

If I understand Patterson correctly, there is no longer a “Speech Application SDK” for Visual Studio. Instead, there is MSS 2007-Developer Edition, an instantiation of MSS that also includes a library of “speech activities” that, thanks to Microsoft’s WinFX framework, automatically upload to registered computers running Visual Studio. The widgets then appear as icons that let developers create new applications as easily as they would draw application flows with a visual tool like Visio.

Tools Don’t Guarantee Excellent User Experience
But they’re getting closer. There’s a reason that the core MSS 2007 announcement revolved around Dialog Workflow Design. Microsoft recognizes that, more often than not, speech-based dialog is but a small fraction of a user interaction. For that matter, it’s an even smaller fraction of a company’s life-long relationship with any customer. That’s why, in addition to the “speech activities,” a selection of “workflow activities” can be invoked as part of a conversation.

For instance, if a phone call is made to an automated system’s “change of address” application, the system can trigger a database update upon termination of the telephone call. That’s a simple example, but it represents a major step toward what Microsoft characterizes as “breaking the IVR out of its silo.”
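To make the “change of address” example concrete, here is a minimal sketch in Python. It is not the MSS 2007 API: every name in it (`DialogWorkflow`, `ask_new_address`, `update_address` and the rest) is hypothetical, and a dictionary stands in for the customer database. It only illustrates the idea that speech activities run during the call while a workflow activity fires once the call has ended.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CallContext:
    """State collected over the course of one phone call."""
    caller_id: str
    answers: Dict[str, str] = field(default_factory=dict)


Activity = Callable[[CallContext], None]


class DialogWorkflow:
    """Hypothetical container: speech steps run during the call,
    workflow steps run once the call has ended."""

    def __init__(self) -> None:
        self.speech_activities: List[Activity] = []
        self.call_ended_activities: List[Activity] = []

    def handle_call(self, ctx: CallContext) -> None:
        for activity in self.speech_activities:
            activity(ctx)
        # The caller has hung up; back-office steps run now.
        for activity in self.call_ended_activities:
            activity(ctx)


def ask_new_address(scripted_reply: str) -> Activity:
    """Stand-in for a speech activity; a real one would prompt and recognize."""
    def activity(ctx: CallContext) -> None:
        ctx.answers["new_address"] = scripted_reply
    return activity


def update_address(db: Dict[str, str]) -> Activity:
    """Stand-in for a workflow activity that writes to a customer database."""
    def activity(ctx: CallContext) -> None:
        if "new_address" in ctx.answers:
            db[ctx.caller_id] = ctx.answers["new_address"]
    return activity


def demo() -> Dict[str, str]:
    # Assumed sample data; the phone number and addresses are made up.
    customer_db = {"555-0100": "1 Old Road"}
    workflow = DialogWorkflow()
    workflow.speech_activities.append(ask_new_address("22 New Street"))
    workflow.call_ended_activities.append(update_address(customer_db))
    workflow.handle_call(CallContext(caller_id="555-0100"))
    return customer_db


if __name__ == "__main__":
    print(demo())
```

The design point is the separation: the dialog collects the data, and the business step is registered alongside it rather than buried inside the IVR script.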

Which Brings Us to the “Open and Closed” Case
Opus Research will have lots more to say about MSS 2007 and the potential of its users to “manufacture” better and more successful user experiences. (Hint: Microsoft has made the most of its acquisition of Unveil’s intellectual property and the work of its core engineers.) Taken holistically, the combination of VoiceXML, SIP and tooling that embraces workflow management represents a major step toward a true service-oriented architecture (SOA).

No doubt Microsoft continues to see the combination of disruptive pricing, a formidable set of tools and a skilled community of developers as a world-beating, going-forward strategy, namely to make speech mainstream. Ironically, the rest of the conversational access technologies (CAT) application development and platform community should see their prospects rising with the roll-out of MSS 2007. Microsoft sees SOA through .NET-colored glasses. Indeed, the move to endorse VoiceXML and SIP and to mate speech more tightly with application workflows reflects its confidence that enterprise customers will choose end-to-end Windows-based solutions, a prospect that gets very attractive for medium-sized enterprises.

Market-wide, Microsoft’s moves may be just what the CAT doctor ordered. Like IBM’s WebSphere or BEA’s WebLogic, .NET is Microsoft’s branded SOA framework. Moving to well-defined standards grows the market for all involved in customer interactions over a variety of channels. Avaya, Oracle, HP, Cisco, Alcatel/Genesys and virtually all standards-adherent infrastructure vendors will have the opportunity to ride this gravy train.


