Life-like text-to-speech rendering is one of those evergreen, yet elusive, opportunities for advancement in speech processing. About 5 years ago, Rhetorical (now part of Nuance) and AT&T (with Natural Voices) were able to demonstrate TTS software that synthesized spoken utterances from text in real time, rendering the pitch, timbre, prosody and other traits of particular speakers. The primary objective, at the time, was to eliminate the need for businesses to hire “expensive live talent” to serve as the “voice of the enterprise” or “voice of the brand” on interactive voice response systems (IVRs).
The ensuing years have witnessed industry consolidation coupled with geographic expansion. Rhetorical was acquired by Nuance in 2004. AT&T transferred exclusive rights to resell Natural Voices TTS to Wizzard Software. Other members of the TTS community underwent similar transformations. Suffice it to say that supporting multiple voices in over two dozen languages has become table stakes in the global TTS game, with Nuance, Loquendo and Alcatel/Lucent joined by roughly six other firms in vying for market share. I’m in the process of compiling and writing an Advisory, with the working title “Recombinant Communications Spells New Life for Text-to-Speech”, to address many of the new opportunities that the Recombinant Communications concept creates for text-to-speech synthesis.
Contact center-centric approaches can be seen as IVR enhancements. In the world of Recombinant Communications (RC), a new community of developers has discovered new potential for core TTS capabilities. These days, an inordinate amount of attention is being paid to spoken input: for text messaging, for mobile search and for transcription of voice mail, Tweets and input to social sites. However, it has not taken long for the community of developers to discover that, in the “hands-free/eyes forward/location aware” world of the modern automobile, spoken output is equally important.
Text readers are a ready-made opportunity for email, newspapers and downloaded ebooks. Turn-by-turn directions, complete with street names, have always sounded disjointed or downright robotic. That’s all changing. As I’ll discuss in the forthcoming Advisory, tools from the likes of Nuance can help developers build a better user experience, and life-like rendering of text is core to it. It’s the source of audible differentiation for a wide variety of solutions providers.