<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Opus Research &#187; Mobile Speech Apps</title>
	<atom:link href="http://opusresearch.net/wordpress/tag/mobile-speech-apps/feed/" rel="self" type="application/rss+xml" />
	<link>http://opusresearch.net/wordpress</link>
	<description>Analysis and Expertise on Voice Services and Recombinant Communications</description>
	<lastBuildDate>Thu, 09 Sep 2010 00:11:57 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Parakeet: Personal Triage for Mobile Speech</title>
		<link>http://opusresearch.net/wordpress/2010/09/03/parakeet-personal-triage-for-mobile-speech/</link>
		<comments>http://opusresearch.net/wordpress/2010/09/03/parakeet-personal-triage-for-mobile-speech/#comments</comments>
		<pubDate>Fri, 03 Sep 2010 21:40:12 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Mobile Speech Apps]]></category>
		<category><![CDATA[user experience]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=3398</guid>
		<description><![CDATA[The “Science and Technology” section of this week’s issue of The Economist includes a feature with the title: “Correct Me If I’m Wrong…”]]></description>
			<content:encoded><![CDATA[<p><a href="http://opusresearch.net/wordpress/wp-content/uploads/2010/09/parakeet_mic.jpg"><img src="http://opusresearch.net/wordpress/wp-content/uploads/2010/09/parakeet_mic.jpg" alt="" title="parakeet_mic" width="144" height="144" class="alignright size-full wp-image-3399" /></a>The “Science and Technology” section of this week’s issue of The Economist includes a feature with the title: <a href="http://www.economist.com/node/16909957?story_id=16909957">“Correct Me If I’m Wrong…”</a> (may require a subscription). In it, the author describes a new user interface called Parakeet, developed by Ola Kristensson and Keith Vertanen at the University of Cambridge’s Computer Laboratory. Its purpose is to enable mobile subscribers to use the touch screen to correct or improve upon the first-pass results of speech-to-text conversion services.</p>
<p>Parakeet does not represent any technological leap forward. The program displays the text version of an utterance for which the recognition engine has the highest confidence level, while simultaneously displaying several other renderings that carry lower degrees of confidence: a range of so-called “nBest” candidates. Users can then indicate their actual utterances by using their fingers on the keyboard.</p>
<p>The originator of a spoken message performs first-level triage by making sure that the message that is transmitted accurately reflects what was spoken. I think this should be incorporated as an option for every mobile speech application.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2010/09/03/parakeet-personal-triage-for-mobile-speech/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Machine Translation Making its Presence Known</title>
		<link>http://opusresearch.net/wordpress/2010/06/20/machine-translation-making-its-presence-known/</link>
		<comments>http://opusresearch.net/wordpress/2010/06/20/machine-translation-making-its-presence-known/#comments</comments>
		<pubDate>Mon, 21 Jun 2010 00:27:44 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Machine Translation]]></category>
		<category><![CDATA[Mobile Speech Apps]]></category>
		<category><![CDATA[Recombinant Communications]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=3038</guid>
		<description><![CDATA[In recent months, we've seen the community of companies offering MT taking significant steps forward, perhaps lured by market assessments that valued global spending on "language translation and education services" at $12 billion in 2008. Realistically, that may represent a market peak, as relatively inexpensive machine translation solutions gain acceptance in a growing number of use cases where they can easily replace more expensive "human translators".]]></description>
			<content:encoded><![CDATA[<p><a href="http://opusresearch.net/wordpress/wp-content/uploads/2010/06/logo1.png"><img src="http://opusresearch.net/wordpress/wp-content/uploads/2010/06/logo1.png" alt="" title="logo" width="144" height="45" class="alignright size-full wp-image-3047" /></a>In February, I generated a few comments from both suppliers and the general public when I posted <a href="http://opusresearch.net/wordpress/2010/02/10/googles-approach-to-real-time-translation-a-matter-of-satisficing/">this note</a>, which some readers saw as critical of Google&#8217;s Web-based efforts to support so-called speech-to-speech translation. My point at the time was to praise Google, not so much for making the effort to bring highly-accurate and reliable machine translation services to the Web &#8220;in a matter of years&#8221; but for using its Web site to offer machine translation (MT) services that work adequately for an impressive number of language pairs and are accessible to anyone with access to the Web.</p>
<p>In recent months, we&#8217;ve seen the community of companies offering MT taking significant steps forward, perhaps lured by market assessments that <a href="http://www.guardian.co.uk/education/2010/jan/14/tefl-singapore">valued global spending on &#8220;language translation and education services&#8221; at $12 billion in 2008</a>. Realistically, that may represent a market peak, as relatively inexpensive machine translation solutions gain acceptance in a growing number of use cases where they can easily replace more expensive &#8220;human translators&#8221;.</p>
<p>In addition to Google, we&#8217;ve seen a couple of impressive new services launched in the last couple of weeks. One worth noting is the Trippo™ VoiceMagix application for Apple&#8217;s iPhone. Finnish mobile application specialist Cellictica developed the app and made it through Apple&#8217;s vetting process to introduce it through the iTunes App Store a few weeks ago. <a href="http://www.prnewswire.com/news-releases/cellictica-introduces-revolutionary-speech-to-speech-translator---trippo-voicemagix-94746994.html">This press release</a> describes the application and also includes a link to a short video that illustrates its functions. Cellictica participated in the Nuance Mobile Developer Program and has successfully integrated both Dragon Dictation for speech-to-text transcription of the utterances to be translated and Nuance Vocalizer for text-to-speech rendering of the translated phrases.</p>
<p>The VoiceMagix app supports spoken input in English (but text input of a total of 2 languages). It performs speech-to-speech machine translation from English into fourteen different languages, including Chinese, Dutch, French, German, Greek, Hindi, Italian, Japanese, Polish, Portuguese, Russian, Spanish, and Thai. For the other 13, it renders written results in the native characters or script. Because of my own limited range in terms of languages spoken, I can&#8217;t vouch for the accuracy of the app in many of the supported languages, but the friends and family members who are native speakers of French, Hebrew and Spanish were duly impressed with the iPhone app. According to Cellictica&#8217;s press release, Trippo VoiceMagix also runs on Android™, Windows Mobile®, Java (J2ME) and BlackBerry®, and the company is planning to make the app available to other handsets through major app stores &#8220;in the near future.&#8221;</p>
<p>Cellictica is taking a decidedly mobile approach to MT. At the higher end of the spectrum, a Florida-based company called <a href="http://67.20.85.145/Home">LinguaSys</a> is noteworthy for its ability to expand MT&#8217;s wingspan by reducing the amount of time it takes to add &#8220;language pairs&#8221; in support of a broader set of multi-lingual use cases. LinguaSys offers what it calls &#8220;language middleware&#8221; and its proprietary Carabao MT Engine to shorten the time it takes to integrate natural language translation of what CEO Brian Garr calls &#8220;short shelf life&#8221; interactions, a term he uses to define text chat, e-mail, web pages and documents that require rapid translation in support of business objectives.</p>
<p>Garr told us in a recent interview that the secret sauce required to expedite development of highly-accurate MT between two new languages is the use of a &#8220;hybrid&#8221; approach. As a 20+ year veteran in the MT community (having served as CTO at one of the first MT specialists, Globalink, in the 1990s), Garr had observed the long-standing schism between solutions that use statistical techniques, such as &#8220;statistical language modeling&#8221; (SLM) or the ever-popular Hidden Markov Models (HMMs), and those that use &#8220;rules-based&#8221; models to accomplish accurate machine translation. As Garr put it, &#8220;You were either a stats guy or a rules guy.&#8221;</p>
<p>LinguaSys&#8217; Carabao engine provides highly accurate results by using both approaches. (He calls it &#8220;hybrid&#8221;. I&#8217;d call it Recombinant.) The statistical approach doesn&#8217;t care about the &#8220;meaning&#8221; of what is said; it merely needs a large enough database or &#8220;corpus&#8221; of matched utterances to build its statistical model and make a good &#8220;first guess&#8221; at a translation. As it gains experience (and its &#8220;corpus&#8221; grows), the accuracy improves. The hybrid approach amounts to getting a statistics-based model to &#8220;guess&#8221; at the translation (without tackling &#8220;meaning&#8221;) and then applying rules to confirm the accuracy of the original rendering.</p>
<p>As for deployment architectures, LinguaSys developed TransGen, its user interface platform, which enables Carabao to be put to use as part of Web services that support translation of documents, chat, email or other text input into Web sites. For mobile users, LinguaSys has also developed apps for iPhones and Android-based devices. This, in many ways, puts it in direct competition with Cellictica for the much narrower speech-to-speech rendering market. By contrast, LinguaSys&#8217; approach conforms to the rules of a successful RC implementation where it is instantiated as standards-conformant middleware that can be integrated (or &#8220;mashed up&#8221;) with a large company&#8217;s existing workflows and business processes.</p>
<p>In conclusion, we think that MT has crossed a critical threshold in market acceptance. Prospective users understand what it does, and companies like LinguaSys, Cellictica and Google are bringing solutions to market that work well enough to build trust among users. Opus Research is adding MT to the short list of catalytic technologies that accelerate deployment of Recombinant Communications solutions and promote more efficient network-based communications and commerce.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2010/06/20/machine-translation-making-its-presence-known/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Nuance/IBM Five-Year Plan: R&amp;D Focused on Understanding</title>
		<link>http://opusresearch.net/wordpress/2010/05/24/the-nuanceibm-five-year-plan-rd-focused-on-understanding/</link>
		<comments>http://opusresearch.net/wordpress/2010/05/24/the-nuanceibm-five-year-plan-rd-focused-on-understanding/#comments</comments>
		<pubDate>Mon, 24 May 2010 18:44:20 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[Advisories]]></category>
		<category><![CDATA[Featured Research]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Mobile Speech Apps]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[speech processing]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=2917</guid>
		<description><![CDATA[
Featured Research
The R&#038;D relationship between IBM and Nuance has reached its third stage, now that the two companies have entered a five-year joint research initiative. Their collective objective is to get to the next phase in speech processing, where person-to-machine interactions are as natural as person-to-person.
Advisories are available to registered users only. 
For more information [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://opusresearch.net/wordpress/pdfreports/adv_TwitCC_Apr15.png" align='right' HSPACE=5 vspace=5 border=1/><br />
<em>Featured Research</em><br />
The R&#038;D relationship between IBM and Nuance has reached its third stage, now that the two companies have entered a five-year joint research initiative. Their collective objective is to get to the next phase in speech processing, where person-to-machine interactions are as natural as person-to-person.</p>
<p><em>Advisories are available to registered users only.</em> </p>
<p>For more information on becoming an Opus Research client, please contact Pete Headrick (<a href="mailto:pheadrick@opusresearch.net">pheadrick@opusresearch.net</a>).</p>
<p><!--/hidethis--></p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2010/05/24/the-nuanceibm-five-year-plan-rd-focused-on-understanding/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mobility Driving Recombinant Communications Application Development and Adoption</title>
		<link>http://opusresearch.net/wordpress/2010/04/25/mobility-driving-recombinant-communications-application-development-and-adoption/</link>
		<comments>http://opusresearch.net/wordpress/2010/04/25/mobility-driving-recombinant-communications-application-development-and-adoption/#comments</comments>
		<pubDate>Mon, 26 Apr 2010 05:52:02 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[AT&T]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Mobile Speech Apps]]></category>
		<category><![CDATA[Recombinant Communications]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=2747</guid>
		<description><![CDATA[With the likes of Google and AT&#038;T Labs in the driver's seat, efforts to assemble mobile solutions that incorporate speech into multimodal interactions are gaining both visibility and momentum. ]]></description>
			<content:encoded><![CDATA[<p>With the likes of Google and AT&#038;T Labs in the driver&#8217;s seat, efforts to assemble mobile solutions that incorporate speech into multimodal interactions are gaining both visibility and momentum. That was one of the major take-aways from two intense days at the Mobile Voice Conference, organized by Bill Meisel&#8217;s TMA Associates in conjunction with AVIOS (the Applied Voice Input/Output Society).</p>
<p>I was pleasantly surprised by the approach to incorporating speech into multi-modal and mobile applications that appears to be taking hold among the category leaders (like Google, AT&#038;T, Microsoft and Nuance) as well as specialists like Novauris, Ditech, Voxeo, Siri and IfbyPhone. If there is a single take-away from the Mobile Voice conference, it is that long-time specialists in building the ideal voice user interface (VUI) have put a lot of thought and investment into promoting a results-oriented user experience that takes into account multiple devices, modalities and media.</p>
<p>As Mike Cohen of Google discussed in his opening keynote, Google wants to make it clear that &#8220;whenever that keyboard pops up on a mobile device, users should know that they can also use their voice for input.&#8221; But voice is but one of many alternatives. A number of spokespeople from Nuance reinforced the message, making it clear that &#8211; although the company is widely regarded as the developer or acquirer of a multiplicity of speech recognition and text-to-speech resources &#8211; the company has built a number of solutions that use &#8220;predictive technologies&#8221; and visual output to speed up the processes involved in helping mobile subscribers carry out a number of activities successfully regardless of handset configuration or network used. As Amy Livingstone, Sr. Director of Enterprise Marketing, explained, the company is positioning for 4G (and even 5G), which will entail &#8220;ubiquity, high speed, real-time video, co-browsing and mobile Web applications&#8221;; not just a voice user interface.</p>
<p>AT&#038;T&#8217;s Jay Wilpon showcased another very important aspect of a strategy to accelerate development of multi-modal, frequently used apps. Last September, his company bought a firm called Plusmo to bring in-house a software platform that will enable its community of developers to use high-level languages to create multi-modal applications that work across a number of mobile OSes and &#8220;mobile platforms.&#8221; It also established the first of many planned &#8220;innovation labs&#8221; in Atlanta and has launched a formal program to encourage thousands of third party developers to take advantage of its resources and reach a wide variety of mobile users. Wilpon explained that &#8220;mobile devices are the white space for speech,&#8221; yet &#8220;nobody has made a penny of profit on speech engines&#8221;; rather, &#8220;it&#8217;s the applications!&#8221;</p>
<p>It&#8217;s clear that AT&#038;T will not be alone in encouraging participation from a broader spectrum of application developers to provide solutions that include speech. What I find so encouraging is that the new generation of solutions developers are comfortable building applications that include speech *where appropriate* for input, output or both. But they are by no means IVR script writers or old-guard telephony experts. They know Web services and standards and they enjoy &#8220;gaming the IP-telephony cloud.&#8221; It&#8217;s their collective energy, imagination and expertise that are making the coming months and years so rich with new applications and possibilities.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2010/04/25/mobility-driving-recombinant-communications-application-development-and-adoption/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RC (Recombinant Communications) Spells New Life for TTS (Text-to-Speech)</title>
		<link>http://opusresearch.net/wordpress/2010/02/25/rc-recombinant-communications-spells-new-life-for-tts-text-to-speech/</link>
		<comments>http://opusresearch.net/wordpress/2010/02/25/rc-recombinant-communications-spells-new-life-for-tts-text-to-speech/#comments</comments>
		<pubDate>Fri, 26 Feb 2010 01:15:22 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Mobile Speech Apps]]></category>
		<category><![CDATA[Recombinant Communications]]></category>
		<category><![CDATA[Text-to-Speech]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=2404</guid>
		<description><![CDATA[Life-like text-to-speech rendering is one of those evergreen, yet elusive, opportunities for advancement in speech processing.]]></description>
			<content:encoded><![CDATA[<p><a href="http://opusresearch.net/wordpress/wp-content/uploads/2010/02/advanced_text_to_speech_1.gif"><img src="http://opusresearch.net/wordpress/wp-content/uploads/2010/02/advanced_text_to_speech_1.gif" alt="" title="advanced_text_to_speech_1" width="144" height="95" class="alignright size-full wp-image-2439" /></a>Life-like text-to-speech rendering is one of those evergreen, yet elusive, opportunities for advancement in speech processing. About 5 years ago, Rhetorical (now part of Nuance) and AT&#038;T (with Natural Voices) were able to demonstrate TTS software that synthesized spoken utterances from text, rendering the pitch, timbre, prosody and other traits of particular speakers in real time. The primary objective, at the time, was to eliminate the need for businesses to hire &#8220;expensive live talent&#8221; to serve as the &#8220;voice of the enterprise&#8221; or &#8220;voice of the brand&#8221; on interactive voice response systems (IVRs).</p>
<p>The ensuing years have witnessed industry consolidation coupled with geographic expansion. Rhetorical was acquired by Nuance in 2004. AT&#038;T transferred exclusive rights to resell Natural Voices TTS to Wizzard Software. Other members of the TTS community underwent similar transformation. Suffice it to say that supporting multiple voices in over two dozen languages has become the table stakes to play in the global TTS game, with Nuance, Loquendo and Alcatel/Lucent joined by roughly six other firms in vying for market share. I&#8217;m in the process of compiling and writing an Advisory, with the working title &#8220;Recombinant Communications Spells New Life for Text-to-Speech&#8221;, to address many of the new opportunities that the Recombinant Communications concept creates for text-to-speech synthesis.</p>
<p>Contact center-centric approaches can be seen as IVR enhancements. In the world of RC, a new community of developers has discovered new potential for core TTS capabilities. These days, an inordinate amount of attention is being paid to spoken input: for text messaging, for mobile search and for transcription of voice mail, Tweets and input to social sites. However, it has not taken long for the community of developers to discover that, in the &#8220;hands-free/eyes forward/location aware&#8221; world of the modern automobile, spoken output is equally important.</p>
<p>Text readers are a ready-made opportunity for email, newspapers and downloaded ebooks. Turn-by-turn directions, complete with street names, have always sounded disjointed or downright robotic. That&#8217;s all changing. As I&#8217;ll discuss in the forthcoming advisory, tools from the likes of Nuance and others can help developers build a better user experience, and life-like rendering of text is core. It&#8217;s the source of audible differentiation for a wide variety of solutions providers.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2010/02/25/rc-recombinant-communications-spells-new-life-for-tts-text-to-speech/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>It&#8217;s Official: Nuance is Buying SpinVox for $66 Million in Cash, Plus Stock</title>
		<link>http://opusresearch.net/wordpress/2009/12/30/its-official-nuance-is-buying-spinvox-for-66-million-in-cash-plus-stock/</link>
		<comments>http://opusresearch.net/wordpress/2009/12/30/its-official-nuance-is-buying-spinvox-for-66-million-in-cash-plus-stock/#comments</comments>
		<pubDate>Thu, 31 Dec 2009 00:10:32 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Mobile Speech Apps]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[Recombinant Communications]]></category>
		<category><![CDATA[SpinVox]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=2147</guid>
		<description><![CDATA[Nuance has acquired SpinVox to "accelerate its voice to text business." According to the release, the transaction was worth $102 million, with a third of that coming in Nuance common stock.]]></description>
			<content:encoded><![CDATA[<p><img src="http://opusresearch.net/wordpress/wp-content/uploads/2009/04/nuance_logo.jpg" alt="nuance_logo" title="nuance_logo" width="117" height="75" class="alignright size-full wp-image-356" />Greg Sterling provided the following commentary on Nuance&#8217;s purchase of SpinVox on the Internet2Go site:</p>
<p><img src="http://opusresearch.net/wordpress/wp-content/uploads/2009/12/spinvox_logo-150x45.png" alt="spinvox_logo" title="spinvox_logo" width="150" height="45" class="alignleft size-thumbnail wp-image-2154" />Nuance has acquired SpinVox to &quot;accelerate its voice to text business.&quot; According to the release, the transaction was worth $102 million, with a third of that coming in Nuance common stock. From the <a href="http://www.nuance.com/news/pressreleases/2009/20091230_acquireSpinVox.asp">release</a>:</p>
<blockquote><p>As the estimated number of operational voicemail boxes in the world has passed one billion, and consumer and corporate activity now generate over 150 billion voicemails a year, Nuance and SpinVox have experienced strong interest in voice-to-text automation. The two companies helped pioneer solutions that utilize speech recognition and transcription workflow solutions to convert voicemails into text that can be sent to users as SMS or email messages. This transaction marries innovative speech solutions and robust carrier-grade infrastructure to accelerate innovation, and deliver these voice-to-text services to global subscribers.</p>
</blockquote>
<p>And from the <a href="http://blog.spinvox.com/2009/12/30/speech-pioneers-nuance-and-spinvox-join-forces-to-advance-global-speech-technology-market/">SpinVox blog</a>:</p>
<blockquote><p>Okay, so what does this mean? Without question there is an accelerating demand from carriers, consumers and enterprises for robust speech-enabled services and automated voice-to-text platforms – in fact, SpinVox already services nearly 100 million users worldwide. With that in mind, Nuance will leverage SpinVox’s carrier-grade voice-to-text infrastructure, network product portfolio, multi-language support and experienced UK-based development teams to further drive and accelerate adoption of voice-to-text around the world.</p>
<p>This is great news for customers who will benefit from both the technology strength and superior product and services delivery. There will be more services, more applications, highly accurate voice-to-text transcription and the best delivery platform available – no matter where you are in the world! </p>
</blockquote>
<p>A year ago in March, SpinVox <a href="http://www.spinvox.com/spinvox-secures-over-100-million-in-new-funding-round..html">obtained $100 million</a> in a massive funding round, and it has raised more than $200 million in total. The company was <a href="http://www.theregister.co.uk/2009/07/29/spinvox_mechanical_turk/">rocked by scandal</a> when it appeared that most of the speech-to-text transcription was done by humans and not by machine, as had been claimed.</p>
<p>The company had an outstanding £30m ($48.2 million) loan that it was having difficulty repaying.  </p>
<p>While this is obviously not the outcome SpinVox envisioned a couple of years ago, Nuance picks up a valuable addition to its suite of enterprise and consumer voice applications, a range of existing clients and a large installed base of users.  </p>
<p>I provided my thoughts on the SpinVox acquisition last week. I called it a coup for Nuance, but I also see it as an important development for incumbent telcos. Transcription of voice messages extends the life of existing voicemail platforms and is the missing link in the evolution of messaging services that prove their value to end-users by going well beyond simple &#8220;message waiting&#8221; notifications to the delivery of complete messages as email or SMS texts. This breaks down long-standing boundaries between voicemail and email, and transforms the telephone&#8217;s role in message origination.</p>
<p>SpinVox had taken considerable heat for its decision to involve live agents in the &#8220;disambiguation process&#8221; required for accurate rendering of spoken words. Yet, to me, this is yet another flavor of the high-tech-plus-high-touch combination that makes real-time services truly useful. Both Nuance and SpinVox have placed a premium on accurately rendering voicemail messages. It was a tactical choice and a differentiator, especially against Google Voice (which is thought to be 100% automated).</p>
<p>As I say repeatedly (often with accurate rendering), 100% accuracy in transcribing voicemail is a pipe dream. Both human-aided and totally automated systems are notorious for their failures to recognize such things as street names and other proper nouns, and that situation is unlikely to improve. But this failure, in and of itself, creates the seeds for stronger bonds between people who send or receive messages from one another.</p>
<p>Admittedly, this is not like cracking &#8220;The Da Vinci Code&#8221;, but there are game-like qualities to figuring out some of the messages that are received as spoken words and rendered as text messages. In most cases, the meaning comes across loud and clear. Besides, as is true with applications from Nuance (Voicemail2Text) and SpinVox (in many flavors), recipients can listen to MP3 files of the messages as attachments to the email or links to the SMS text. The value of text-based delivery is undeniable, as is the high probability that at least one of the words or phrases will be inaccurate [I'm going to address this phenomenon in a CATScan called "The end of 5 Nines... Hallelujah!"].</p>
<p>Combine the factors mentioned above and you&#8217;ll understand why it is more important than ever for Nuance to expand the potential user base for its voicemail-to-text services. Google has gone full-speed ahead with notoriously inaccurate voicemail-to-text rendering deeply embedded in its Google Voice services. At this point, accuracy is not the issue (though it is important to be as accurate as possible); global reach is the objective. SpinVox&#8217;s contracts with global carriers are very important to both Nuance and the carriers themselves, as they prepare to compete with Ma Google (the Search Giant as Telco).</p>
<p>It&#8217;s also a potential win for end-users. As Google has so often proven, accurate treatment (of search queries or spoken utterances) improves with volume. The combination of Nuance and SpinVox can create the critical mass of users required to result in a highly-accurate service while, at the same time, posing formidable competition to Google Voice.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2009/12/30/its-official-nuance-is-buying-spinvox-for-66-million-in-cash-plus-stock/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Update: Proposed &#8220;Dot Rev&#8221; of Dragon Dictation on the iPhone Will Address Privacy Concerns</title>
		<link>http://opusresearch.net/wordpress/2009/12/10/update-proposed-dot-rev-of-dragon-dictation-on-the-iphone-will-address-privacy-concerns/</link>
		<comments>http://opusresearch.net/wordpress/2009/12/10/update-proposed-dot-rev-of-dragon-dictation-on-the-iphone-will-address-privacy-concerns/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 14:01:45 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Dragon]]></category>
		<category><![CDATA[Mobile Speech Apps]]></category>
		<category><![CDATA[Nuance]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=2020</guid>
		<description><![CDATA[One particular aspect of Nuance Communications&#8217; Dragon Dictation for the iPhone has captured the imagination of the connected public, and not necessarily in a good way. In this blog post, Nuance&#8217;s Michael Thompson addresses the concerns of a group of people who question why, during installation, the new application copies and uploads all the names [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://opusresearch.net/wordpress/wp-content/uploads/2009/12/dragon_mobile_logo-150x38.png" alt="dragon_mobile_logo" title="dragon_mobile_logo" width="150" height="38" class="alignright size-thumbnail wp-image-2024" />One particular aspect of Nuance Communications&#8217; Dragon Dictation for the iPhone has captured the imagination of the connected public, and not necessarily in a good way. In <a href="http://blog.dragonmobileapps.com/2009/12/what-dragon-dictation-for-iphone-does.html">this blog post</a>, Nuance&#8217;s Michael Thompson addresses the concerns of a group of people who question why, during installation, the new application copies and uploads all the names in an iPhone&#8217;s contact list. The thread of comments on the post starts with concern over what Nuance intends to do with the names, but quickly branches out into a quite thorough (perhaps too thorough) critique of the storage and protection of so-called &#8220;speech data.&#8221;</p>
<p>To be clear, Thompson assures the public that Nuance uploads the names for a single purpose: to improve the application&#8217;s ability to render the names inside a dictated message. From experience, Nuance and its cohort of speech-to-text service providers are well aware of the difficulty of recognizing proper nouns. Therefore, the firm has opted to upload names only. They are not associated with other contact information or with the identity of the device and its owner.</p>
<p>Still, &#8220;privacy,&#8221; broadly defined, remains a very sensitive matter and a hot-button issue. Some of the specific concerns (such as the one from a &#8220;defense contractor&#8221; who needs to certify that his list of contacts is under his control or on a secure server) militate toward Nuance offering a simple &#8220;opt-out&#8221; option upon initiation of the app. We&#8217;ve learned that Nuance has already added that option for &#8220;version 1.1,&#8221; which has already been proposed for expedited treatment by the App Store gatekeepers. Nuance considers this &#8220;opt-out&#8221; strategy a short-term fix. The size and scope of the responses (albeit many are anonymous) are leading the company to &#8220;explore options that give users more control over what gets uploaded&#8221; (quoting a post from Nuance Sr. Manager Nirmalya De).</p>
<p>I, personally, don&#8217;t believe that uploading contact names to improve recognition amounts to an &#8220;illegal disclosure&#8221;. At the same time, I applaud efforts to make mobile subscribers (and Web users in general) aware of the meta-data of their own creation that can be used to refine and improve services. Nuance has learned an important lesson: that wireless subscribers should control the information that they store on their mobile devices. But, in the meantime, the wireless public has made its general preference known: According to today&#8217;s <a href="http://blog.dragonmobileapps.com/2009/12/you-have-made-dragon-top-3-free-app.html">post on the Dragon Dictate Blog</a>, the app jetted to the upper echelons of the iTunes App Store&#8217;s list of downloads, achieving #3 among all free apps and #1 in the &#8220;business&#8221; category.</p>
<p>It is early days for adoption of speech-enabled mobile services, but the public is clearly willing to test-drive (I should say &#8220;text-drive&#8221;) the latest application.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2009/12/10/update-proposed-dot-rev-of-dragon-dictation-on-the-iphone-will-address-privacy-concerns/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Vlingo Boosts European Presence Via Nokia</title>
		<link>http://opusresearch.net/wordpress/2009/09/02/vlingo-boosts-european-presence-via-nokia/</link>
		<comments>http://opusresearch.net/wordpress/2009/09/02/vlingo-boosts-european-presence-via-nokia/#comments</comments>
		<pubDate>Wed, 02 Sep 2009 20:58:22 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Mobile Speech Apps]]></category>
		<category><![CDATA[Nokia]]></category>
		<category><![CDATA[Recombinant Telephony]]></category>
		<category><![CDATA[Vlingo]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=1356</guid>
		<description><![CDATA[Mobile speech specialist Vlingo is making some aggressive moves into Western Europe. Versions of its flagship product are now available in UK English, German, Spanish and Italian - all downloadable from Nokia's OVI Store.]]></description>
			<content:encoded><![CDATA[<p><img src="http://opusresearch.net/wordpress/wp-content/uploads/2009/09/vlingo_logo.png" alt="vlingo_logo" title="vlingo_logo" width="140" height="52" class="alignright size-full wp-image-1371" />Mobile speech specialist Vlingo is making some aggressive moves into Western Europe. Versions of its flagship product are now available in UK English, German, Spanish and Italian &#8211; all downloadable from Nokia&#8217;s OVI Store for selected handset models. The &#8220;Basic&#8221; version of Vlingo is available as a free download from OVI. It enables mobile subscribers to use their voice to open mobile applications or features, send a limited number of text or email messages, find contacts and dial numbers, search the web, and create notes.</p>
<p>Following the now-famous &#8220;freemium&#8221; model (credited in Wikipedia to VC Fred Wilson, but perpetuated by Tom Evslin in his blog and Chris Anderson in his recent book &#8220;Free&#8221;), the company offers &#8220;Vlingo Plus&#8221; for a one-time &#8220;upgrade&#8221; fee of £12.99 or €14.99 (roughly $21 by today&#8217;s exchange rate) or for a monthly fee of £3.49 / €3.99 (roughly $7.70). In the UK, Germany, Italy &#038; Spain, Vlingo Plus gives users the ability to originate (by speaking) an unlimited number of text and email messages.</p>
<p>Another breakthrough for Vlingo was revealed today when Nokia announced that the basic version of Vlingo&#8217;s software will be pre-loaded on two of its smartphones. Both the Nokia E72 (which vies for the business market with the likes of the Blackberry 9630) and the recently released QWERTY-keyboard-with-slider-and-touchscreen N97 (which is vying for attention versus the iPhone, Android and Pre) will ship with Vlingo on board. Nokia N97 PR 2.0 software users will also be able to update their Facebook status by voice. Dave Grannan, president of Vlingo, pointed out to us that wireless subscribers can upgrade to Vlingo Plus at the touch of a button thanks to Nokia&#8217;s deployment of OpenBit licensing management software that supports multinational, multicarrier billing. Once the decision is made to upgrade, the process is essentially frictionless.</p>
<p>It is no surprise that Nokia has opted to pre-package Vlingo on the E72, as well as the N97. The application has had tremendous success among message-hungry users of RIM Blackberry. So much so that a ranking of &#8220;bestselling paid apps&#8221; that appeared in the August 31st issue of <em>Fortune</em> Magazine placed Vlingo Plus (with its $17.99 price tag) at the top of the list. There is no better testimony to the value of the voice user interface than the way mobile device owners vote with their wallets.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2009/09/02/vlingo-boosts-european-presence-via-nokia/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
