<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Opus Research &#187; speech processing</title>
	<atom:link href="http://opusresearch.net/wordpress/tag/speech-processing/feed/" rel="self" type="application/rss+xml" />
	<link>http://opusresearch.net/wordpress</link>
	<description>Analysis and Expertise on Voice Services and Recombinant Communications</description>
	<lastBuildDate>Thu, 29 Jul 2010 23:57:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Nuance and MetaSwitch Offer &#8220;Carrier Optimized&#8221; Voicemail-To-Text</title>
		<link>http://opusresearch.net/wordpress/2010/06/22/nuance-and-metaswitch-offer-carrier-optimized-voicemail-to-text/</link>
		<comments>http://opusresearch.net/wordpress/2010/06/22/nuance-and-metaswitch-offer-carrier-optimized-voicemail-to-text/#comments</comments>
		<pubDate>Tue, 22 Jun 2010 18:22:03 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Carrier Services]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[speech processing]]></category>
		<category><![CDATA[voicemail-to-text]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=3052</guid>
		<description><![CDATA[Today MetaSwitch and Nuance announced that voicemail-to-text software from Nuance is tightly integrated into MetaSwitch's flagship CommPortal platform.]]></description>
			<content:encoded><![CDATA[<p><a href="http://opusresearch.net/wordpress/wp-content/uploads/2010/06/MetaswitchNetworks_logo.png"><img src="http://opusresearch.net/wordpress/wp-content/uploads/2010/06/MetaswitchNetworks_logo.png" alt="" title="MetaswitchNetworks_logo" width="144" height="34" class="alignright size-full wp-image-3057" /></a>Today MetaSwitch and Nuance announced that voicemail-to-text software from Nuance is tightly integrated into MetaSwitch&#8217;s flagship CommPortal platform. Thus MetaSwitch continues its transformation from a sleepy softswitch provider with about $140 million in revenues to the &#8220;network application engine that can&#8221;, with ambitious growth targeted through the addition of new partners and capabilities. </p>
<p>While announcements in the domain of network devices often fall into the &#8220;telco plumbing&#8221; category, this one is significant because it signals that transcription and delivery of spoken messages will be more routinely integrated into the product offerings of the &#8220;multi-screen&#8221; (TV, PC and mobile phone) carriers. This <a href="http://www.metaswitch.com/news/metaswitch-enhances-unified-messaging-with-speech-to-text.aspx">press release</a> surrounding the Nuance partnership includes a testimonial from the AVP of Commercial Products at Frontier Communications, welcoming the enhancement to the carrier&#8217;s existing voicemail facilities. Indeed, the major purpose of the partnership is to make it easy for MetaSwitch&#8217;s existing customers to add a popular, revenue-producing feature.</p>
<p>MetaSwitch claims about 500 customers in North America alone. The list includes OEMs like Cisco, Alcatel-Lucent and Motorola, but really focuses on all manner of carriers (with publicly announced customers listed <a href="http://www.metaswitch.com/company/carrier-customer-list.aspx">here</a>). </p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2010/06/22/nuance-and-metaswitch-offer-carrier-optimized-voicemail-to-text/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Nuance/IBM Five-Year Plan: R&amp;D Focused on Understanding</title>
		<link>http://opusresearch.net/wordpress/2010/05/24/the-nuanceibm-five-year-plan-rd-focused-on-understanding/</link>
		<comments>http://opusresearch.net/wordpress/2010/05/24/the-nuanceibm-five-year-plan-rd-focused-on-understanding/#comments</comments>
		<pubDate>Mon, 24 May 2010 18:44:20 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[Advisories]]></category>
		<category><![CDATA[Featured Research]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Mobile Speech Apps]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[speech processing]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=2917</guid>
		<description><![CDATA[
Featured Research
The R&#038;D relationship between IBM and Nuance has reached its third stage, now that the two companies have entered a five-year joint research initiative. Their collective objective is to get to the next phase in speech processing, where person-to-machine interactions are as natural as person-to-person.
Advisories are available to registered users only. 
For more information [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://opusresearch.net/wordpress/pdfreports/adv_TwitCC_Apr15.png" align='right' HSPACE=5 vspace=5 border=1/><br />
<em>Featured Research</em><br />
The R&#038;D relationship between IBM and Nuance has reached its third stage, now that the two companies have entered a five-year joint research initiative. Their collective objective is to get to the next phase in speech processing, where person-to-machine interactions are as natural as person-to-person.</p>
<p><em>Advisories are available to registered users only.</em> </p>
<p>For more information on becoming an Opus Research client, please contact Pete Headrick (<a href="mailto:pheadrick@opusresearch.net">pheadrick@opusresearch.net</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2010/05/24/the-nuanceibm-five-year-plan-rd-focused-on-understanding/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apple&#8217;s &#8220;Audio UI&#8221; has Many &#8220;Speakable&#8221; Strings and Tiers for Controlling iPods and Such</title>
		<link>http://opusresearch.net/wordpress/2010/03/22/apples-audio-ui-has-many-speakable-strings-and-tiers-for-controlling-ipods-and-such/</link>
		<comments>http://opusresearch.net/wordpress/2010/03/22/apples-audio-ui-has-many-speakable-strings-and-tiers-for-controlling-ipods-and-such/#comments</comments>
		<pubDate>Mon, 22 Mar 2010 18:22:13 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[Recombinant Communications]]></category>
		<category><![CDATA[speech processing]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=2589</guid>
		<description><![CDATA[A friend pointed me to this patent filing from Apple which shows that the company has big plans for spoken I/O for iPods, iPhones and Apple TV.]]></description>
			<content:encoded><![CDATA[<p>A friend pointed me to <a href="http://www.patentlyapple.com/patently-apple/2010/03/apples-rd-advances-audio-ui-and-new-portable-media-dock.html">this patent filing from Apple</a> which shows that the company has big plans for spoken I/O for iPods, iPhones and Apple TV (although I think the iPad must figure into the formula as well). A key concept in the patent is a &#8220;multitiered&#8221; approach to speech recognition that takes into account the &#8220;context&#8221; of a spoken word in order to arrive at the &#8220;focus&#8221; of the utterance.</p>
<p>Conversely, looking at spoken output, the patent filing describes &#8220;speakable strings&#8221; which are employed to provide audio feedback associated with the media that is being displayed or played. </p>
<p>The patent filing is clearly a product of inventors in Apple&#8217;s labs, so the concepts are presented at a fairly high level. Still it shows how thoughtfully speech will be baked into Apple&#8217;s audio user interface.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2010/03/22/apples-audio-ui-has-many-speakable-strings-and-tiers-for-controlling-ipods-and-such/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Safe Driving: Another Speechable Moment</title>
		<link>http://opusresearch.net/wordpress/2010/03/10/safe-driving-another-speechable-moment/</link>
		<comments>http://opusresearch.net/wordpress/2010/03/10/safe-driving-another-speechable-moment/#comments</comments>
		<pubDate>Thu, 11 Mar 2010 00:44:04 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[AT&T]]></category>
		<category><![CDATA[mobile services]]></category>
		<category><![CDATA[Recombinant Communications]]></category>
		<category><![CDATA[speech processing]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=2517</guid>
		<description><![CDATA[A briefing with the principals at ZoomSafer inspired me to think, once again, about the important, yet marginal, role that speech processing technologies have to play in making for safer motoring.]]></description>
			<content:encoded><![CDATA[<p><a href="http://opusresearch.net/wordpress/wp-content/uploads/2010/03/zoomsaferlogo.jpg"><img src="http://opusresearch.net/wordpress/wp-content/uploads/2010/03/zoomsaferlogo-150x150.jpg" alt="" title="zoomsaferlogo" width="150" height="150" class="alignright size-thumbnail wp-image-2521" /></a>A briefing with the principals at <a href="http://www.zoomsafer.com/">ZoomSafer</a> inspired me to think, once again, about the important, yet supplementary, role that speech processing technologies have to play in making for safer motoring. With the CTIA (Cellular Telecommunications and Internet Association) Conference on the near horizon, the coverage in the general media is predictably destined to recite the litany of statistics about accidents and loss of life caused by &#8220;distracted drivers.&#8221; </p>
<p>AT&#038;T Mobility is doing its part to cast a sharp light on the problem. It has launched a nationwide campaign of public service announcements designed &#8220;to raise awareness about the risks of texting and driving and remind all wireless consumers, especially youth, that text messages can – and should – wait until after driving.&#8221; Advertising initiatives are largely ineffective, unless accompanied by some other form of restraint or constraint. A White Paper published by ZoomSafer notes that, at any moment in time, over 810,000 autos are being driven by people who are actively using their cellular phone. This is the sad case, in spite of the fact that texting while driving is banned in a total of 21 states or territories. </p>
<p>ZoomSafer is a solution provider that has developed and markets software that enables its users (both corporate and personal) to define and manage policies that govern the use of mobile devices or, as CEO and Founder Matt Howard put it, &#8220;promote safe and legal use of cell phones while driving.&#8221; The solution comprises three parts. A Web site enables users to identify the policies that they wish to enforce (for example, to prohibit reception or origination of text messages or phone calls when the device is moving faster than 10 mph). Client software on the handset detects speed and &#8220;enforces&#8221; the designated policies. Finally, and this is the &#8220;speechable moment&#8221; aspect of the solution, ZoomSafer and Irish voice application service provider Dial2Do offer a service called &#8220;Voice Mate&#8221;, which provides single-button control of TTS-based reading of emails or texts as well as dictation of replies, email or texts. </p>
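<p>The speed-based rule described above can be sketched in a few lines. This is only an illustration of the idea, not ZoomSafer&#8217;s software: the class, the function and the event names are hypothetical, and the only figure taken from the briefing is the 10 mph example threshold.</p>

```python
from dataclasses import dataclass

# Hypothetical names throughout; this is not ZoomSafer's software or API.
SPEED_LIMIT_MPH = 10.0  # threshold borrowed from the example policy in the text


@dataclass
class Policy:
    """A user-defined policy, as it might be set on the Web site."""
    block_texts: bool = True
    block_calls: bool = True
    speed_limit_mph: float = SPEED_LIMIT_MPH


def is_allowed(policy: Policy, event: str, speed_mph: float) -> bool:
    """Return True if the handset may send or receive the event at this speed."""
    if speed_mph <= policy.speed_limit_mph:
        return True  # not moving fast enough for the policy to apply
    if event == "text":
        return not policy.block_texts
    if event == "call":
        return not policy.block_calls
    return True  # events the policy does not name are left alone


if __name__ == "__main__":
    policy = Policy()
    print(is_allowed(policy, "text", 5.0))   # parked or crawling
    print(is_allowed(policy, "text", 35.0))  # driving
```

<p>In this sketch the handset client would call <code>is_allowed</code> with the speed it detects, while the policy itself is whatever the user defined on the Web site.</p>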
<p>The theme of AT&#038;T&#8217;s national campaign is &#8220;No text is worth dying for,&#8221; and its tagline is &#8220;Txtng &#038; Drivng &#8230; It Can Wait.&#8221; The carrier also uses this Facebook page to encourage users to <a href="http://www.facebook.com/ATT#!/ATT?v=app_10531514314">take the pledge</a> not to text while driving. </p>
<p>I see ZoomSafer picking up where such pledges leave off. The company sees three distinct market segments: Teens (or rather their parents), &#8220;pro-sumers&#8221; (meaning mobile professionals) and corporations. For $2.99 each month, it gives subscribers the ability to define and enforce their own policies against distracted driving. The addition of Voice Mate brings the monthly rate to $5.99. In addition, $10 per handset per month is the charge for corporate customers to manage, enforce and audit their policies.</p>
<p>&#8220;Policy Enforcement&#8221;, meaning keeping people true to their stated intentions, is the crux of ZoomSafer&#8217;s value proposition. The economic benefit arises from loss reduction, lawsuit avoidance and compliance with existing laws. However, for those to whom communications deferred is communications denied, the delivery of voice renderings of text and the spoken origination of email or texts will turn out to be a bargain at an incremental $3 per month. Combining speech-enabled services with broader service offerings is destined to be the norm.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2010/03/10/safe-driving-another-speechable-moment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Captions on YouTube? Just Another Speechable Moment</title>
		<link>http://opusresearch.net/wordpress/2010/03/05/captions-on-youtube-ho-hum-just-another-speechable-moment/</link>
		<comments>http://opusresearch.net/wordpress/2010/03/05/captions-on-youtube-ho-hum-just-another-speechable-moment/#comments</comments>
		<pubDate>Fri, 05 Mar 2010 18:40:29 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Google Voice]]></category>
		<category><![CDATA[Recombinant Communications]]></category>
		<category><![CDATA[speech processing]]></category>
		<category><![CDATA[YouTube]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=2492</guid>
		<description><![CDATA[YouTube (a Google property) formally launched a service that automatically transcribes the audio tracks of videos on YouTube and displays them as captions for those who choose the option from the "Closed Caption" menu. ]]></description>
			<content:encoded><![CDATA[<p><a href="http://opusresearch.net/wordpress/wp-content/uploads/2010/03/YouTube_logo.png"><img src="http://opusresearch.net/wordpress/wp-content/uploads/2010/03/YouTube_logo.png" alt="" title="YouTube_logo" width="133" height="71" class="alignright size-full wp-image-2498" /></a>Yesterday, as noted in this <a href="http://youtube-global.blogspot.com/2010/03/future-will-be-captioned-improving.html">blog post</a>, YouTube (a Google property) formally launched a service that automatically transcribes the audio tracks of videos and displays them as captions for those who choose the option from the &#8220;Closed Caption&#8221; menu. The service was actually introduced in November 2009 and, as demonstrated in the video below, it uses the same transcription and translation resources that are embedded in Google Voice. </p>
<p><object width="600" height="360"><param name="movie" value="http://www.youtube.com/v/B6jXPpqVPVI&#038;hl=en_US&#038;fs=1&#038;"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/B6jXPpqVPVI&#038;hl=en_US&#038;fs=1&#038;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="600" height="360"></embed></object></p>
<p>As the video&#8217;s narrator admits, sometimes the transcriptions are not so accurate but, in certain cases, &#8220;they are still better than nothing.&#8221; That, in a nutshell, captures the notion of &#8220;satisficing&#8221; which I discussed in <a href="http://opusresearch.net/wordpress/2010/02/10/googles-approach-to-real-time-translation-a-matter-of-satisficing/">this blog post</a>. At this point in the technology&#8217;s development, it&#8217;s important to note when &#8220;good enough&#8221; is good enough.</p>
<p>Yet that hasn&#8217;t stopped a significant number of industry luminaries from declaring the service a &#8220;#failure&#8221;. For instance, the video embedded in <a href="http://newteevee.com/2010/03/05/youtube-caption-fail-jkontheruns-secret-fbi-edition/">this article</a> by Janko Roettgers at GigaOm&#8217;s jkOnTheRun showcases what he calls &#8220;auto-captioning gone wrong&#8221;.</p>
<p>You can detect the pattern here. Google makes public a feature that has been percolating within the confines of its cloud for a number of years. It shows up as &#8220;beta&#8221; or a product of its &#8220;labs&#8221; or simply as a button that can be invoked in one of its highly-trafficked properties &#8211; like Gmail or Google Apps. Early reviews are a mixture of delight, shock, awe and ridicule. All feedback is encouraged and ultimately employed to refine and adapt the service for general consumption&#8230; or relegate it back to cloud-based oblivion.</p>
<p>I see auto-captioning, as well as translation and timing, as yet another &#8220;speechable moment,&#8221; meaning that it is an instance where the resources employed for a new set of core services, like speech recognition for the purpose of transcription or translation, are deployed as part of a broader set of services. I coined the term while discussing enhancements to Vlingo&#8217;s iPhone app in <a href="http://www.internet2go.net/news/mobile-platforms/vlingo-adds-speech-enabled-e-mail-and-sms-iphone-more-speechable-moments">this post</a> on Internet2Go.net. </p>
<p>Even though I don&#8217;t subscribe to the belief that &#8220;all publicity is good publicity&#8221;, I do believe that exposing the public to both the good and bad instances of transcription and translation is an important part of setting realistic expectations for the technology. That provides prospective users with the power to decide how they want to use (or &#8220;game&#8221;) the service and determine whether it is &#8220;good enough&#8221; for them.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2010/03/05/captions-on-youtube-ho-hum-just-another-speechable-moment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s Approach to Real-time Translation: A Matter of &#8220;Satisficing&#8221;</title>
		<link>http://opusresearch.net/wordpress/2010/02/10/googles-approach-to-real-time-translation-a-matter-of-satisficing/</link>
		<comments>http://opusresearch.net/wordpress/2010/02/10/googles-approach-to-real-time-translation-a-matter-of-satisficing/#comments</comments>
		<pubDate>Wed, 10 Feb 2010 16:39:14 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Mobile Translation]]></category>
		<category><![CDATA[Recombinant Communications]]></category>
		<category><![CDATA[speech processing]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=2354</guid>
		<description><![CDATA["Satisficing" has long been the unspoken business imperative of the speech processing community.]]></description>
			<content:encoded><![CDATA[<p><img src="http://opusresearch.net/wordpress/wp-content/uploads/2009/11/Google_logo.jpg" alt="Google_logo" title="Google_logo" width="150" height="59" class="alignright size-full wp-image-1943" />&#8220;Satisficing&#8221; has long been the unspoken business imperative of the speech processing community. Be it speech recognition, speaker recognition, speaker identification or (most recently) &#8220;real-time, speech-to-speech translation&#8221;, this term, which combines &#8220;satisfy&#8221; with &#8220;suffice&#8221;, captures the spirit and strategy of speech-based product development and delivery. Google&#8217;s &#8220;head of translation services&#8221; Franz Och caused a stir when he was quoted in <a href="http://technology.timesonline.co.uk/tol/news/tech_and_web/personal_tech/article7017831.ece">this article</a> in News International Group&#8217;s TimesOnline as saying, &#8220;We think speech-to-speech translation should be possible and work reasonably well in a few years’ time.”</p>
<p>Indeed, Google has provided <a href="http://translate.google.com/?hl=en#">this resource</a> for real-time translation of text for more than three years. It now supports over 40 languages. Yet there is not a machine-to-machine language expert who believes that such a service will ever be &#8220;100% accurate.&#8221; The consensus among those who left comments on the TimesOnline site is that accuracy is still in the 50% range. </p>
<p>I, personally, believe that accuracy is not a meaningful measure of a speech-to-speech translation resource. Even if a system were able to recognize 9 out of 10 words dictated into it, the one word that is misrecognized can often distort the meaning of the entire phrase. The problem can be compounded when that initial transcription is translated into another language for re-rendering through a text-to-speech engine.</p>
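<p>A rough back-of-the-envelope calculation shows why word-level accuracy flatters a system. The sketch below assumes that each word is recognized independently with the same probability, which is a simplification and not a claim about any particular engine; the function names and the per-stage figures are illustrative assumptions only.</p>

```python
def phrase_accuracy(word_accuracy: float, words: int) -> float:
    """Chance that every word in a phrase is recognized correctly,
    assuming each word is independent (a simplifying assumption)."""
    return word_accuracy ** words


def pipeline_accuracy(stage_accuracies) -> float:
    """Chance that a phrase survives every stage of a pipeline intact,
    assuming the stages fail independently (again a simplification)."""
    product = 1.0
    for accuracy in stage_accuracies:
        product *= accuracy
    return product


if __name__ == "__main__":
    # 9 out of 10 words correct, applied to a ten-word phrase.
    print(round(phrase_accuracy(0.9, 10), 3))  # prints 0.349

    # Illustrative figures: transcription followed by translation.
    print(round(pipeline_accuracy([0.9, 0.9]), 2))  # prints 0.81
```

<p>Under those assumptions, a recognizer that gets 9 out of 10 words right delivers an error-free ten-word phrase only about 35% of the time, and every additional stage in the chain multiplies the odds down further.</p>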
<p>That said, Google&#8217;s &#8220;can-do&#8221; attitude toward real time translation is laudable. It has access to an ever-growing database of multi-lingual search terms and search results. It is now adding spoken search and dictation terms emanating from the Google Mobile App on a multiplicity of smartphones. Based on an evaluation of simultaneous improvements in machine-aided speech recognition, transcription and translation, one can see why Google&#8217;s Och has had his confidence raised.</p>
<p>My point is that all such improvements are asymptotic. They approach 100% accuracy, but they will never get there. This is why the concept of &#8220;satisficing&#8221; is growing in importance. Google has taken its time-tested approach both to the underlying technological challenges and to the roll-out of new services. Its technological approach is pure statistics. It captures, stores and processes a huge number of utterances. It does so over and over again. It may not get them all right, but the result is constant improvement, and starting about four months ago the output was deemed &#8220;satisfactory&#8221;. </p>
<p>As for &#8220;sufficient&#8221;, that is the end-users&#8217; call, and Google&#8217;s roll-out strategy, which often confuses people about whether a service is &#8220;in the lab&#8221;, &#8220;in beta&#8221; or &#8220;generally available&#8221;, is better understood as a test of sufficiency. Google knows a service is sufficient when its activity logs show that people are using it. That is satisficing in action. It&#8217;s not optimal, but it is effective. Google is, in effect, doing the market conditioning and expectation setting for solutions providers that already include IBM, Nuance, Cisco, Loquendo and other speech processing specialists. But, given that it is a network service, it is also staking out new ground potentially for incumbent network operators to avoid becoming &#8220;fat-dumb pipes&#8221; and for cloud computing specialists to expand their global reach.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2010/02/10/googles-approach-to-real-time-translation-a-matter-of-satisficing/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Microsoft&#8217;s Speech Czar Describes the &#8220;Speech At Microsoft&#8221; Group</title>
		<link>http://opusresearch.net/wordpress/2010/01/08/microsofts-speech-czar-describes-the-speech-at-microsoft-group/</link>
		<comments>http://opusresearch.net/wordpress/2010/01/08/microsofts-speech-czar-describes-the-speech-at-microsoft-group/#comments</comments>
		<pubDate>Fri, 08 Jan 2010 20:39:12 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[speech processing]]></category>
		<category><![CDATA[Tellme]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=2202</guid>
		<description><![CDATA[Zig Serafin describes Microsoft's many initiatives around speech and the Natural User Interface (NUI).]]></description>
			<content:encoded><![CDATA[<p><img src="http://opusresearch.net/wordpress/wp-content/uploads/2010/01/Microsoft_logo-150x150.jpg" alt="Microsoft_logo" title="Microsoft_logo" width="150" height="150" class="alignright size-thumbnail wp-image-2206" />In <a href="http://www.microsoft.com/presspass/events/ces/VideoGallery.aspx?contentID=feature_zignui">this wide-ranging video conversation</a>, Zig Serafin describes Microsoft&#8217;s many initiatives around speech and the Natural User Interface (NUI). He also describes where Tellme &#8220;fits&#8221; into the efforts to support accurate recognition and response to spoken input into services like Bing, the Microsoft &#8220;Search Cloud&#8221; and in automobiles. Definitely worth the 15-minute runtime (Silverlight required, natch!)</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2010/01/08/microsofts-speech-czar-describes-the-speech-at-microsoft-group/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s Nexus One Launches With Voice Enabled Everything</title>
		<link>http://opusresearch.net/wordpress/2010/01/06/googles-nexus-one-launches-with-voice-enabled-everything/</link>
		<comments>http://opusresearch.net/wordpress/2010/01/06/googles-nexus-one-launches-with-voice-enabled-everything/#comments</comments>
		<pubDate>Wed, 06 Jan 2010 16:51:25 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[mobile speech]]></category>
		<category><![CDATA[speech processing]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=2172</guid>
		<description><![CDATA[Not surprisingly (since it is first-and-foremost a phone), integration of voice command and dictation was one of the major areas of innovation for the new Nexus One from Google. ]]></description>
			<content:encoded><![CDATA[<p><img src="http://opusresearch.net/wordpress/wp-content/uploads/2010/01/Su7W8GBKPtUJ.jpeg" alt="Su7W8GBKPtUJ" title="Su7W8GBKPtUJ" width="80" height="49" class="alignright size-full wp-image-2177" />Not surprisingly (since it is first-and-foremost a phone), integration of voice command and dictation was one of the major areas of innovation for the new Nexus One from Google. While I wasn&#8217;t one of the privileged few invited to the press/analyst event that launched the phone (full disclosure &#8211; I was not provided with a free demo unit), I was able to attend through the magic of modern streaming. A replay of the entire event is available <a href="http://www.ustream.tv/recorded/3763271">here</a>, but I&#8217;m embedding a snippet from CNET News to make my point. Advance to minute 2:38 and you&#8217;ll see how Nexus One product manager Erick Tseng highlights the advantages that arise from Google taking active control over development of hardware in parallel with its software development efforts:</p>
<p><object type="application/x-shockwave-flash" data="http://image.com.com/gamespot/images/cne_flash/production/media_player/proteus/one/proteus2.swf" width="432" height="362"><param name="FlashVars" value="playerMode=embedded&#038;allowFullScreen=1&#038;flavor=EmbeddedPlayerVersion&#038;showOptions=0&#038;skin=http://image.com.com/gamespot/images/cne_flash/production/media_player/proteus/one/skins/proteus-zdnet.png&#038;autoPlay=false&#038;movieAspect=4.3&#038;embeddingAllowed=true&#038;clockColor=0x3b3b3b&#038;paramsURI=http%3A%2F%2Fnews.zdnet.com%2F2461-1_22-378541.xml%3Fwidth%3D432%26height%3D362%26ptype%3D6475%26mode%3Dembedded%26autoplay%3Dfalse%26siteId%3D24%26site%3D%26ttag%3DSam%2BDiaz%26assetId%3D153523%26conttypid%3D26%26nc%3D1262790570082%26nodeId%3D10532" /><param name="movie" value="http://image.com.com/gamespot/images/cne_flash/production/media_player/proteus/one/proteus2.swf" /><param name="wmode" value="transparent" /><param name="allowScriptAccess" value="always"></param></object></p>
<p>As we&#8217;ve pointed out for years now, the combination of voice command and dictation on smartphones is powerful. On the Nexus One, Google has done the best job yet of seamless search and navigation. Pressing the microphone button and saying &#8220;Navigate to Ikea&#8221; is all you need to do to (a) invoke a search of Google to find the nearest Ikea store and then (b) initiate the routine on Google Maps which ultimately leads to rendering the route from your current position to the nearest store. That&#8217;s why I&#8217;m calling it seamless.</p>
<p>Also note that invoking speech recognition is not yet completely &#8220;hands-free and eyes-forward&#8221; as will be required for the ultimate mobile speech experience. But that challenge can be overcome in a car with a simple button on the steering column (a la Ford SYNC). The other major challenge is, and will always be, accuracy. When demonstrating the entry of a dictated email message, there was still an element of suspense as Tseng, under ideal conditions, waited those few seconds to see if the dictation was rendered accurately. In this case, it was. And the crowd applauded accordingly. </p>
<p>Personally, I would prefer to have a demo where a proper name is misspelled or the message requires a change in punctuation (from a period to an exclamation point, for example). To me, that first rendering is, more likely than not, going to contain a misspelling or two and require some tweaking of the punctuation. Knowing the quality of the speech scientists working at Google these days, I&#8217;m pretty sure that they have many tricks to promote accurate rendering, punctuation and even the insertion of emoticons, but I suspect they will be introducing those after they build the ground level of acceptance that this modest offering is bound to engender. </p>
<p>Today I&#8217;ll sit back and watch as mobile speech benefits from the ripple effect from Google&#8217;s mass marketing efforts. Google is doing a lot of things right with Nexus One, including the opening up of marketing and distribution efforts to include direct sale of phones through its newly created channel (Google login required). From a personal economic point of view, the difference between a $100+ and a $500+ price tag speaks for itself: most price-conscious shoppers will opt for the subsidized phone and a term plan with T-Mobile. Regardless of customer choice, the major step forward from my point of view is the bold &#8220;voice enabled&#8221; stamp that Nexus One puts on its mobile experience.</p>
<p>I&#8217;ll be writing a full discussion of the competitive implications across the mobile speech ecosystem in a forthcoming advisory. Suffice it to say that the tight integration of software, hardware, operating system and voice processing that is exemplified by both the iPhone and Nexus One creates challenges for dozens of independent providers of speech-enabled mobile apps. Just as the automobile mount for the Droid with speech enabled navigation has struck fear in the hearts of all personal navigational device (PND) makers, the showcasing of voice control, search, navigation and dictation by Google will drive alternatives to define specific areas of opportunity (niches) and new partnerships that extend both their reach and life expectancies.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2010/01/06/googles-nexus-one-launches-with-voice-enabled-everything/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>There&#8217;s a Ford in Mobile Speech&#8217;s Future</title>
		<link>http://opusresearch.net/wordpress/2009/09/24/theres-a-ford-in-mobile-speechs-future/</link>
		<comments>http://opusresearch.net/wordpress/2009/09/24/theres-a-ford-in-mobile-speechs-future/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 20:39:16 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Ford]]></category>
		<category><![CDATA[in vehicle]]></category>
		<category><![CDATA[mobile speech]]></category>
		<category><![CDATA[speech processing]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=1521</guid>
		<description><![CDATA[The plan by Ford Motor Company to incorporate a text reader as an option in its passenger cars is a precursor to more in-vehicle, speech-based activity.]]></description>
			<content:encoded><![CDATA[<p>This <a href="http://www.bloomberg.com/apps/news?pid=20601103&#038;sid=aBy1Yrmz819M">post</a> on Ford Motor Company&#8217;s plans to incorporate a text reader as an option in its passenger cars highlights a precursor to more in-vehicle, speech-based activity. Call it the &#8220;Knight Rider&#8221; effect. Voice-activated cars have captured the public&#8217;s imagination for decades. Unfortunately, automobile manufacturers rank somewhere behind wireless carriers in their willingness to incorporate a quality voice user interface into their mass market offerings. </p>
<p>Ford has first-hand knowledge of drivers&#8217; interest in voice-activated services thanks to its experience with Sync (which was introduced in late 2007). Now its chairman will be a keynoter at the upcoming Consumer Electronics Show (CES) and, according to the Bloomberg report, Jim Buczkowski, Ford&#8217;s director of electronics, says the company already &#8220;offers a message system on &#8216;a handful&#8217; of phones that reads texts to drivers who push a button on the dash.&#8221;</p>
<p>This hybrid approach, using a button on the steering column to invoke services on a wireless phone, circumnavigates the &#8220;battle for the button&#8221; that many wireless voice application providers are preparing for as they enter the battle for mobile market share. For safety reasons alone, the quest for &#8220;truly hands free&#8221; mobile communications will make this type of service initiation vital. In-vehicle applications include news reading, message origination and transcription, and navigation &#8211; all ripe for ready adoption by commuters.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2009/09/24/theres-a-ford-in-mobile-speechs-future/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nuance Makes Inroads Into Vonage and Verizon</title>
		<link>http://opusresearch.net/wordpress/2009/09/08/nuance-makes-inroads-into-vonage-and-verizon/</link>
		<comments>http://opusresearch.net/wordpress/2009/09/08/nuance-makes-inroads-into-vonage-and-verizon/#comments</comments>
		<pubDate>Tue, 08 Sep 2009 17:27:42 +0000</pubDate>
		<dc:creator>Dan Miller</dc:creator>
				<category><![CDATA[CAT Scans]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[Samsung]]></category>
		<category><![CDATA[speech processing]]></category>
		<category><![CDATA[Verizon]]></category>
		<category><![CDATA[Visual Voicemail]]></category>
		<category><![CDATA[Vonage]]></category>

		<guid isPermaLink="false">http://opusresearch.net/wordpress/?p=1397</guid>
		<description><![CDATA[Nuance announced two agreements that will broaden the reach of speech processing to customers of Vonage and Verizon Wireless (on the Samsung Rogue).]]></description>
			<content:encoded><![CDATA[<p><img src="http://opusresearch.net/wordpress/wp-content/uploads/2009/04/nuance_logo.jpg" alt="nuance_logo" title="nuance_logo" width="117" height="75" class="alignright size-full wp-image-356" />Nuance&#8217;s marketing and product staff must have kept busy over the Labor Day weekend. The company made two announcements that hit the wire with the dawn&#8217;s early light today. This <a href="http://www.reuters.com/article/pressRelease/idUS139099+08-Sep-2009+BW20090908">press release</a> involves the incorporation of the Nuance flavor of voicemail-to-text transcription as part of a newly announced &#8220;World Plan&#8221; offered by commercial VoIP pioneer Vonage. In a separate <a href="http://www.reuters.com/article/pressRelease/idUS156736+08-Sep-2009+BW20090908">announcement</a>, the company noted that, when the Samsung Rogue rolls out through Verizon Wireless, it will ship with pre-installed software that promotes dictation (of text messages) as well as command and control of Web browsing and many other features and functions of the high-end smartphone. [I will discuss both at greater length on the Internet2Go site, as well as in our soon-to-be-released research report on "Mobile Speech".]</p>
<p>Vonage, and its charismatic CEO/Founder Jeff Citron, has always relied on innovation to differentiate itself from other carriers. It introduced a voicemail-to-text transcription service in 2007 under the name &#8220;Vonage Text&#8221;, reportedly turning to Simulscribe (now called PhoneTag) as a service provider. At the time it charged $0.25 for each transcribed message. The deal clearly was not an &#8220;exclusive&#8221;; one of PhoneTag&#8217;s long-time differentiators was its claim of being carrier-independent. Indeed, earlier this year in a press release, SpinVox claimed to be &#8220;live&#8221; with the following carriers around the world: Alltel, Cincinnati Bell, Sasktel, Rogers Wireless, Telus, Telstra, Vodacom South Africa, Vodafone Spain, Movistar Chile, Skype, Vonage and Livejournal.</p>
<p>All the while, Vonage must have been trialing Nuance&#8217;s voicemail-to-text service for introduction as the &#8220;Visual Voicemail&#8221; element of its $24.99 monthly flat-rate &#8220;Vonage World&#8221; service plan. Vonage subscribers without the Vonage World plan can have voicemail transcribed for the going rate of $0.25 per message. There is a lot of wisdom in offering voicemail transcription as part of a flat-rate plan. I&#8217;ve been trialing the Nuance voicemail-to-text service for about a year now (along with several others). In general the service is very good and accurate enough, but there are always utterances that are out-of-grammar or not rendered correctly. This is true of Google Voice, SpinVox, PhoneTag and even high-cost transcription services. We all need editors.</p>
<p>Offering the service at no extra cost, as part of a premium VoIP service plan, takes away some of the pricing friction and high expectations that accompany a premium service. It is definitely time to get the services out to a broader population so they can get a feeling for when it works well and when it needs work. This is part of a socialization process that every new technology should go through. Get it into the hands of more people so that they can decide whether they will use it and then define the &#8220;when&#8221; and &#8220;how&#8221; they will use it.</p>
]]></content:encoded>
			<wfw:commentRss>http://opusresearch.net/wordpress/2009/09/08/nuance-makes-inroads-into-vonage-and-verizon/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
