Captions on YouTube? Just Another Speechable Moment

By Dan Miller on March 5, 2010

Yesterday, as noted in this blog post, YouTube (a Google property) formally launched a service that automatically transcribes the audio tracks of videos and displays the transcripts as captions for those who choose the option from the “Closed Caption” menu. The service was actually introduced in November 2009 and, as demonstrated in the video below, it uses the same transcription and translation resources that are embedded in Google Voice.

As the video’s narrator admits, the transcriptions are sometimes not very accurate but, in certain cases, “they are still better than nothing.” That, in a nutshell, captures the notion of “satisficing,” which I discussed in this blog post. At this point in the technology’s development, it’s important to note when “good enough” is good enough.

Yet that hasn’t stopped a significant number of industry luminaries from declaring the service a “#failure”. For instance, the video embedded in this article by Janko Roettgers at GigaOm’s jkOnTheRun showcases what he calls “auto-captioning gone wrong”.

You can detect the pattern here. Google makes public a feature that has been percolating within the confines of its cloud for a number of years. It shows up as “beta” or a product of its “labs” or simply as a button that can be invoked in one of its highly trafficked properties – like Gmail or Google Apps. Early reviews are a mixture of delight, shock, awe and ridicule. All feedback is encouraged and ultimately employed to refine and adapt the service for general consumption… or to relegate it back to cloud-based oblivion.

I see auto-captioning, as well as translation and timing, as yet another “speechable moment”: an instance in which the resources behind a new set of core services, such as speech recognition for transcription or translation, are deployed as part of a broader set of services. I coined the term while discussing enhancements to Vlingo’s iPhone app in this post on Internet2Go.net.

Even though I don’t subscribe to the belief that “all publicity is good publicity”, I do believe that exposing the public to both the good and bad instances of transcription and translation is an important part of setting realistic expectations for the technology. That exposure gives prospective users the power to decide how they want to use (or “game”) the service and to determine whether it is “good enough” for them.
