Google’s Approach to Real-time Translation: A Matter of “Satisficing”

“Satisficing” has long been the unspoken business imperative of the speech processing community. Be it speech recognition, speaker recognition, speaker identification or (most recently) “real-time, speech-to-speech translation,” this term, which combines “satisfy” with “suffice,” captures the spirit and strategy of speech-based product development and delivery. Google’s “head of translation services,” Franz Och, caused a stir when he was quoted in an article in News International Group’s TimesOnline as saying, “We think speech-to-speech translation should be possible and work reasonably well in a few years’ time.”

Indeed, Google has provided a resource for real-time translation of text for more than three years. It now supports over 40 languages. Yet there is not a machine-to-machine language expert who believes that such a service will ever be “100% accurate.” The consensus among those who left comments on the TimesOnline site is that accuracy is still in the 50% range.

I, personally, believe that accuracy is not a meaningful measure of a speech-to-speech translation resource. Even if a system were able to recognize 9 out of 10 words dictated into it, the one word that is misrecognized can often distort the meaning of the entire phrase. The problem can be compounded when that initial transcription is translated into another language for re-rendering through a text-to-speech engine.
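
A rough back-of-the-envelope sketch makes the compounding concrete. It assumes, purely for illustration, that errors on individual words are independent and that each stage of the pipeline can be given a single per-word accuracy figure. The 0.9 recognition figure comes from the "9 out of 10 words" example above; the 0.9 translation figure and the function names are hypothetical, not measurements of any real system.

```python
from typing import Iterable


def error_free_probability(word_accuracy: float, num_words: int) -> float:
    """Chance that every word in a phrase comes through correctly.

    Assumes each word is handled independently with the same accuracy,
    which is a simplification made only for this illustration.
    """
    if not 0.0 <= word_accuracy <= 1.0:
        raise ValueError("word_accuracy must be between 0 and 1")
    if num_words < 0:
        raise ValueError("num_words must not be negative")
    return word_accuracy ** num_words


def pipeline_word_accuracy(stage_accuracies: Iterable[float]) -> float:
    """Per-word accuracy after chaining stages (e.g. recognition, then translation).

    Assumes a word survives only if every stage handles it correctly.
    """
    combined = 1.0
    for accuracy in stage_accuracies:
        combined *= accuracy
    return combined


if __name__ == "__main__":
    recognition = 0.9   # "9 out of 10 words" from the text
    translation = 0.9   # hypothetical figure, chosen only for illustration
    chained = pipeline_word_accuracy([recognition, translation])
    for length in (5, 10, 20):
        print(
            f"{length:2d}-word phrase: "
            f"recognition only {error_free_probability(recognition, length):.2f}, "
            f"recognition + translation {error_free_probability(chained, length):.2f}"
        )
```

Under these assumptions, a ten-word phrase recognized at 90% per-word accuracy comes through without a single error only about 35% of the time, and chaining a second imperfect stage lowers that further. Real errors are not independent, so the numbers are indicative at best, but they show why a headline per-word figure says little about whether the meaning of a whole phrase survives.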

That said, Google’s “can-do” attitude toward real-time translation is laudable. It has access to an ever-growing database of multilingual search terms and search results. It is now adding spoken search and dictation terms emanating from the Google Mobile App on a multiplicity of smartphones. Given the simultaneous improvements in machine-aided speech recognition, transcription and translation, one can see why Google’s Och has had his confidence raised.

My point is that all such improvements are asymptotic. They approach 100% accuracy, but they will never get there. This is why the concept of “satisficing” is growing in importance. Google has taken its time-tested approach both to the underlying technological challenges and to the roll-out of new services. Its technological approach is pure statistics. It captures, stores and processes a huge number of utterances. It does so over and over again. It may not get them all right, but the result is constant improvement and, starting about four months ago, a level of quality that was deemed “satisfactory”.

As for “sufficient”, that is the end-users’ call, and Google’s roll-out strategy, which often confuses people about whether a service is “in the lab”, “in beta” or “generally available”, is better understood as a test of sufficiency. Google knows a service is sufficient when its activity logs show that people are using it. That is satisficing in action. It’s not optimal, but it is effective. Google is, in effect, doing the market conditioning and expectation setting for solutions providers that already include IBM, Nuance, Cisco, Loquendo and other speech processing specialists. But, given that it is a network service, it is also staking out new ground, potentially for incumbent network operators to avoid becoming “fat-dumb pipes” and for cloud computing specialists to expand their global reach.


