Sensory CEO Mozer on SCIDs (Speech-Controlled Internet Devices)
We completed a very interesting phone briefing with Todd Mozer, CEO of Sensory Inc., a company that specializes in affordable, embedded speech processing technologies. He provided a preview of a new chip that will be generally available later this year, so news surrounding it is still under embargo. Yet, in the course of his company presentation, we began discussing the prospects for SCIDs, “Speech-Controlled Internet Devices,” a broad category that spans household appliances, consumer electronics, automotive products and just about any personal electronic device that could benefit from well-tuned mechanisms for voice activation, instruction and content creation.
Mozer noted that Pat Gelsinger, the senior vice president at Intel who manages its largest operating group, offered this forecast in a recent speech: “By 2015 there will be over 15 billion devices connected to the Internet.” Gelsinger was referring to networked toasters, ovens, refrigerators, clocks, home entertainment units, lamps and sensors that are capable of sending and retrieving information from servers anywhere on the Internet. Most importantly, from Mozer’s point of view, a significant number of these devices are destined to be speech-enabled.
As users of speech-enabled navigational devices well know, the combination of voice input and data connectivity is powerful. Mozer provided us with a demonstration of the Moshi Alarm Clock, an unconnected precursor to a SCID. After using the magic words “Hello Moshi” to wake up the device, spoken commands can be used to set the clock, set the alarm, or request a spoken readout of the time or the temperature.
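The wake-phrase-then-command pattern the Moshi demonstrates can be sketched in a few lines. This is a minimal, hypothetical illustration of the control flow only; the wake phrase comes from the article, but the command names, handlers, and responses are invented for the example and are not Sensory’s API.

```python
# Hypothetical sketch of a wake-word-then-command loop, as described above.
# The device stays idle until it hears the wake phrase, then dispatches
# the next recognized command and returns to the idle state.

WAKE_PHRASE = "hello moshi"

def set_alarm():
    return "Alarm set."

def read_time():
    return "The time is 7:00 AM."  # placeholder response

def read_temperature():
    return "It is 68 degrees."  # placeholder response

# Illustrative command vocabulary (not the actual Moshi command set).
COMMANDS = {
    "set alarm": set_alarm,
    "what time is it": read_time,
    "what is the temperature": read_temperature,
}

def respond(utterances):
    """Process a stream of recognized phrases, returning spoken responses."""
    awake = False
    responses = []
    for phrase in utterances:
        phrase = phrase.strip().lower()
        if not awake:
            # Ignore everything until the wake phrase is heard.
            awake = (phrase == WAKE_PHRASE)
        elif phrase in COMMANDS:
            responses.append(COMMANDS[phrase]())
            awake = False  # go back to listening for the wake phrase
    return responses
```

In a real embedded device the phrase matching would be done by the recognizer chip against a fixed grammar rather than by string comparison; the sketch only captures the two-state (idle/awake) structure of the interaction.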
Connecting the Moshi Clock to the Internet through WiFi, Bluetooth or WiMax (in some areas) would make it possible for Moshi to transform itself into a world clock or provide temperatures from around the world. But there are lots of other use cases for SCIDs, and Sensory is playing an important role by providing relatively inexpensive chips to handle so-called natural-language input. One of the immediate opportunities arises from the need for “truly hands-free” ways to enter commands or dictate content to mobile devices.
In a soon-to-be-released report, I’ll be updating our survey and trends analysis of speech-enabled mobile services. Software to support natural-feeling voice input, especially for music playback and search, is getting much more affordable. Text-to-speech rendering has become very good and very affordable. Speech-to-text transcription remains a bit elusive, but the ability to interpret “natural language” commands – within understood contexts and use cases – is getting very good. You can follow our coverage of developments here, and you may find Todd Mozer’s posts on the Sensory Inc. blog quite interesting as well.