CES 2017 was permeated by voice applications, and rightfully so. The “smart home” is taking shape. Electronics manufacturers showcased connected products that make people’s lives easier and more efficient by tapping into data sources and making smart decisions. As more of these smart devices, sometimes referred to as nodes, appear in people’s homes, the need to control them by voice grows.
While at CES, I was struck both by the great progress voice user interfaces have made over the past year and, at the same time, by the immaturity and clunkiness of the showcased voice solutions. The landscape is also starting to show winners and losers. For example, in an emerging voice market where players both large and small are jockeying for position, Amazon has emerged as the clear early leader. A recent article on CNET lists the products showcased at CES this year that have integrated with Amazon’s Alexa Voice Service.
I had an opportunity to speak with representatives of several brands about their specific use cases for Alexa capabilities. One discussion that stands out included a demo of Volkswagen’s new Alexa-enabled service. The car company has developed an Alexa skill that enables a car owner to get useful services by invoking the Alexa assistant from any Alexa-enabled device, including from within the vehicle. Drivers can ask Alexa for the status of their car, can order products for their specific car model, and can even ask Alexa for reminders that are sent out when the driver is at specific locations or points of interest.
Having these services available within the vehicle, as well as from other devices outside the vehicle, offers great convenience to the driver. At the same time, the reality of the service shows that there are still challenges in the implementation. To invoke any service, the driver needs to say: “Alexa, ask Volkswagen to,” followed by the phrase that invokes a specific service. For example, to set a location-based reminder, the full voice command is: “Alexa, ask Volkswagen to set a reminder to buy flowers when I leave work.” This approach will always be a challenge to dialog designers because dialogs are between two parties and this automatically adds a third.
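To make the three-party structure concrete, here is a minimal Python sketch that splits such a command into the assistant being addressed, the brand's skill, and the actual request. The names `Invocation`, `parse_invocation`, and `overhead_words` and the regular expression are hypothetical, written only for illustration; this is not how Alexa itself processes speech.

```python
import re
from typing import NamedTuple, Optional


class Invocation(NamedTuple):
    wake_word: str        # the assistant being addressed, e.g. "Alexa"
    invocation_name: str  # the third-party skill, e.g. "Volkswagen"
    request: str          # what the user actually wants done


# Hypothetical pattern for illustration only; a real voice service does
# not parse transcribed text with a regular expression like this.
PATTERN = re.compile(
    r"^(?P<wake>\w+),\s+ask\s+(?P<name>.+?)\s+to\s+(?P<request>.+?)\.?$",
    re.IGNORECASE,
)


def parse_invocation(utterance: str) -> Optional[Invocation]:
    """Split "<wake>, ask <name> to <request>" into its three parts."""
    match = PATTERN.match(utterance.strip())
    if match is None:
        return None
    return Invocation(match["wake"], match["name"], match["request"])


def overhead_words(utterance: str) -> int:
    """Count the words spoken before the actual request begins."""
    parsed = parse_invocation(utterance)
    if parsed is None:
        return 0
    return len(utterance.split()) - len(parsed.request.split())


if __name__ == "__main__":
    command = ("Alexa, ask Volkswagen to set a reminder to buy flowers "
               "when I leave work.")
    print(parse_invocation(command))
    print(overhead_words(command))
```

In this sketch, the reminder command from the example spends its first four words ("Alexa, ask Volkswagen to") on routing before the driver's request even starts.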
The invocation phrase is cumbersome, but that’s not the only problem. I was invited into a virtually soundproof booth, where a Volkswagen representative sat a foot or two away from a connected Amazon Echo device. Even in that quiet environment, Alexa misheard at least 30% of what the representative was saying. For example, because the invocation is so long, the assistant would sometimes stop listening after a couple of seconds.
During the conference, I also saw instances where the Alexa Voice Service was implemented for products that didn’t seem to me to be a good fit for the Amazon solution. There were several consumer robot products designed for household use that operated using Alexa. To get the robot to play a game, tell a story, or carry out a command, the user needs to talk to the robot and say: “Robot, ask Alexa to start the brand’s [or company name’s] skill that performs this action.” Of course, the user would say the actual company or skill name and the actual name of the desired action. But instead of talking directly to the robot, the user is essentially asking the robot to invoke an Alexa skill. If the skill involves text-to-speech output, the robot will be speaking with Alexa’s voice.
While voice technologies have made significant advances, there remains a concern that we may be setting consumers up for disappointment. We haven’t yet attained the goal of providing voice products that always understand us. We also haven’t yet built the robot that the consumer can talk to directly and interact with reliably. Nor have we solved the large challenge of helping consumers to easily discover all the voice services that are on their device.
While CES 2017 was an exciting event that showed how far voice solutions have come, it’s important that we don’t lose sight of the significant challenges that remain ahead.