And Then There Were Two! How Actions on Google can Cooperate with Amazon’s Lex

[Editor’s note: We welcome this contribution from Ahmed Bouzid, Co-Founder and CEO of Witlingo, provider of a platform-independent service creation and deployment environment for voice services.]

Yesterday, December 8, 2016, was as important an event in the unfolding story of the rise of Ubiquitous Voice as November 6, 2014, when Amazon launched the Amazon Echo. Yesterday, Google officially made Actions on Google generally available to developers who want to build what are called Actions for the Google Assistant on Google Home. If you are familiar with Alexa and the Amazon Echo, an Action is to Google Assistant what a Skill is to Alexa: a capability designed, developed, hosted, and maintained by a third party and accessible to customers who own a Google Home or an Alexa device.

At first blush, you may very well think that this is nothing more than an additive increment: so now we have yet another Echo, but this time it’s from Google. So what? What’s the big deal? And surely, tomorrow Apple, and Samsung, and Facebook, and (let’s all hope) Microsoft will come out with their own far-field, eyes-free/hands-free standalone devices. So what’s the big deal?

But it is a big deal. As Amazon’s CEO Jeff Bezos himself pointed out, expect the home and beyond to be populated by more than one type of these devices. So you have your Amazon Echo, where you may have your music, do a lot of your shopping, and listen to your audiobooks via Audible, but what about all the stuff that Google can help you with but that Amazon can’t? For example, if you have an Android-based phone, you could ask your Google Home questions like, “How far have I walked today?” or “Is my flight to San Diego on time?” or “Did I get a new email from Samantha?” (And when Apple gets its act together and releases Siri from its current jail and onto the form factor where it belongs, similar advantages will be leveraged.)

The big deal is this: you have an Echo in the kitchen and a Google Home in the bedroom. To both you speak the same language (a human language), and from both you expect more or less the same experience. As you go about your life, you really don’t want to be forced to remember what you can do on one device but not the other, let alone how you say something on one device versus the other. With two visual devices, say an iPhone and a Samsung phone, at least you have some visual affordances to help you know what you can and can’t do with one versus the other. With voice there are none, so it is imperative that the burden of difference, no matter how slight one may think it is, be removed.

This creates a tremendous opportunity for experience brokers to enter the field and mediate between the various conversational platforms that are emerging.

Enter Witlingo and others in a fast-emerging ecosystem: you build your conversation once and you deploy it on many platforms, as a skill on Alexa and as an action on Google Assistant. You use the same Witlingo API for both platforms and you go to the same portal to analyze the data that you posted to the mediating Witlingo Cloud. And even more exciting: you maintain one context for each customer and tap into that context, as the customer breezes from one device to the other, to deliver a level of intelligence so exquisite that the customer doesn’t even notice it!

I start my conversation with my Google Home upstairs (I add a stock to my Motley Fool watchlist).

Then I go downstairs to check on the chicken in the oven and resume the conversation on my Echo. I ask about my watchlist, and the stock is there.

As the Echo is reading my information, I pause it. Then I go upstairs and ask my Google Home to continue, and it picks up the reading where the Echo left off.
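To make the scenario above a little more concrete, here is a minimal Python sketch of the idea of keeping one context per customer and sharing it across platforms. It is an illustration only, not Witlingo's actual API: the class names, the intent names (`AddToWatchlist`, `ReadWatchlist`, `Continue`), the platform labels, and the in-memory store are all hypothetical, and a real service would sit behind each platform's own request and response formats.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerContext:
    """Per-customer state shared by every platform (hypothetical structure)."""
    watchlist: list = field(default_factory=list)
    reading_items: list = field(default_factory=list)
    reading_position: int = 0  # index of the next item to read aloud


class ContextStore:
    """In-memory stand-in for a cloud-hosted context service (assumption)."""

    def __init__(self):
        self._contexts = {}

    def get(self, customer_id):
        # One context per customer, regardless of which device is talking.
        return self._contexts.setdefault(customer_id, CustomerContext())


def _read_next(ctx):
    """Read the next watchlist item and advance the shared position."""
    if ctx.reading_position >= len(ctx.reading_items):
        return "That is everything on your watchlist."
    item = ctx.reading_items[ctx.reading_position]
    ctx.reading_position += 1
    return f"Next on your watchlist: {item}."


def handle(store, platform, customer_id, intent, slots=None):
    """Platform-neutral handler; `platform` only labels the reply here."""
    slots = slots or {}
    ctx = store.get(customer_id)
    if intent == "AddToWatchlist":
        ctx.watchlist.append(slots["ticker"])
        text = f"Added {slots['ticker']} to your watchlist."
    elif intent == "ReadWatchlist":
        # Start a fresh reading session over a snapshot of the watchlist.
        ctx.reading_items = list(ctx.watchlist)
        ctx.reading_position = 0
        text = _read_next(ctx)
    elif intent == "Continue":
        text = _read_next(ctx)
    else:
        text = "Sorry, I did not understand that."
    return {"platform": platform, "text": text}
```

In this sketch, adding a stock through a request labeled `"google_assistant"`, starting the reading through one labeled `"alexa"`, and then sending `Continue` from `"google_assistant"` again all touch the same `CustomerContext`, so the reading resumes at the stored position rather than starting over.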

But wait: there’s more!

Now that I have a Google Home and an Echo in my home, do I have to go to two different mobile apps to consume the visual component of my experience? The answer is “yes,” unless, of course, you are invoking a skill or an agent that was built to interact with the Witlingo platform. In that case, all you need is one app, the Witlingo Mobile App, to consume a rich HTML5 experience in one place.

In Verbal Conversations, Context is King. Human beings are trained, from the day they start to speak, in the art of multi-turn verbal exchanges that are mind-boggling in their complexity at any level of scrutiny, and they simply cannot help themselves: they will smuggle into their exchanges with non-human voice assistants assumptions and expectations that they do not smuggle in when they engage with the radically artificial visual interfaces that we have become so addicted to using since the iPhone came out 10 years ago. As a result, our expectations of these human-sounding voice interfaces will remain high, no matter how much we try to lower them. And now that the speech recognition and natural language barriers have been broken, a new and formidable frontier stands between us and the satisfying experience that customers will keep expecting until it is delivered: the fully natural, multi-turn, context-aware conversational interface that respects the Gricean maxims.

Ahmed Bouzid is Co-Founder and CEO of Witlingo. He is also co-author of “Don’t Make Me Tap”.



Categories: Conversational Intelligence, Intelligent Assistants
