Google’s Duplex Paradox

“Have your bot call my bot.”

That used to sound like the punchline of a bad joke about IA’s dystopian future. But it is a prospect that is no longer funny or far-fetched, as was made amply clear when Google CEO Sundar Pichai showed off the latest conversational capabilities of Google Assistant. Called Duplex, it integrates the latest advancements in Natural Language Processing (NLP), most strikingly the ability of an automated virtual agent to talk to humans, or to other automated systems for that matter.

I regret that the announcement was made during Google I/O, an event held at the same time as our highly successful Conversational Commerce Conference-London last week. I also had to do a double take when noting that the Principal Engineer who authored a blog post clarifying the design principles and objectives of Duplex bears the surname “Leviathan”.

NLP Has Become the Center of Gravity for IA

We have observed for some time that Google is successfully conditioning us to use our own words to conduct voice search and take command of our browsers, entertainment centers and smart appliances. As Messrs Leviathan and Matias note in the blog post, “we have witnessed a revolution in the ability of computers to understand and to generate natural speech, especially with the application of deep neural networks.” People no longer blurt out search terms to Google Assistant or bark out their destinations or desired arrival times to Google Maps.

We speak in full sentences, often adding opinions, local color and emotion. That said, it is important to note that Duplex must be extensively trained on specific use cases. In this case it is all about making appointments or reservations at a restaurant, hairdresser or other service provider. This is definitely not about moving closer to the dreaded “Artificial General Intelligence” (AGI) whose arrival is expected to herald the end of life as we know it.

Instead, Duplex is all about a computer-generated voice sounding totally natural as it carries out specific tasks in what the developers call “closed domains.” Yet even with those caveats, it is bound to spark the creative juices of designers and service developers at brands around the world.

Under the hood (meaning in the cloud), the new service leans heavily on what the developers call a recurrent neural network (RNN). Apparently a standard deep neural network (DNN) was not up to the task of keeping pace with “understanding, interacting, timing and speaking” all at the same time. The foundational technology is TensorFlow Extended (which they call TFX), a worthy rival to IBM’s Watson, Microsoft’s LUIS and the rest of the array of cognitive resources.
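For readers curious about what a recurrent model actually looks like in TensorFlow, here is a minimal, purely illustrative sketch: a toy LSTM classifier that maps a short utterance to one of a few hypothetical booking intents. It is not Google’s Duplex code or its TFX pipeline; the intent labels and sizes are assumptions chosen only to show the general shape of the sequence models the developers describe.

    # Illustrative sketch only (not Google's Duplex or TFX code): a toy
    # recurrent model that classifies a short utterance into one of a few
    # hypothetical booking intents (e.g. book, ask_hours, cancel).
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=10000, output_dim=64),  # token IDs -> vectors
        tf.keras.layers.LSTM(128),                                   # recurrent layer carries context across the utterance
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),              # one score per intent
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

The real system, of course, juggles speech recognition, synthesis and timing on top of language understanding; the point of the sketch is simply that recurrence lets the model carry context from one word (or turn) to the next.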

Regardless of the platform, Google benefited from having hours and hours of training data from voice search, dating back to the original GOOG411, which, as we noted when it was shuttered in 2010, had been an invaluable source of spoken queries for training purposes.

Talking like a Human Gets Complicated

Google went the extra mile and then some in order to make Duplex sound lifelike. It even made sure that the virtual agent interrupted its utterances with “speech disfluencies”: y’know, those annoying “ums” and “uhs” that public speakers have been trained to eschew. In user studies, Google found that including these interruptions and extra syllables made conversations feel more natural and intimate.

Therein lies the rub. Much of the post-I/O commentary has homed in on ethical issues surrounding a presumed requirement that Duplex, in the course of a conversation, make it clear that it is a bot. This is a topic that will arise more frequently as autonomous “voice-first” assistants and text-based chatbots become indistinguishable from their human counterparts.

As noted in this Wired article, well-respected ethicists feel strongly that a bot must self-identify in the course of a conversation. Yet, in the not-so-long run, we must ask ourselves whether this is necessary, or even possible. It is standard operating procedure for Intelligent Assistance platforms to be trained by subject matter experts and top employees (like customer care agents) in business organizations. Interactions, for one, has a patent on the use of live agents “in the background” to disambiguate spoken input and refine understanding in the course of a single conversation.

The opposite is also true. Many live customer service reps rely on virtual agents in “whisper mode” to prompt them through complex discussions. These serve the same function as a “screen pop” on an agent’s screen, prescribing specific wording or suggesting the “next best action” for the agent to take.

Blurred Lines

By unveiling Duplex through a use case designed to blur the distinction between a live person and a virtual agent, Google has given us all food for thought. Certainly there will be use cases where it is important for an automated agent to identify itself as such. Doing so empowers end users by enabling them to choose their preferred conversational partner. Longtime designers of voice and conversational user interfaces know well that there are contexts in which people prefer to interact with an automated system; promising to pay a late bill or quickly booking seats on a train come to mind.

In the coming years we will observe an increasing number of use cases where the liveness of an assistant or agent is irrelevant. People want results. Interactions are getting more complex. Virtual agents are good at making complex things simple, often at large scale. That they can now do so using natural language for both input and output is a very positive step.

When thinking ethically, it is also important to think practically. As bots become more human-like, you can expect them to be held to the same standards as humans. Popularity, authenticity and trust are major determinants of loyalty and repeated use, whether you breathe air or merely ingest and analyze massive amounts of data.
