The #VoiceFirst User Interface Has a Use Case Fit Problem

We all know how crucial Product-Market fit is to the viability, let alone the success, of a product. Build a product that is a natural fit for tween girls but market it to busy moms, and you will most likely end up with a massive flop, no matter how sweet the product or how slick and well-financed the marketing push.

In fact, I believe that the number one mistake start-ups make (and I’ve made more than my share of them) is neglecting this basic first step: we start building something with, at best, a vague notion of who the ideal target user is, delve instead into the fun work of ideating and building features and lots of cool bells and whistles, and then launch to the world at large, expecting it to embrace our beautiful baby. Too often, we neglect the fundamental questions: Who is the target user? What problem are we solving for them (or what value are we bringing to their life)? And how are we going to monetize the value we deliver well enough to survive and grow as a company?

I see a parallel mistake being made by builders of voice experiences: building a voice-first (and sometimes voice-only) experience without first asking the basic question: “Is the voice interface a good fit for the use case?” Instead, we delve into the hard work of designing and coding up our Alexa skill, Google action, or Bixby capsule, laboring under the unspoken assumption that, for any use case, if a GUI experience exists (a website, a chatbot, a mobile app), then one can build “the equivalent/parallel” voice-first/only version of it.

Why Success Is Not Assured for Every #VoiceFirst Use Case

In my 2013 book, “Don’t Make Me Tap,” I have a chapter titled “The Challenge of Voice,” where I write: “A common misconception the novice VUI designer often suffers from is the belief that designing a VUI consists in taking a Graphical User Interface (GUI) and ‘simplifying it’…. After all, while only a very small minority of people can claim some talent as graphical artists, the vast majority of us can safely claim to be competent talkers – or at least competent enough to design a simple interaction between a human being and a dumb computer.” I then go on to declare (mainly for dramatic effect, since we are comparing apples and oranges) that “VUI design is a lot harder than GUI design.”

“VUI is harder than GUI” is probably fair to say of a visual designer who is unaware of “the baggage” they carry as an experienced creator of visual/touch/click experiences, and who is therefore unaware of three fundamental characteristics of VUIs:

  1. Time linearity: unlike graphical interfaces, voice interfaces are linearly coupled with time;
  2. Unidirectionality: just as time is a one-way street, speech is a one-way medium; and
  3. Invisibility: unlike in a well-designed website or mobile app, where the user can easily tell where they are in the interface’s ecosystem (a highlighted menu item, the URL, etc.), a voice interface offers no such persistent cues.

With voice, the user often can’t tell “where” they are in the conversation, and if the designer does not take this invisibility into consideration, the user will quickly “feel lost” a few interactions into the flow (hence the crucial importance of error strategies to recover from inevitable slippages).

But just as in life, where, depending on the circumstances, every “weakness” can turn out to be a strength and every “strength” can become a liability, in the world of voice-first/only the three characteristics above, often deemed “challenges,” can in fact be the pillars upon which a voice-first/only experience is delivered that is superior to, or unparalleled by, any “equivalent” visual/touch/click experience.

An Illustrative Use Case

Here’s a use case to illustrate what I mean. When we are trying to memorize something (say a list of facts, a poem, or some other text), we usually do it in a repetitive back-and-forth: we speak or mouth what we are trying to memorize, get feedback as to whether we are correct, and hope that, on the next try, we will get it right. Ideally, we are working with someone else who asks us questions while we pace back and forth, giving answers and receiving feedback before moving on to the next one. Ideally, the back-and-forth is time-constrained (answer me quickly), we move from question to question linearly, and our eyes are closed (as they naturally often are when we are memorizing). In other words, the “interface” is time linear, unidirectional, and invisible.

Here’s an example of an Alexa skill that leverages these fundamentals of the voice-first/only interface: learning facts and being quizzed about them conversationally.
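To make the shape of such a skill concrete, below is a minimal sketch of a fact-quiz back end using the ASK SDK for Python. The intent name (AnswerIntent), the slot name (answer), and the FACTS list are illustrative assumptions on my part, not details of the skill mentioned above; the point is how the handlers lean on time linearity (one question per turn), unidirectionality (feedback, then move on), and invisibility (the session attributes, not the user, keep track of “where” we are).

```python
# Minimal sketch of a fact-quiz Alexa skill using the ASK SDK for Python.
# "AnswerIntent", the "answer" slot, and FACTS are hypothetical examples.
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_request_type, is_intent_name

# Hypothetical facts to drill: (question, expected spoken answer).
FACTS = [
    ("What is the capital of Australia?", "canberra"),
    ("In what year did Apollo 11 land on the moon?", "1969"),
]

sb = SkillBuilder()


class LaunchHandler(AbstractRequestHandler):
    """Start the quiz: ask the first question and remember where we are."""

    def can_handle(self, handler_input):
        return is_request_type("LaunchRequest")(handler_input)

    def handle(self, handler_input):
        session = handler_input.attributes_manager.session_attributes
        session["index"] = 0  # time linearity: one question at a time
        question = FACTS[0][0]
        return (handler_input.response_builder
                .speak("Let's practice. " + question)
                .ask(question)  # reprompt keeps the microphone open
                .response)


class AnswerHandler(AbstractRequestHandler):
    """Check the spoken answer, give feedback, and move on linearly."""

    def can_handle(self, handler_input):
        return is_intent_name("AnswerIntent")(handler_input)

    def handle(self, handler_input):
        session = handler_input.attributes_manager.session_attributes
        index = session.get("index", 0)
        slots = handler_input.request_envelope.request.intent.slots or {}
        slot = slots.get("answer")
        heard = (slot.value or "").strip().lower() if slot else ""

        # Immediate feedback, then advance: the user never navigates back.
        if heard == FACTS[index][1]:
            feedback = "Correct! "
        else:
            feedback = "Not quite. It's {}. ".format(FACTS[index][1])

        index += 1
        if index >= len(FACTS):
            return (handler_input.response_builder
                    .speak(feedback + "That's all the facts. Well done!")
                    .response)  # no reprompt: the session ends here

        session["index"] = index  # invisibility: state lives server-side
        question = FACTS[index][0]
        return (handler_input.response_builder
                .speak(feedback + "Next: " + question)
                .ask(question)
                .response)


sb.add_request_handler(LaunchHandler())
sb.add_request_handler(AnswerHandler())
lambda_handler = sb.lambda_handler()  # entry point for AWS Lambda
```

Note that the reprompt passed to ask() is what keeps the session open for the next answer; omitting it, as in the final turn, is what ends the session and, with it, the quiz.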

So, next time you decide to build a voice-first/only experience, make sure you first pin down the Product-Market fit, and then do your homework to ensure the UI–Use Case fit. In other words, make sure that the “liabilities” of voice-first/only are precisely the strengths of your use case, so that the voice interface you are building is not a poor version of a “superior” visual/touch/click interface, but one that is not only superior for the use case but that cannot be delivered as well, if at all, in any other way than via the voice-first/only conversational UI.


Ahmed Bouzid, previously Head of Product at Amazon Alexa, is Founder and CEO of Witlingo, Inc., a McLean, VA-based B2B SaaS company that helps brands launch conversational voice experiences on platforms such as Amazon Alexa and Google Assistant.


