How to Rate a Voicebot: 5 Questions to Consider

By Ahmed Bouzid on January 10, 2022

Fundamental Force, 2019, by Daniel Byrd

The more experience I accumulate building voicebots, the clearer the fundamentals of the craft become to me. Here are five basic questions I have made a habit of asking myself whenever I build a voicebot or assess one that I am using.

Why does the voicebot exist?
Or, if the voicebot is still a glint in someone’s eye, why should it be brought into existence? This is the very first question one must ask. If a short, clear answer cannot be given, you can unceremoniously fail the voicebot on the spot and move on to the next one.

Or, if you have not built the bot yet, go back to the drawing board. Keep the question deliberately open-ended, because when a good answer cannot be given (“good” in the sense that the answer rationally states how the voicebot will help specific types of users in specific types of circumstances), whatever answer is offered instead will reveal something worth knowing. For example: “Well, everyone is launching voicebots, so we needed one too,” or “We want to be present in all channels … we want to provide an omnichannel experience to our customers.” Both are bad answers, and both point to something broken in the organization that needs to be fixed.

Who is the primary target user and what is the primary use case?
Again, if this basic question cannot be answered clearly and succinctly, an “F” grade is warranted. No one should set out to build a voicebot without precisely defining the primary use case it is being built for (e.g., removing stains from shirts, memorizing facts, listening to upcoming event descriptions, activating a credit card). You need to be able to say: “People who are in such and such circumstances will use this voicebot to do this and that.”

Is the voicebot, hands down, the best interface for that primary user and that primary use case?
Given the primary target user and use case, is the voicebot obviously the best way to do what the user wants to do, or would a smartphone app, a desktop app, or something else be better? My hands are dirty because I am potting a plant, and I want to know what the weather will be like tomorrow. I am under the hood of my car replacing a spark plug, and I want to listen to some Ben Webster jazz. I am typing away at my laptop, and I want to know when the post office closes today without pausing my typing. In each case, a voicebot running on a smart speaker is the best way to do what I want in those specific circumstances (eyes busy/hands busy). If I were not potting a plant, under the hood of my car, or typing away, I would most probably have used a smartphone, and that is fine, since those are not the use cases the voicebot was created for.

How fast does the voicebot respond?
If it doesn’t respond quickly, does it use mitigating strategies, such as saying, “Hang on. Let me fetch the information for you,” or playing background audio that makes it clear the voicebot is working on a response?

Latency is an important dimension when assessing the quality of a voicebot. Users will not tolerate voicebots that respond slowly, or that are perceived to respond slowly. The key operative word here is “perceived.” The user may need to wait 5 seconds for a response, but if those 5 seconds are covered by some speech or sound while the voicebot retrieves information, the perceived latency may be much shorter than the actual latency.
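To make the masking strategy concrete, here is a minimal sketch in Python using `asyncio`. It assumes a voicebot turn can be modeled as an async backend lookup (`fetch`) and an async text-to-speech call (`speak`); both names, the `filler_after` threshold, and the `demo` helper with its post office answer are hypothetical placeholders, not any particular platform's API. The idea is simply: start the lookup, and if it is still running after a short threshold, speak a filler prompt so the wait is not silent.

```python
import asyncio
from typing import Awaitable, Callable

# Filler wording taken from the example in the text; any short phrase or sound would do.
FILLER_PROMPT = "Hang on. Let me fetch the information for you."


async def respond_with_masking(
    fetch: Callable[[], Awaitable[str]],
    speak: Callable[[str], Awaitable[None]],
    filler_after: float = 1.0,
) -> str:
    """Start the (hypothetical) backend lookup; if it has not finished within
    `filler_after` seconds, speak a filler prompt, then speak the answer."""
    task = asyncio.create_task(fetch())
    # asyncio.wait returns (done, pending) when the task finishes or the timeout expires.
    done, _pending = await asyncio.wait({task}, timeout=filler_after)
    if not done:
        # Still waiting: cover the silence so the perceived latency drops.
        await speak(FILLER_PROMPT)
    answer = await task
    await speak(answer)
    return answer


async def demo(lookup_seconds: float, filler_after: float) -> list[str]:
    """Run one turn against a simulated lookup and return everything spoken."""
    spoken: list[str] = []

    async def speak(text: str) -> None:
        spoken.append(text)  # stand-in for a real text-to-speech call

    async def lookup() -> str:
        await asyncio.sleep(lookup_seconds)  # simulated backend latency
        return "The post office closes at 5 PM today."  # hypothetical answer

    await respond_with_masking(lookup, speak, filler_after)
    return spoken


if __name__ == "__main__":
    print(asyncio.run(demo(lookup_seconds=0.5, filler_after=0.1)))
```

With a fast lookup, the user hears only the answer; with a slow one, the filler plays first. The threshold is a design choice: too low and every response gets padded with filler, too high and the user sits through dead air.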

Does the voicebot take care of the fundamentals?
Dr. Weiye Ma and I are publishing a book in a couple of months, titled “The Elements of Voice First Style,” in which we flesh out our recommendations for building robust, highly usable voicebots. In the meantime, here are five fundamentals to keep in mind when you are assessing the usability of a voicebot.

Brevity — How long are the prompts spoken by the voicebot? Voicebot prompts need to be concise, but not cryptic. Ideally, the voicebot will shorten its prompts as the user becomes familiar with the bot.
Help — How helpful is the voicebot when things don’t go right? How well does it manage no-input (the customer doesn’t say anything) and no-match (the customer says something out of scope) situations? Does it help the user recover from such situations easily? What does the voicebot do when it doesn’t understand the user? Does it provide an example of what the user can say, or does it just ask them to repeat (“Can you say that again?”)?
Cognitive Load — A user’s cognitive load can be strained in many ways. Two important ones are memory recall and choice making: (a) Is the user forced to recall a piece of information they provided earlier in the conversation? (b) Is the user forced to listen to many choices (for instance, long menus, or several menus) during their interaction with the voicebot?
Language Coverage — How often does the voicebot say, “Hmm – I didn’t understand that,” when the user said something that was reasonable for a user to say?
Closing — When you say “stop” or “goodbye,” does the voicebot stop, or does it go on talking some more, as in: “Thank you for being a loyal customer of Custardland. We really appreciate you. To find out more about us, please visit us at double you, double you, double you, dot, custard land dot com, that’s, double you…”?
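Two of these fundamentals, Brevity and Help, can be sketched in a few lines of Python. This is a minimal illustration assuming a hypothetical weather voicebot (“Weather Helper”); every prompt string, the `RETURNING_AFTER` threshold, and the class and function names are invented for the example. It shows an opening prompt that shortens once the user has some experience, and no-input/no-match handling that escalates from a short reprompt to one with an example of what to say, then ends briefly instead of looping.

```python
from dataclasses import dataclass

# Hypothetical prompts for a weather voicebot; the wording is illustrative only.
OPENING_PROMPTS = [
    "Welcome to Weather Helper. You can ask for today's or tomorrow's forecast "
    "for any city. Which city would you like?",  # first-time user
    "Welcome back. Which city?",  # experienced user
]
NO_INPUT_PROMPTS = [
    "Sorry, I didn't hear anything. Which city would you like the weather for?",
    "You can say a city name, for example, 'Boston' or 'weather in Denver tomorrow'.",
]
NO_MATCH_PROMPTS = [
    "Sorry, I didn't get that. Which city?",
    "For example, you can say 'What's the weather in Chicago tomorrow?'",
]
GIVE_UP_PROMPT = "Sorry, I'm having trouble. Please try again later. Goodbye."
RETURNING_AFTER = 3  # assumed number of past sessions before the shorter opening


def opening_prompt(past_sessions: int) -> str:
    """Brevity: taper the opening prompt once the user is familiar with the bot."""
    return OPENING_PROMPTS[0] if past_sessions < RETURNING_AFTER else OPENING_PROMPTS[1]


@dataclass
class RecoveryHandler:
    """Help: escalate from a short reprompt to one with an example, then exit."""
    no_input: int = 0
    no_match: int = 0

    def on_no_input(self) -> str:
        self.no_input += 1
        return self._escalate(NO_INPUT_PROMPTS, self.no_input)

    def on_no_match(self) -> str:
        self.no_match += 1
        return self._escalate(NO_MATCH_PROMPTS, self.no_match)

    def on_success(self) -> None:
        # Reset so a later error starts again at the short reprompt.
        self.no_input = self.no_match = 0

    @staticmethod
    def _escalate(prompts: list[str], count: int) -> str:
        # Past the last prompt, end briefly instead of reprompting forever.
        return prompts[count - 1] if count <= len(prompts) else GIVE_UP_PROMPT


if __name__ == "__main__":
    handler = RecoveryHandler()
    for _ in range(3):
        print(handler.on_no_match())
```

Three consecutive no-matches produce the short reprompt, then the reprompt with an example, then the brief goodbye. Keeping the final message short is deliberate, in the spirit of the Closing point above.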

Dr. Ahmed Bouzid is CEO of Witlingo, a McLean, Virginia-based startup that builds products and solutions that enable brands to engage with their clients and prospects using voice, audio, and conversational AI. Prior to Witlingo, Dr. Bouzid was Head of Alexa’s Smart Home Product at Amazon and VP of Product and Innovation at Angel.com. Dr. Bouzid holds 12 patents in the Speech Recognition and Natural Language Processing field and was recognized as a “Speech Luminary” by Speech Technology Magazine and as one of the Top 11 Speech Technologists by Voicebot.ai. He is also an Open Voice Network Ambassador, heading its Social Audio initiative, and an author at Opus Research. His new book, The Elements of Voice First Style, co-authored with Dr. Weiye Ma, is slated to be released by O’Reilly Media in early 2022.