The Five Scandals of Amazon Alexa and Google Assistant

If Amazon Alexa or Google Assistant were a person, what kind of person would they be, given their conversational behavior?

In my opinion, this is the kind of person they would be: someone who is impatient and eager to take the conversational turn when you are talking to them, who won’t stop talking unless you interrupt them by saying their name, who is so anxious to be spoken to that they start talking to you at the mention of anything that remotely sounds like their name, who suffers from severe short-term memory problems, and who mispronounces some basic words really badly and, worse, won’t take corrections and won’t learn how to pronounce them properly. Take a second and visualize such a person. I think you would agree with me that, unless they were a loved one, you would not want such a person in your life. Such a person would clearly be dealing with some non-trivial emotional and cognitive issues and should be under some therapy program.

We have now entered the seventh year in the life of Amazon Alexa (and the sixth in that of Google Assistant). Seven years is an eternity in Information Technology. In November 2014, when the Echo was launched (albeit by invitation only), Amazon introduced what I believe was and remains a monumental achievement in the annals of technological progress. Far-field speech recognition that is accurate enough and fast enough to be accepted by the masses is a staggering achievement. It is an achievement for the ages, and those behind it can rest assured that, on their deathbed, they won’t feel that they hadn’t done something important with their life.

But it is precisely because such amazing people were able to pull off such a marvel of a feat that I find the current state of Alexa and Assistant nothing short of scandalous. If the teams behind them were able to pull off the miracle of far-field speech recognition, a miracle of both imagination and engineering, how on earth can one explain that, seven years on, those teams have not solved what I believe are the key barriers to taking conversational voice to the next level?

Let me explain.

1. Hurry up and shut up!

Unless you know exactly what you want to say, as in, “Alexa, what’s the weather?” or “Hey Google, what time is it?” talking to Alexa or Assistant stinks. Say I want to listen to the song “Heart of Glass” by Blondie. I launch Google Assistant and say, “Hey Google, play, ummmm,” and I blank for a couple of seconds on the name. Does Assistant give me some slack? Nope. In the middle of my “ummm,” Assistant comes back impatiently with, “Check out this Miles Davis station on YouTube…” Ugh. I stop it. I was listening to Miles Davis last night, and now it thinks that I want to listen to Miles Davis some more. As usual, too clever by half. I try again, having rehearsed the phrase a couple of times: “Hey Google, play ‘Heart of Glass’ by Blondie on… ummmmm.” It interrupts again and says, “Choosing songs is only available for YouTube Music Premium members, but you might like this station,” and then it starts to play “Rhiannon” by Fleetwood Mac (a song that I do love, but that’s not what I wanted).

Ok, I take another deep breath, and I say, “Hey Google, play ‘Heart of Glass’ by Blondie on bedroom TV.” To which it responds with, “Got it. Heart of Glass, video version from YouTube Music, playing on Bedroom TV.” Very sweet! Blondie is playing on my TV! I am satisfied, but I am also shaking my head: all the brains, all the thinking, all the hard work, all the little miracles along the way, and they still rely on silence to determine whether I have stopped talking. The incompleteness of my request, or the fact that I’m saying “ummmm,” is just not taken into consideration. The thing wants me to hurry up and finish talking so that it can go and take care of it. How can people who solved far more complex problems come this close and not go all the way and solve what I believe is the biggest problem with the conversational voice-first interface today: the way it puts you on the spot, hurries you, gives you anxiety, won’t let you hesitate? Who wants to engage with an interface that makes them feel anxious, slow, dumb? Life is stressful enough as it is.
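To make my complaint concrete, here is a minimal sketch, in Python, of the difference between endpointing on silence alone and endpointing that also looks at what has been said so far. It is purely illustrative: the thresholds, word lists, and function names are mine, and this is not how either assistant actually works under the hood.

# A minimal, illustrative sketch: commit-on-silence versus giving the speaker
# more slack when the partial transcript is obviously unfinished.
# All thresholds and word lists here are made up for the example.

FILLERS = {"umm", "ummm", "ummmm", "uh", "er"}
TRAILING_CONNECTIVES = {"play", "on", "by", "the", "and", "to"}

def silence_only_endpoint(partial_transcript: str, silence_ms: int) -> bool:
    """Commit as soon as the user pauses long enough, no matter what was said."""
    return silence_ms >= 700

def context_aware_endpoint(partial_transcript: str, silence_ms: int) -> bool:
    """Wait longer when the request sounds unfinished."""
    words = partial_transcript.lower().split()
    last = words[-1] if words else ""
    if last in FILLERS or last in TRAILING_CONNECTIVES:
        # "Hey Google, play, ummmm" or "... by Blondie on" is clearly not done:
        # give the speaker a few extra seconds before jumping in.
        return silence_ms >= 3000
    return silence_ms >= 700

if __name__ == "__main__":
    utterance = "play heart of glass by blondie on"
    for pause_ms in (800, 2000, 3500):
        print(pause_ms,
              silence_only_endpoint(utterance, pause_ms),
              context_aware_endpoint(utterance, pause_ms))

The point is simply that the partial transcript itself carries a strong signal that the speaker is not done, and the turn-taking policy could use it.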

2. Say my name — Say. My. Name!

Here’s what, as far as I’m concerned, is the second scandal: the thing won’t stop talking unless I say its name. Alexa won’t be interrupted unless I say, “Alexa.” When Alexa and Assistant were both launched six or seven years ago, that was just fine. I remember thinking, “Well, this is annoying, but it is viable. I can live with this until they fix it, probably by next year.” Nope: I still have to say their name when they’re talking; otherwise, I can scream at the top of my lungs for them to stop and they will just ignore me. And who wants to talk to someone who suffers from such an affliction?

3. Are you talking to me?

Here’s another one: I am talking with my wife and she is telling me that it’s not healthy to eat charred meat. So I tell her that I will just scrape off the “black stuff.” At which point Alexa suddenly pipes up: “Ok. For meat, I recommend meatloaf from the Food Network. One hour twenty-five minutes to make, serves six…” So I grit my teeth and I say in exasperation, “Alexa. Stop.” It doesn’t. I say again, almost shouting, “Alexa, stop,” and finally it does. “What was it we were talking about again, honey?” What had happened was that Alexa was listening so eagerly for its name, like an unstable person starved for attention, that at the mention of “black stuff” (I figured it out later) it thought I had said “Alexa” and responded as if I had asked, “Alexa, the meat.”
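To put the trade-off in the simplest possible terms: a wake-word detector produces a confidence score, and a single threshold decides between false accepts (barging into my dinner conversation) and false rejects (ignoring me). The toy sketch below, in Python, uses invented scores purely for illustration; it has nothing to do with Amazon’s actual model.

# A toy sketch of the wake-word trade-off. The confidence scores below are
# invented for illustration only; they are not produced by any real model.

HYPOTHETICAL_SCORES = {
    "alexa": 0.97,           # clearly the wake word
    "alexa, stop": 0.95,
    "black stuff": 0.62,     # close enough to fool a detector tuned to be eager
    "pass the salad": 0.08,
}

def fires(score: float, threshold: float) -> bool:
    """An eager (low) threshold trades false rejects for false accepts."""
    return score >= threshold

for threshold in (0.5, 0.9):
    print(f"threshold = {threshold}")
    for phrase, score in HYPOTHETICAL_SCORES.items():
        verdict = "fires" if fires(score, threshold) else "ignored"
        print(f"  {phrase!r}: {verdict}")

Tune the threshold low and Alexa interrupts dinner; tune it high and it ignores you in a noisy kitchen. My complaint is that, seven years in, the balance still errs on the side of interrupting dinner.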

4. Frankly, my dear… I don’t… remember

More. I ask Alexa to set up a reminder for 7:25 AM so that I don’t miss my 7:30 AM call. So I say, “Alexa, set up a reminder for 7:25 AM.” It tells me that it did so, but then it adds, “By the way, did you know that you can set up recurring reminders, such as ‘Alexa, set a recurring alarm for 7:00 AM’?” Yes, I do know that, because you have been giving me this tip almost (but not always) every time I ask you to set up an alarm. Why can’t you remember that you have given it to me many times already, you dumb, glorified alarm clock? Is it because I have not used recurring alarms yet and you won’t stop sharing that tip until I do? How annoying!
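And the fix I am asking for is not exotic. Here is a minimal sketch of the behavior, in Python, with made-up names and a made-up cap, and certainly not Alexa’s actual tip logic: remember, per user, how many times a tip has been delivered, and stop after a couple of exposures, whether or not the user ever acts on it.

# A minimal, hypothetical sketch: cap how many times any given tip is
# appended to a response for a given user.

from collections import defaultdict

MAX_TIP_EXPOSURES = 2  # made-up cap

tip_counts = defaultdict(int)  # (user_id, tip_id) -> times delivered

def maybe_append_tip(user_id: str, tip_id: str, tip_text: str, response: str) -> str:
    """Append the tip only if this user has not already heard it enough times."""
    if tip_counts[(user_id, tip_id)] >= MAX_TIP_EXPOSURES:
        return response  # the user has heard this enough; stay quiet
    tip_counts[(user_id, tip_id)] += 1
    return response + " By the way, " + tip_text

for _ in range(4):
    print(maybe_append_tip("some_user", "recurring_alarms",
                           "did you know that you can set recurring alarms?",
                           "Reminder set for 7:25 AM."))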

5. You know, I’m just too busy for prog-ress (not pro-gress)

The last one, for now. This one is less of a scandal and more of an “Ugh!” As in, “Ugh, really, come on. Someone over there should care more than this. Truly.” They should care that the word “live” is pronounced to mean the opposite of “recorded” or “offline,” not the opposite of “die,” when that’s what is meant to be said. That “contest” is pronounced to mean “oppose” or “challenge” and not “tournament” or “game.” That “present” is pronounced to mean “gift” and not “show” or “offer.” That “desert” is meant to refer to something like the Sahara and not to what a soldier does when they go AWOL. You catch my drift. And you may think that this is not a big deal, and that’s probably fair enough. But, you know, when it’s a sunny Saturday morning and you wake up to the gentle sounds of Google Assistant playing soothing, positive morning jazz, with a summary of the news, and you feel like you are in a sci-fi movie about the year 2021, and then, screech, you hear, “Liverpool scored an upset victory against Manchester United yesterday,” you can’t help but go, “Ugh. If I knew you were that smart, I would have thought that you were being playfully clever by half, pronouncing ‘upset’ to mean ‘spoiling the mood of Man Utd fans with the unexpected victory.’” But no, I know that you are not that subtle; you are just being sloppy with the way you speak words. And that’s just not necessary. Not after all the magic that you have created.
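And the tools to do better already exist. SSML, the markup that text-to-speech engines generally accept, defines a phoneme element for exactly this purpose. Here is a rough sketch, in Python, of wrapping a heteronym in a pronunciation hint before synthesis; the sense labels and the little dictionary are mine, support for the tag varies by platform, and figuring out the intended sense upstream is of course the genuinely hard part.

# A rough sketch: once the intended sense of a heteronym is known, wrap it
# in an SSML <phoneme> hint so the TTS engine does not have to guess.
# The sense labels and dictionary are illustrative, not any vendor's API,
# and <phoneme> support varies by platform.

IPA_BY_SENSE = {
    ("live", "broadcast"): "laɪv",    # "live from Wembley"
    ("live", "reside"): "lɪv",        # "I live in McLean"
    ("upset", "surprise"): "ˈʌpsɛt",  # "an upset victory"
    ("upset", "distress"): "ʌpˈsɛt",  # "the loss upset the fans"
}

def with_pronunciation(word: str, sense: str) -> str:
    """Return the word wrapped in a phoneme hint, or unchanged if no hint is known."""
    ipa = IPA_BY_SENSE.get((word.lower(), sense))
    if ipa is None:
        return word  # no hint available; let the engine guess
    return f'<phoneme alphabet="ipa" ph="{ipa}">{word}</phoneme>'

sentence = ("Liverpool scored an "
            + with_pronunciation("upset", "surprise")
            + " victory against Manchester United yesterday.")
print(f"<speak>{sentence}</speak>")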

I know, I know, all of these are first-world, 9.9% problems, and all that. But you know that feeling we have when we see something that is so nearly perfect but has a couple of blemishes here and there, and we can’t help but be annoyed and so speak up urgently about the blemishes? Well, that’s how I feel about this new, marvelous, life-changing technology. So, let’s fix those not-so-small, but fixable, blemishes and get this interface to the next level of amazing.


Dr. Ahmed Bouzid is CEO of Witlingo, a McLean, Virginia-based company that builds tools for publishing Voice First experiences, such as Alexa skills, Google actions, Bixby Capsules, Microcasts, and crowd-sourced audio streams. Prior to Witlingo, Dr. Bouzid was Head of Product at Amazon Alexa and VP of Product at Genesys. Dr. Bouzid holds 12 patents in Human Language Technology, was recognized as a “Speech Luminary” by Speech Technology Magazine, and was named among the Top 11 Speech Technologists by Voicebot.ai.




3 replies

  1. You’ve expressed in words what I find exasperating about Alexa (and Google Assistant). If they could simply collect the number of times that people say “Alexa, stop!” and the volume at which it’s said, then I think they would have a decent idea of how infuriating we all find this kind of stuff. After all, Alexa knows when I am whispering, so surely she should know when I shout at the top of my lungs to shut the f**k up! And again you’re right on the money, in that you have to formulate in your mind exactly what you are going to say and how to say it quickly if you are to have any chance of a decent response. Sadly, the smart bods at Google and Amazon are keener to add new functionality than to fix these annoyances. But be warned: if Amazon and Google don’t fix this soon, then someone is going to make a killer assistant where these things have already been fixed. I wonder who that might be? Hmmm

  2. Take a second and visualize such a person. I think you would agree with me that, unless they were a loved one, you would not want such a person in your life. Such a person would clearly be dealing with some non-trivial emotional and cognitive issues and should be under some therapy program.

    Reminded me of a paper that was written regarding the psychological nature of certain characters in Star Wars:

    – Psychopathology in a Galaxy Far, Far Away: the Use of Star Wars’ Dark Side in Teaching

    Star Wars is well known, timeless, universal, and incorporated into shared culture. Trainees have grown up with the movies, and based on their enduring popularity, attending psychiatrists are likely to have seen them too. This article highlights psychopathology from the Dark Side of Star Wars films which can be used in teaching. These include as follows: borderline and narcissistic personality traits, psychopathy, PTSD, partner violence risk, developmental stages, and of course Oedipal conflicts.

  3. Have you much experience with the Chinese counterparts? Microsoft shared progress on the Xiaoice suite of solutions a couple of years back, and they were years and years ahead of their Western counterparts (Xiaoice Gen 6 in 2018 was still ahead of what the West can do: https://www.youtube.com/watch?v=hhcRPnXPu4Q)
