“Audio Ergo Sum!” (Part 1)

Almost every Marketing executive that I have spoken with lately (and I have spoken with a long list of them) – including those who have lived long enough to remember a world without the Web, or cell phones, social media, mobile apps, or smart watches – has expressed puzzlement over the rise of audio. Social: they get it. Mobile: that, too, they get. And they understand why videos are so effective. But audio? “Isn’t that, like, a step backwards?”

First, let me observe, because I was there as an active, sentient young adult, that it took a long time for Marketers (and basically everyone else) to understand the Web, let alone take it seriously. Inconceivable as it may seem to us now, Marketers, like almost everyone else, truly did see the Web as a passing fad for a few years. Sure, Mosaic (the first usable web browser) came to life in 1993, Amazon in 1995, Wi-Fi in 1997, and Google in 1998. And sure, there were extreme flights of fantasy between 1995 and 1999, when an investor was not interested if your idea smacked of being reasonable or was not radical enough. But for almost a whole decade, Marketers cautiously stayed away from the Web. Driven by concrete, measurable metrics and no-nonsense bottom lines, Marketers decided that there were just too many dots flying around, too many ideas, and that the prudent thing to do was to focus on what they knew worked and to let the chips fall where they may once the gravity of real life imposed itself.

And they were, in a way, right. It wasn’t until around the year 2000, when the bubble burst, that the dust started settling for real. (I remember distinctly that it popped in late March and early April 2000 and that the popping coincided with, though some felt that it was caused by, a federal court ruling that Microsoft had violated antitrust law by unlawfully maintaining its monopoly.) This was when, finally, viable, down-to-earth and, most crucially, rational and coherent business models began to emerge. (Google started selling ads in 2000, though Amazon was bleeding cash and would continue to bleed it for years to come.) Combine that with a critical mass of households owning a PC (51%) by mid-2000, and clarity emerged from the ashes, so that things suddenly became just too obvious for Marketers (and everyone else) to ignore.

The exact same thing happened, though with far less hysterical exuberance, with cell phones, social media, smartphones and mobile apps. Sure, the number of “deer-caught-in-the-headlights” years was smaller, but the pattern was the same: (1) Obliviousness: too busy executing to take note of ‘the noise,’ followed by (2) Timid curiosity, followed by (3) A bit of questioning: ‘Yes, sure, this thing called Facebook is taking off, but is it really something that we need to pay attention to?’ followed by (4) Puzzlement: ‘Ok. We definitely need to factor this thing into our mix, but how?’ which is then quickly followed by (5) Realization and adoption: ‘Of course! This is where things are going and we need to go full force!’

So, really, there is nothing new with voice and audio. Very soon, the obvious will become obvious. As things stand, I would say that Marketers are gingerly traveling somewhere between Questioning and Puzzlement, with Realization and mainstream Adoption very much on the near horizon, if not already in motion.

In the meantime, as they prepare themselves for action and execution, Marketers need to understand that audio is a radically different medium from what they have been dealing with so far. Audio is not text. Audio is not images. Audio is not video. The old ways of thinking about how to engage your audience will simply not work with audio.

To understand the power of audio, let’s start with the basic, Cartesian fundamentals and build things up from there.

First, let’s take a step back and note just how pervasive audio is in our daily, mundane lives.

Then let’s ask: where is audio (voice and sound) used in real life today in such a way that no one can argue that it is not providing us with non-fungible value?

And where it is, let us ask: in what way is it providing that value? From those observations we can then abstract and begin constructing a way to methodically deliver audio-first experiences that no one can dismiss as a “nice to have” or simply ignore.

Let’s Start with Some Use Cases

Here is a list of 20 reality-proven touchpoints where audio (voice and sound) is delivering clear value:

  1. Self check-out at the supermarket, where the robot tells you the name of the item you just scanned as well as its price.
  2. The beeping at the ATM alerting you to take your money or to take your card back.
  3. The thudding sound that the gas pump makes when it’s done pumping.
  4. The chime sound that an elevator makes when it arrives, alerting those waiting for it to get ready to board it.
  5. The elevator announcing the floor number to those riding it when it arrives: “15th Floor.”
  6. The beeps from the toaster when it’s done.
  7. The beeps from the microwave oven when it’s done.
  8. The beeps from the microwave oven when it wants to alert you that you have yet to take your stuff out.
  9. The sound the fridge makes when you leave the door open.
  10. The sound the oven makes when it has reached the temperature you set it to.
  11. The beeping sound of a car when you leave the keys in it or the lights on.
  12. The sound of a coffee maker when it’s done percolating.
  13. The loud, blaring sound the washing machine makes when it’s done.
  14. The softer sound the dishwashing machine makes when it’s done.
  15. The sound of your wake up alarm in the morning.
  16. The blaring sound of a building-wide fire alarm.
  17. The piercing sound of a house-wide smoke alarm.
  18. The pompous, self-important sound made when the laptop or the smartphone restarts.
  19. The warning sound given when you are about to reach the end of an airport moving walkway.
  20. The (weird) instructions sounded at a pedestrian crossing: “Wait! Wait!” then “Walk! Walk!”

Having listed the 20 items (and I’m sure I could come up with 20 more if I wanted), it is clear that audio (voice and sound) is a ubiquitous presence in our daily life. It serves us in non-trivial, important ways that range from ensuring that we make the most of our quality of life (hot, crisp toast is more enjoyable than stale, cold toast); to avoiding irritating mistakes (absentmindedly buying something that is far more expensive than we thought it was); to helping us avoid minor disasters (leaving the cash or the card at the ATM); to saving our lives (smoke and fire alarms).

Next: What can we abstract from these points of value so that we can repurpose the lessons learned when we are thinking about using audio — voice and sound? This I will tackle in Part Two of this essay.


Dr. Ahmed Bouzid is CEO of Witlingo, a McLean, Virginia-based startup that builds products and solutions that enable brands to engage with their clients and prospects using voice, audio, and conversational AI. Prior to Witlingo, Dr. Bouzid was Head of Alexa’s Smart Home Product at Amazon and VP of Product and Innovation at Angel.com. Dr. Bouzid holds 12 patents in the Speech Recognition and Natural Language Processing field and was recognized as a “Speech Luminary” by Speech Technology Magazine and as one of the Top 11 Speech Technologists by Voicebot.ai. He is also an Open Voice Network Ambassador, heading their Social Audio initiative, and an author at Opus Research. Some of his articles and media appearances can be found here and here.



