Why Voice Platforms Need to Make Foundational Changes to Allow Multi-User Skills

By Chris Butler on October 14, 2021

Credit: Photo by Brett Jordan on Unsplash

If you own a smart speaker and live with other people, such as a partner, children, or housemates, you have probably experienced some weirdness in how it operates. It might call you by the wrong name, give you the status of someone else’s package, or simply say it doesn’t recognize you.

I’ve noticed these issues in my Spotify recommendations, which are skewed by my children’s use of our Google Home Hub in the kitchen. My “made for you” playlists include music that I don’t really like but that I know my children do.

These devices have these problems because they are not built with the full communal context in mind. Designing a device without considering everyone who has access to it is like allowing only one person in a household to take items out of the refrigerator.

This isn’t the skill developer’s fault. The major platforms from Amazon, Google, and others do not provide the right identity, privacy, security, experience, and ownership models to fit the home (or office).

What is a skill developer building on top of these platforms to do?

Communal Computing
Connected communal devices have been around for a very long time. Ever since the landline phone in the kitchen, we have dealt with the impact of bringing a network into our homes. Back then, the phone number was associated with a household rather than a person, and it was expected that anyone in that household could pick up the phone.

However, it wasn’t until computing came into the home for personal use that we started to see the problems. In the days of mainframes and client/server architectures, people had to time-share computing resources, so the user model was invented to track who was using them. That model carried over into personal computers when they were bought for home use. Eventually the Amazon Echo followed suit by tying usage to the buyer’s Amazon account.

Across the industry, these problems are becoming more and more of an issue. I break them down into five key problem areas:

Identity: Do we know all of the people who are using the device?
Privacy: Are we exposing (or hiding) the right content for all of the people with access?
Security: Are we allowing all of the people using the device to do or see what they should, and are we protecting the content from people who shouldn’t?
Experience: What is the contextually appropriate display or next action?
Ownership: Who owns all of the data and services attached to the device that multiple people are using?

If the platforms don’t address the communal nature of these devices, people will buy them and set them up but eventually grow tired of the weirdness. When their expectations in these areas keep going unmet, they will replace them.

Building Skills for Communal Use
Amazon Alexa’s skill library is full of useful capabilities that help people in their homes and offices. Most people use their Echo to play music or set timers. At first these capabilities seem devoid of context: you ask Alexa to play a particular song through Spotify. Then you realize that everyone in the house does this, and it starts to affect your experience on other devices. Beyond that, some skills, like shopping lists or delivery notifications, are specifically for the Amazon account owner rather than for everyone in the house.

The reality is that these devices are deployed in something that looks more like an ecosystem of people and devices. Understanding that context is what makes it possible to provide the right services for that environment. I’ve written previously about the dos and don’ts of communal computing, but what should a skill developer do until the platforms make foundational changes to their models?

Use Voice Profiles, If Available

Alexa can be taught to recognize your voice.

Amazon has started to account for the various people using a device through voice profiles. This requires everyone to be comfortable providing voice samples to Amazon, and there are real concerns about the privacy of people’s voice data, as Dan Miller has written about previously.

If voice profiles are available, they provide a quick way to identify whether a person with a profile is making the request. The problem is whether everyone in the house can and will add a voice profile. In Amazon’s case, adding one requires the owner of the device to log into their Alexa app (or give their credentials to someone else to log in) and enter the voice profile.

As of this writing, I don’t think we can depend on voice profiles being present or properly set up. The use of biometric data is fraught with issues, but it may provide some relief to developers if it is adopted more widely.
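For skill developers, one way to act on this is to treat a recognized voice profile as a bonus rather than a requirement. Below is a minimal Python sketch, assuming the skill receives the raw request envelope as a dictionary. It looks for a recognized speaker at `context.System.person.personId` (where Alexa’s skill personalization feature places it; confirm the path against the current Alexa documentation) and otherwise falls back to the device ID at `context.System.device.deviceId`. The `Identity` type, the `resolve_identity` helper, and the `"unknown-device"` placeholder are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Identity:
    kind: str  # "person" when a voice profile matched, otherwise "device"
    key: str


def _dig(obj: dict, *path: str) -> Optional[Any]:
    """Walk nested dicts, returning None if any key along the path is missing."""
    current: Any = obj
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def resolve_identity(request: dict) -> Identity:
    """Prefer a recognized voice profile; otherwise fall back to the device itself."""
    person_id = _dig(request, "context", "System", "person", "personId")
    if person_id:
        return Identity("person", person_id)
    # No recognized speaker: identify the device (a shared location), not the account owner.
    device_id = _dig(request, "context", "System", "device", "deviceId")
    return Identity("device", device_id or "unknown-device")


req = {"context": {"System": {"device": {"deviceId": "kitchen-echo"}}}}
print(resolve_identity(req))  # Identity(kind='device', key='kitchen-echo')
```

Falling back to the device rather than to the account owner keeps an unrecognized voice from being treated as the owner by default.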

Don’t Trust Behavioral Data from Communal Devices
Most popular skills are extensions of a larger service that is available on personal computers and phones. The example I like to reference is Spotify. I use it while I’m walking around, in my car, and on the home assistant in my kitchen. These are three different modes on a spectrum of “communal-ness.”

Behavioral data is the cornerstone of how Spotify and other media consumption services make their recommendations. The data collected is both implicit (songs played) and explicit (adding to a playlist or liking a song). These signals are collected from both my smartphone and my home assistant, but they don’t come from the same context. When I’m listening to Spotify on my earbuds, it is highly likely that the music is just for me. When it is playing on a home assistant or smart speaker, however, there is really no way today to tell who else is listening.

When my listening behavior on a home assistant is significantly different from that on my personal phone, it is less likely that I’ve suddenly changed my preferences and more likely that someone else is listening. In these cases, we should consider whether collecting that data is worthwhile at all and, if we do collect it, whether we should segment it from personal listening data altogether.
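One way to detect this kind of divergence is to compare the distribution of genres played on the communal device with the account holder’s personal listening. The following Python sketch uses cosine similarity over genre counts. The genre labels, the `route_communal_plays` helper, and the 0.5 threshold are all hypothetical choices for illustration, not how Spotify or any other service actually does this.

```python
import math
from collections import Counter
from typing import Mapping

# Hypothetical cutoff: below this, assume someone other than the account holder is listening.
SIMILARITY_THRESHOLD = 0.5


def cosine_similarity(a: Mapping[str, int], b: Mapping[str, int]) -> float:
    """Cosine similarity between two genre-count vectors (0.0 if either is empty)."""
    dot = sum(a[genre] * b.get(genre, 0) for genre in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_communal_plays(personal: Counter, communal: Counter) -> str:
    """Decide whether plays from a communal device belong in the personal history."""
    if cosine_similarity(personal, communal) >= SIMILARITY_THRESHOLD:
        return "personal"  # looks like the account holder's own taste
    return "communal"  # segment it so it doesn't skew personal recommendations


personal = Counter({"industrial": 40, "jazz": 10})
kitchen = Counter({"kids": 30, "pop": 25, "jazz": 5})
print(route_communal_plays(personal, kitchen))  # communal
```

Cosine similarity is only one possible measure. The more important design choice is that plays that look like someone else’s taste are kept out of the personal history instead of silently reshaping it.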

Build Pseudo-Identities of the Location, Rather than a Person
While voice platforms rarely support a profile for an entire household, Spotify has started to account for couples and families listening together through its Duo Mix and Family Mix, respectively. It does this because it knows that people who live together have some overlap in their listening habits.

I’d go a step further and assume that a communal device is usually placed in a part of the home that certain people regularly use. At any given time, the listeners in my kitchen could be me, my partner, my children, or any combination of us. This means we should consider the identity of that location rather than that of the person logged into the device or the service.

The behavior captured in the kitchen forms its own profile. It includes all of the different people who regularly listen there, without mingling in data that is more personal (maybe that Industrial music I listen to on my own isn’t appropriate for the whole family).
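A location pseudo-identity can be as simple as keying taste profiles by where a device sits rather than by who is logged in. This minimal Python sketch assumes a hypothetical, hand-maintained mapping from device IDs to either a personal context or a shared location; the device names and the `ProfileStore` class are illustrative only. Plays on the kitchen speaker build up a kitchen profile that never feeds the personal one.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical mapping from device IDs to a personal context or a shared location.
DEVICE_CONTEXT: Dict[str, str] = {
    "phone-chris": "personal:chris",
    "earbuds-chris": "personal:chris",
    "kitchen-speaker": "location:kitchen",
}


class ProfileStore:
    """Keeps one taste profile per context key, so kitchen plays never mix into personal ones."""

    def __init__(self) -> None:
        self._profiles: Dict[str, Counter] = defaultdict(Counter)

    def record_play(self, device_id: str, genre: str) -> str:
        # Unknown devices are treated as shared rather than attributed to a person.
        key = DEVICE_CONTEXT.get(device_id, "location:unknown")
        self._profiles[key][genre] += 1
        return key

    def top_genres(self, key: str, n: int = 3) -> List[str]:
        # .get() avoids creating an empty profile just by looking one up.
        profile = self._profiles.get(key, Counter())
        return [genre for genre, _ in profile.most_common(n)]


store = ProfileStore()
store.record_play("earbuds-chris", "industrial")
store.record_play("kitchen-speaker", "kids")
store.record_play("kitchen-speaker", "kids")
store.record_play("kitchen-speaker", "pop")
print(store.top_genres("location:kitchen"))  # ['kids', 'pop']
print(store.top_genres("personal:chris"))  # ['industrial']
```

Devices that aren’t in the mapping are treated as shared, which errs on the side of keeping data out of anyone’s personal profile.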

Ask When in Doubt
If you are about to do something that could affect a person’s privacy, or take an action that would cause annoyance (or harm), you should ask whether they really mean to do it. When someone asks to add an item to a shopping list, it could be as simple as asking, “Do you want to add this to Chris’ shopping list and notify him?”

Or when someone asks for a bank account balance, you should confirm who you are talking to: “This will announce the bank account balances for Chris Butler. Are you sure you want to continue?”
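The “ask when in doubt” rule can be expressed as a small gate that runs before any sensitive action. In this Python sketch, the intent names, the `speaker_verified` flag (which could come from a voice profile match), and the prompt wording are hypothetical; a real skill would pass the returned prompt to its platform’s own confirmation or dialog mechanism.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical intent names; a real skill would use its own interaction model.
SENSITIVE_INTENTS = {"AddToShoppingListIntent", "GetAccountBalanceIntent"}


@dataclass
class Decision:
    proceed: bool
    prompt: Optional[str] = None  # question to ask before acting, if any


def check_before_acting(intent: str, owner_name: str, speaker_verified: bool) -> Decision:
    """Ask for confirmation before a sensitive action unless the speaker is verified as the owner."""
    if intent not in SENSITIVE_INTENTS or speaker_verified:
        return Decision(proceed=True)
    if intent == "GetAccountBalanceIntent":
        prompt = (f"This will announce the bank account balances for {owner_name}. "
                  "Are you sure you want to continue?")
    else:
        prompt = f"Do you want to add this to {owner_name}'s shopping list and notify them?"
    return Decision(proceed=False, prompt=prompt)


decision = check_before_acting("GetAccountBalanceIntent", "Chris Butler", speaker_verified=False)
print(decision.prompt)
# This will announce the bank account balances for Chris Butler. Are you sure you want to continue?
```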

This won’t stop people who are purposely trying to subvert a system, but most people are not like that. We live in our homes according to norms of what we find acceptable. I know what my partner’s smartphone passcode is, but it would be a huge violation of trust if I ever unlocked their phone without them telling me to (and vice versa).

These norms let people do the job of assessing who is in the room and whether the situation is appropriate before anyone’s privacy is violated.

Be Less Sure!
The common thread in these recommendations is that once we start deploying things into the home, we can’t be sure a single person is using them. To build better devices, we need to consider the context the devices are deployed in, including all of the different people there, rather than assuming a single user.

This is great advice for platform builders like Amazon and Google, but skill developers are left in a tough spot. To help avoid these problems with communal devices:

Use voice profiles, if available
Don’t trust behavioral data from communal devices
Build pseudo-identities of the location, rather than a person
Ask when in doubt

If you take these steps, you will build skills that match the context of the places these devices are deployed and earn more trust from your user base. This is necessary until the platforms build capabilities that assume communal use by default.

Chris Butler is a chaotic good product manager, writer, and speaker. He has been a product leader at Microsoft, Facebook Reality Labs, KAYAK, and Waze. Chris is currently PM’ing the PM experience at Cognizant as the Assistant Vice President, Global Head of Product Operations. He has been focused on the impact of communal computing in the home and office as our world gets more connected.

