Conversational Commerce in 2012: Emphasizing the “Self” in Self Service
In 2011, the idea of “self-service” is morphing from a derogatory term for the automated handling of calls in an IVR or contact center into the preferred point of arrival for users of the mobile, multimodal Internet. In 2015 we will look back on this year as one in which several emerging technologies formed the basis of products and services that define how individuals carry out everyday commerce. These are:
Accurate speech recognition combined with natural language processing: This gets to the heart of Conversational Commerce. Credit Apple’s Siri with bringing the speech-enabled mobile assistant into prime time, but expect category leaders Google (just Google the word “Majel”) and Microsoft/Tellme to use their investments in speech processing technologies to push into the mobile assistant realm. Collectively, they are making it more comfortable for people to carry out conversations with their smartphones (or tablets or TVs or cars).
Nuance will be a formidable competitor in this realm, working closely with IP and researchers from IBM. Nuance’s speech processing technology is deeply embedded in iOS-based devices (though the licensing terms and details on the integration are closely guarded); Nuance is therefore a direct beneficiary of Siri’s success. In the meantime, the company has effectively marketed its own platform for mobile dictation and speech input under the Dragon brand and has launched Dragon Go!, which demonstrates the value of deep integration with popular mobile destinations, including Yelp, OpenTable, Google, Bing, YouTube and a couple hundred others, based on context.
Vlingo is also formidable in this category. Aside from launching an all-out patent war in the U.S. courts, it has effectively differentiated itself by offering capabilities that neither Siri nor Nuance presently has. One of the most important is “hands-free” operation. Using the wake-up words “Hey Vlingo,” mobile subscribers can then enter commands and content to hear or originate text messages, conduct searches or get driving directions. These are compelling use cases and provide the mechanism for users to put their devices (running all their personal apps) under the control of their voice. Given its size relative to the cohort of Google, Microsoft, Apple and even IBM (which is working directly with Nuance), it is unlikely that Vlingo will be acting alone. Regardless of who emerges as its benefactor or owner (device maker, mobile carrier, cloud computing provider…), Vlingo’s presence will be felt in the 2015 timeframe.
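The hands-free pattern described above amounts to a simple gate: speech is transcribed continuously, but nothing is treated as a command until the wake phrase is heard. Vlingo’s actual implementation is proprietary; the sketch below is only an illustration of the gating logic, operating on hypothetical transcribed utterances.

```python
# Minimal sketch of wake-word gating: utterances are ignored until the
# wake phrase ("hey vlingo" here) is detected in the transcript stream.
WAKE_PHRASE = "hey vlingo"

def handle_utterances(utterances):
    """Yield only the commands spoken after a wake phrase."""
    armed = False
    for text in utterances:
        text = text.lower().strip()
        if not armed:
            if text.startswith(WAKE_PHRASE):
                command = text[len(WAKE_PHRASE):].strip()
                if command:        # wake phrase and command in one breath
                    yield command
                else:              # wake phrase alone: arm for next utterance
                    armed = True
        else:
            yield text             # first utterance after waking is the command
            armed = False

commands = list(handle_utterances([
    "what a nice day",                # ignored: device not awake
    "hey vlingo send a text to sam",  # wake + command in one utterance
    "hey vlingo",                     # wake only...
    "get driving directions home",    # ...then the command
]))
print(commands)  # → ['send a text to sam', 'get driving directions home']
```

In a real device the gate would sit behind a low-power keyword spotter rather than full transcription, so the phone can listen for the wake phrase without streaming everything to the cloud.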
The Smartphone+Cloud paradigm: This is closely related to Apple and Siri because Siri is an app running “natively” on the iPhone 4S, but relying heavily on speech processing and computing resources in Apple’s cloud. As the retail price of smartphones continues to decline – especially with subsidies from wireless carriers – the adoption curve continues to get steeper and the population of wireless smartphone users gets more attractive. That’s why so many service providers and content providers are comfortable targeting smartphone users as a key customer base.
Common wisdom has it that, by 2015, platform fragmentation issues vis-a-vis smartphones will be largely behind us. Apple’s iOS and Google’s Android will share leadership. Android will have the edge in terms of devices in service and Apple will have the more coherent strategy for monetization of content and service delivery. They will be joined by one or more companies that, today, are considered also-rans, most likely Microsoft’s Windows Phone (with a big assist from Nokia) and perhaps RIM Blackberry. In a perfect world, the “open source” version of HP’s WebOS will become the basis for innovative application development and delivery, but that is unlikely unless there’s a cloud-based entity with its eyes on the smartphone prize.
Incidentally, Amazon.com’s acquisition of Yap shows that it has its eye on speech-enabling the mobile phone (not just the smartphone) crowd. This suggests that Salesforce.com, which watches the operations of Amazon Web Services (AWS) quite closely, will emerge as an important player in the smartphone+cloud domain by 2015.
Spoken words recognized as information assets: Once you have people comfortable talking to their smartphones, you have a rich new set of utterances to feed into a corpus of data that supports better understanding. In the U.S., compliance with federal laws like Sarbanes-Oxley and HIPAA requires companies to capture and store the content of phone conversations between and among employees, customers and prospects. To make the best of the situation, companies have been able to analyze, index and tag the content of these conversations to support business goals, often as part of WFO (workforce optimization) programs in contact centers or to facilitate collaboration among geographically dispersed workgroups.
Customer care analytics specialists like Nexidia, CallMiner and Verint have developed proprietary approaches to detect patterns in, tag and analyze conversations. More recently, a firm called HarQen was chartered specifically to treat spoken words as information assets. Its core product line, Symposia, captures and stores the audio from telephone calls and conference calls and allows participants or other listeners to tag or annotate conversations and share them with others. The company has developed use cases for human resources to support interviews, performance reviews and the like, but the broader applications for company-wide and global deployments span a wide variety of collaboration efforts in sales, marketing, customer support and product development.
Today, speech analytics can be a complex and expensive proposition. In some cases it involves capture, transcription, tagging, analytics and reporting. In others it is pure pattern recognition, where the core technologies detect recurring utterances or find a set of predefined phrases (like detecting the hashtag “#FAIL” in a Tweet). By 2015, it will be routine to treat spoken words as just another set of unstructured data that can be put under an analytic lens in order to support specified objectives.
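The pattern-recognition flavor of speech analytics reduces, at its simplest, to scanning transcripts for predefined phrases and tagging the conversations that contain them. A toy sketch follows; the phrase list, tags and transcripts are invented for illustration and bear no relation to any vendor’s actual implementation.

```python
# Toy phrase-spotting pass over call transcripts: each transcript is
# tagged with the categories of the predefined phrases it contains.
PHRASES = {
    "cancel my account": "churn-risk",
    "not working": "product-issue",
    "thank you so much": "positive",
}

def tag_transcripts(transcripts):
    """Map each call id to the sorted list of tags its transcript triggers."""
    tagged = {}
    for call_id, text in transcripts.items():
        text = text.lower()
        tagged[call_id] = sorted(
            tag for phrase, tag in PHRASES.items() if phrase in text
        )
    return tagged

calls = {
    "call-001": "My phone is not working and I want to cancel my account.",
    "call-002": "Thank you so much, that fixed it.",
}
print(tag_transcripts(calls))
# → {'call-001': ['churn-risk', 'product-issue'], 'call-002': ['positive']}
```

Production systems spot phrases in the audio itself (phonetic search) or in machine transcripts, but the output is the same kind of tagged, queryable record.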
Advent of true “self” service: When you put the above-mentioned technologies together, you have the foundation for smartphone-based services that are highly responsive to individual end-users. Ideally, such services can distinguish between background noise and spoken words, activate programs when a “wake-up word” is uttered, distinguish between the voice of their owner and others, and then bring pre-loaded preferences, account numbers, historical activities, loyalty programs and other personal data or PII (personally identifiable information) to bear on the task at hand.
Modern CRM and “social CRM” systems give the appearance of understanding intent, but it is largely the product of well-informed guesswork, relying on data and metadata provided by customers or third parties. By contrast, services that adhere to the “Smartphone+Cloud” paradigm can offer true “self-service.” For example, a smartphone app from French auto insurer Groupama (called “Groupama Toujours Là,” or “Groupama Always There”) uses the iPhone’s screen as a visual display of agent queues and enables policyholders to indicate the purpose of the call and elect to stay on hold or schedule a call-back.
During the past few years, individual customers have been provided with tools to shorten the time it takes to get to a human when calling the companies with which they want to carry out business. Fonolo, LucyPhone and, more recently, Hold Free are each taking different approaches to empowering phone-based customers. By 2015, we foresee more self-service use cases and deployments that enable mobile subscribers to use their smartphones to take greater control of what personal data they share, with whom they share it, and the terms and conditions under which friends or the companies they do business with can make that information available to others.
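The data-sharing control envisioned here amounts to a consent check: before any party reads a field of personal data, the user’s stated sharing rules are consulted, and everything not explicitly granted is denied. A speculative sketch, with invented field names and party identifiers:

```python
# Speculative sketch of user-held sharing rules: each personal-data
# field lists the parties that may read it; all other requests are
# denied by default.
SHARING_RULES = {
    "location":       {"friends"},
    "loyalty_number": {"acme-airlines"},               # invented merchant id
    "phone_number":   {"friends", "acme-airlines"},
}

def may_share(field, requester):
    """Deny by default; allow only parties the user listed for that field."""
    return requester in SHARING_RULES.get(field, set())

print(may_share("loyalty_number", "acme-airlines"))  # → True
print(may_share("location", "acme-airlines"))        # → False
```

The design choice worth noting is the default: in an individual-centric model the rules live with (and are edited by) the subscriber, and absence of a rule means no access, inverting the enterprise-CRM assumption that the company holds and controls the record.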
Add speech recognition and natural language understanding and you can see how an individual might say “I’m hungry,” have that two-word utterance interpreted properly, and get Siri-like results returned: something like “The next available reservation at your favorite restaurant is at 6:30 PM. Should I make a reservation for you? Or would you like to invite someone else to join you?”
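The exchange above implies two steps: classify the utterance into an intent, then fill in the response from stored personal preferences. The sketch below is a deliberately naive illustration of that pipeline; the intent table, profile fields and restaurant name are all invented, and a real assistant would use statistical language understanding rather than a lookup table.

```python
# Hypothetical intent resolution: map a short utterance to an intent,
# then combine it with stored preferences to draft a response.
INTENTS = {
    "i'm hungry": "find_food",
    "i need a ride": "get_transport",
}

PROFILE = {"favorite_restaurant": "Chez Panisse"}  # invented preference data

def respond(utterance, profile):
    """Return an assistant-style reply for a recognized utterance."""
    intent = INTENTS.get(utterance.lower().strip())
    if intent == "find_food":
        return ("The next available reservation at "
                f"{profile['favorite_restaurant']} is at 6:30 PM. "
                "Should I make a reservation for you?")
    return "Sorry, I didn't understand that."

print(respond("I'm hungry", PROFILE))
```

The point of the sketch is where the personalization lives: the utterance alone is ambiguous, and it is the user’s own stored data that turns “I’m hungry” into an actionable, individual answer.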
The technologies that are destined to survive and thrive are those that support highly personalized, conversational interactions that culminate in a transaction or other tangible result. This should be the prevailing definition of “self-service.” In the near term, enterprises are spending billions of dollars on “Big Data,” business intelligence and analytics resources, and “Enterprise Mobility” is a close second, based on research conducted by the likes of IBM. Our own research, to be published in January, shows that a majority of executives in large enterprises don’t have a defined strategy for managing all the data and metadata generated by mobile customers. When they do, they will also do a much better job of hearing and responding to customers’ true wants, needs and preferences, as well as their intent.
The customer care pendulum will swing away from the enterprise’s CRM system as a “customer interaction hub” to a more distributed system where individuals are at the center of their own self-service system.