OpenAI’s Realtime API: A Disruptor for Customer Self-Service?

OpenAI has announced the public beta launch of its Realtime API, a technology that integrates a range of services that were once separate: Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Natural Language Generation (NLG), and Text-to-Speech (TTS). The API is powered by the company’s GPT-4o model and provides developers with a unified solution for building conversational interfaces.

Previously, developers had to source each of these components separately, often from different vendors, and integrate them into a cohesive system. This was a complex process, as each component had its own strengths and weaknesses. OpenAI’s Realtime API simplifies things by providing a single API for handling both speech and text-based interactions. This development has implications for a wide range of applications, including customer service chatbots, voice assistants, and other conversational systems.

Enabling Features for Sophisticated Voice Assistants 

The Realtime API offers a range of features that make it ideal for building self-service voice applications. For example, its low-latency audio input and output capabilities enable natural-sounding conversations, allowing users to interact with voice assistants in a more intuitive and responsive way. Additionally, the API’s ability to handle interruptions automatically ensures that conversations flow smoothly, even when users interrupt or change their requests mid-stream. This is particularly useful in self-service applications, where users may need to ask follow-up questions or clarify their requests. 

The API also supports function calling, which enables voice assistants to trigger actions or pull in new context in response to user requests. This allows developers to build more comprehensive and interactive voice applications, such as self-service portals that retrieve information from databases or call APIs to complete transactions. Communication between the app and the GPT-4o model runs over a persistent WebSocket connection, enabling bidirectional exchanges throughout the conversation. Together, these features make the Realtime API a powerful foundation for sophisticated self-service voice applications that give users a more natural and intuitive experience.
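As a rough sketch of how this fits together, a client might register a backend function by sending a `session.update` event over the WebSocket. Event and field names here follow the beta documentation and may change; `lookup_order` is a hypothetical backend function invented for illustration.

```python
import json

def build_session_update(instructions: str) -> dict:
    """Build a session.update event that configures the session and
    registers a function tool the model can call mid-conversation."""
    return {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "voice": "alloy",
            # Server-side voice activity detection handles interruptions
            "turn_detection": {"type": "server_vad"},
            "tools": [
                {
                    "type": "function",
                    "name": "lookup_order",  # hypothetical backend function
                    "description": "Fetch order status by order ID.",
                    "parameters": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    },
                }
            ],
        },
    }

event = build_session_update("You are a helpful order-status assistant.")
payload = json.dumps(event)  # the client would send this over the WebSocket
```

When the model decides to call `lookup_order`, the app receives a function-call event, executes the real backend lookup, and streams the result back so the model can speak the answer.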

Separate Pricing for Text and Audio Tokens 

The Realtime API is priced based on the number of tokens used, with separate pricing for text and audio tokens. 

  • Text input tokens are priced at $5 per 1 million tokens. 
  • Text output tokens are priced at $20 per 1 million tokens. 
  • Audio input tokens are priced at $100 per 1 million tokens. 
  • Audio output tokens are priced at $200 per 1 million tokens. 

To put these numbers in context, OpenAI estimates that audio input costs approximately $0.06 per minute and audio output approximately $0.24 per minute. If we assume a conversation is split evenly between listening and speaking, a minute of call time works out to half a minute of input ($0.03) plus half a minute of output ($0.12), or about $0.15 per minute of conversation.
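The back-of-the-envelope arithmetic can be sketched as follows, using OpenAI’s per-minute estimates; the 50/50 input/output split is an assumption, and real calls will vary.

```python
# OpenAI's estimated audio rates, in USD per minute
AUDIO_INPUT_PER_MIN = 0.06
AUDIO_OUTPUT_PER_MIN = 0.24

def cost_per_minute(input_fraction: float = 0.5) -> float:
    """Blended cost of one minute of call time, given the fraction
    of that minute spent on audio input (the caller speaking)."""
    output_fraction = 1.0 - input_fraction
    return (input_fraction * AUDIO_INPUT_PER_MIN
            + output_fraction * AUDIO_OUTPUT_PER_MIN)

blended = cost_per_minute()  # 0.15 with an even split
```

A call dominated by the assistant speaking (more output than input) would cost more per minute; one dominated by the caller would cost less.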

In a recent LinkedIn post, Peter Gostev, who writes extensively on GenAI topics, postulated that the average cost per minute for service calls handled by humans is $0.30 in the US and $0.27 in the UK, with the cost dipping below $0.10 for support in the Philippines and India.  

With those numbers in mind, $0.15 per minute for the Realtime API might be a reasonable price to pay if calls are contained and customers are happy with the experience. In its announcement, OpenAI said it intends to introduce capabilities that lower these costs over time.

The Future of Self-Service: Homogenization or Differentiation? 

As the Realtime API and other similar services continue to advance the state of the art in Conversational AI, it’s likely that we’ll see a significant shift in the way companies approach self-service chatbots. With the ability to access robust, real-time conversational capabilities at an affordable cost, many CCaaS vendors will be tempted to rely on these services to power their self-service offerings. And why not? It’s easier to get all that capability from one provider, rather than trying to cobble together a solution from multiple vendors. 

But what does this mean for the future of self-service chatbots? Will we soon see a world where every chatbot is running on OpenAI services? Probably not. While OpenAI is a leader in the field, there are other competitors, both closed and open source, that will continue to offer alternatives. And some vendors will undoubtedly choose to use their own fine-tuned models, tailored to their specific needs and industries. 

New Entrants: But is it Better for Customers? 

What’s more likely is that we’ll see a proliferation of new entrants into the customer self-service space, all offering voice chatbots that run on the Realtime API or similar services. Entrants without robust risk mitigations will likely fall by the wayside. Must-haves for any new entrant include support for Retrieval Augmented Generation (RAG) to reduce hallucinations, safeguards against prompt injection and other malicious inputs, and guarantees of data security and privacy.
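As a toy illustration of the RAG pattern (the function, prompt wording, and example snippets here are invented), the core idea is to retrieve relevant knowledge-base passages and constrain the model to answer only from them:

```python
def build_grounded_prompt(question: str, snippets: list[str]) -> str:
    """Assemble a prompt that grounds the model in retrieved snippets,
    reducing the chance of hallucinated answers."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days of purchase."],
)
```

A production system would retrieve the snippets from a vector store and pass the assembled prompt to the model as instructions, but the grounding principle is the same.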

But the real differentiator won’t be the underlying technology – it will be how well and easily these new chatbot solutions can be integrated into existing contact center workflows. That’s the tricky part. Companies have already invested heavily in their agent and supervisor desktops, and these digital and voice chatbots will need to be hooked into the existing infrastructure so that interactions are transcribed, summarized, analyzed, and transferred to live agents seamlessly when necessary.

The key to success, though, will be ensuring that these automated interactions improve the customer experience rather than degrade it. For companies, the appeal of GenAI lies in its potential to automate expensive processes and reap significant cost savings. However, customers aren’t yet convinced that talking to a chatbot is better for them. The pressure is on for solution providers to prove that customer experience won’t suffer as a result of increased automation.



Categories: Conversational Intelligence, Intelligent Assistants, Articles