Home › Articles › Dear Media: Stop Treating Every AI Chatbot Like a Failure

Dear Media: Stop Treating Every AI Chatbot Like a Failure

By Amy Stapleton on July 14, 2025

Media outrage aside, California’s wildfire chatbot missteps should be a prompt for better design—not clickbait condemnation.

The Media’s Pattern: Overhype the Faults, Ignore the Wins and the Intent

When it comes to chatbots, the media rarely misses a chance to pounce. Even the smallest flaw becomes a headline, and government-backed efforts are especially vulnerable to being framed as disasters. A recent flurry of coverage surrounding California’s “Ask CAL FIRE” AI chatbot is a case in point.

“California’s AI wildfire chatbot fails basic tests,” warns one article. I also saw the chatbot come in for criticism in Perplexity’s trending news wrap-up, with the “Why it matters” section even claiming the bot’s performance “potentially endanger[s] lives” by providing inconsistent evacuation information. If you stop reading there, you’re left with the impression that California unleashed a dangerously defective tool during wildfire season.

What Actually Went Wrong?

The bot, launched in May, is designed to deliver wildfire resources in 70 languages. According to the reporting, the chatbot sometimes stumbled when asked for information about assembling an emergency kit. It apparently answered correctly when asked about an “evacuation kit,” but didn’t recognize the intent of phrases such as “go bag” and “fire ready kit.” These issues might be related to the wording of documentation accessed by the chatbot as part of its retrieval augmented generation (RAG) process. Or, if the chatbot is built using more determinisitic intent models, it could be related to missing sample utterances.

The bot apparently also gave outdated information about the containment status of a fire and inconsistent answers when asked about who issued evacuation orders. Such inaccuracies are a critical issue, but also ones that could likely be addressed with more rigorous testing and refinement. What’s missing from much of the coverage is a sense of proportion. The headlines suggest total failure; the underlying issues are real, but solvable.

Turning Failure into a Teaching Moment

Rather than framing this as yet another example of AI gone rogue, we should treat it as a learning opportunity. One clear takeaway is that there may be far more media attention on public-facing chatbot projects than we expect and much of it will focus on what goes wrong. That’s the reality of launching government technology in a high-stakes, high-scrutiny environment. We need to prepare ourselves for that attention, anticipate the “gotcha” moments, and build with transparency and resilience in mind.

But just as importantly, we should also learn that shying away from innovation isn’t the answer. Tools like this can make critical information more accessible and responsive during emergencies. The risks of early missteps are real, but so are the risks of doing nothing. Progress comes not from perfection, but from continuous improvement.

The Markup’s coverage included insights from Mila Gascó-Hernandez, research director at the Center for Technology in Government at the University at Albany. She emphasized two key lessons that any public-sector AI project should take seriously:

Lesson 1: Don’t Build Before You Plan

Don’t start development until the requirements are clearly defined. That includes understanding exactly what the chatbot should be able to do, and how success will be measured. Ideally the product team should carefully craft a set of thorough test cases at the start of the effort. GenAI has proven to be a helpful assistant in creating just such test cases. Especially for public-sector technology projects, where the scrutiny is so intense and media almost never praises results, but only mocks failures, it’s essential to resist moving forward without this foundational work.

Lesson 2: Test Thoroughly and Thoughtfully

Test rigorously before deployment. Accuracy matters, but so does consistency. The same question asked multiple ways should yield the same correct answer. Testing should also include edge cases. Adversarial inputs, like those that try to get the chatbot off topic or make harmful statements, should also be included in the testing regime. Again, GenAI can help to algorithmically generate and run test cases designed specifically for the types of questions the chatbot is built to handle.

A Smarter Way Forward

The goal of AI-powered self-service solutions should be to provide people with improved access to accurate, timely, and understandable information, especially in high-stakes situations where clarity matters most. That will require strong planning, strong testing, and yes, resilience to criticism. Because if there’s one thing we can count on, it’s that the media will always lead with the flaws whenever it comes to chatbots.

Instead of abandoning the tools when early versions falter, we should encourage better development practices. California’s wildfire chatbot isn’t the end of the story. It’s the start of the lesson.

And credit where it’s due: CAL FIRE deserves recognition for taking a bold step to make emergency information more accessible, especially in multiple languages and across digital platforms. Developing a tool like this is complex, especially under public scrutiny. While there’s clearly room for improvement, the effort to innovate in service of public safety is a step in the right direction.

‹ NiCE Interactions 2025: Agentic AI, Better Data, and a Whole Lot of Partnership

Vonage and AWS Nova Sonic: Exploring the Promise—and Limits—of Speech-to-Speech AI in Contact Centers ›

Categories: Articles