When can AI agents meet customer expectations? What are the right approaches, today and tomorrow?

We observe that customer service organizations (both in-house and outsourced to BPOs) often cite studies showing that “customers prefer to talk to humans rather than interact with AI bots”, yet these same organizations keep expanding their AI-based services.

Customers do prefer to interact with humans, yet they buy products and services from sellers who strive to cut labor costs by offering AI-based support.


But is it economical to entrust all customer interactions to AI agents? Does the customer really win? Do they really have a choice?

It is easy to see why it is hard to convince a customer, at the time of purchase, to pay a premium for the right to talk to a human should they ever need support.

To stay competitive, you need to find ways to make customer service operations less expensive without degrading quality, for example by using AI-powered tools to boost the productivity of human agents.

What are the limits to consider?


AI agents can handle basic, simple, repetitive interactions, provided they involve no major risk of:

  • Financial loss
  • Reputational damage
  • Non-compliance
  • Legal exposure


On the other hand, for interactions that require:

  • Empathy to reduce customer stress
  • Clarifying the customer’s description of the problem
  • Solving complex, multi-source problems
  • Weighing various combinations of solutions and the choices they imply
  • Understanding the compliance issues at stake (financial, health-related, etc.), whose impact on safety and security is critical for both the customer and the company

humans remain both more effective and, ultimately, cheaper.


What does it take to make human agents efficient?

We must be aware, however, that achieving this efficiency at a controlled cost requires giving agents effective tools:

  • Access to up-to-date processes.
  • Access to technical references.
  • Access to complex diagnostic methods (decision trees, etc.; a minimal sketch follows this list).
  • Escalation protocols.
  • Clear and smooth case-closure protocols.
  • The ability to contribute to the daily improvement of these tools by providing meaningful feedback.
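
To make the idea of a codified diagnostic method concrete, here is a minimal sketch of a decision tree that an agent-facing tool could walk through. The questions, node structure, and outcomes are illustrative assumptions, not a real support flow.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """One step in a diagnostic decision tree."""
    question: str = ""
    yes: Optional["Node"] = None   # next step if the agent answers "yes"
    no: Optional["Node"] = None    # next step if the agent answers "no"
    outcome: Optional[str] = None  # set on leaf nodes only

# Hypothetical tree for a connectivity complaint.
TREE = Node(
    question="Is the device powered on?",
    no=Node(outcome="Walk the customer through the power-on procedure."),
    yes=Node(
        question="Does the status LED blink red?",
        yes=Node(outcome="Escalate to level-2 support (suspected hardware fault)."),
        no=Node(outcome="Run the remote line test, then re-check the connection."),
    ),
)

def walk(node: Node) -> str:
    """Ask each question in turn until a leaf outcome is reached."""
    while node.outcome is None:
        answer = input(node.question + " [y/n] ").strip().lower()
        node = node.yes if answer == "y" else node.no
    return node.outcome

if __name__ == "__main__":
    print(walk(TREE))
```

The point is not the code itself but that the diagnostic logic lives in a maintained, auditable artifact the whole team can improve, rather than in each agent’s head.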

When AI works… And when it doesn’t

Let’s be fair: there are cases where AI brings real value to customer service.

  • Instant answers to frequently asked, simple questions.
  • 24/7 availability.
  • The ability to handle spikes in demand.
  • Support in several languages, including ones the human team may not speak.

For well-defined routine tasks, automation can be really helpful.

The problem arises when you try to apply the same logic to the entire spectrum of customer service. Exceptional situations, complex problems, frustrated customers, or sensitive complaints require empathy, flexibility, and human judgment – precisely what AI can’t always deliver today.

A March 2025 report by McKinsey & Co. showed that 71% of companies are now using generative AI in at least one business function, but adoption is significantly lower in highly regulated industries: 63% in healthcare and 65% in financial services, precisely where errors have the most impact.


A proposal for balance

Maybe the question shouldn’t be “AI: yes or no?” but “how much AI, and where?”. A smart hybrid approach would be to:

  • Use AI to filter and categorize initial queries, while recognizing its limitations.
  • Truly automate simple, repetitive tasks where errors have little impact.
  • Implement human verification of critical responses (keeping humans in the loop, as recommended by Red Hat).
  • Provide quick (instantaneous?) access to a human whenever the situation requires it or the AI expresses uncertainty (bearing in mind that one known weakness of AI agents is admitting they cannot answer correctly; see our previous publications).
  • Include clear warnings whenever information is AI-generated.
  • Train human agents to work with AI tools that empower them rather than replace them.
  • Measure success first and foremost from the customer’s perspective, not the company’s: success for the customer inevitably leads to success for the company, whereas the opposite is rarely true.
  • Be transparent about when customers are interacting with AI and when with humans.
  • Trust the critical thinking of your teams by involving them directly in improving the solutions provided to customers.

We can summarize this long list in one recommendation: provide your teams with AI-powered tools that manage your company’s knowledge, and continuously improve it, through a Knowledge Management System (KMS) platform.
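
As a rough illustration of the hybrid approach, here is a minimal routing sketch: the AI answers on its own only when the query falls into a whitelisted low-risk category and its self-reported confidence is high; everything else goes straight to a human. The categories, threshold, and names are illustrative assumptions, not a production design.

```python
from dataclasses import dataclass

# Categories considered safe to automate: low-impact and repetitive.
LOW_RISK_CATEGORIES = {"order_status", "store_hours", "password_reset"}
CONFIDENCE_THRESHOLD = 0.85  # illustrative value; tune against real transcripts

@dataclass
class DraftAnswer:
    category: str      # output of an upstream classifier, e.g. "order_status"
    confidence: float  # model's self-reported confidence, in [0, 1]
    text: str          # the answer the AI proposes to send

def route(draft: DraftAnswer) -> str:
    """Decide whether the AI replies directly or a human takes over."""
    if draft.category not in LOW_RISK_CATEGORIES:
        return "HUMAN"  # complex or sensitive: human judgment required
    if draft.confidence < CONFIDENCE_THRESHOLD:
        return "HUMAN"  # the model is unsure: never guess at the customer
    # Low-risk and confident: reply, clearly labeled as AI-generated.
    return f"AI (disclosed as such): {draft.text}"

# A refund dispute is never answered by the AI alone, however confident:
print(route(DraftAnswer("refund_dispute", 0.97, "Your refund is approved.")))
# A routine opening-hours question can be automated:
print(route(DraftAnswer("store_hours", 0.95, "We are open 9:00-18:00, Mon-Sat.")))
```

Note that self-reported confidence is itself unreliable (as discussed above), which is exactly why sensitive categories are excluded outright rather than trusted to a score.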


The inconvenient question

Ultimately, a tricky thought remains: are companies adopting AI in customer service because it’s the best solution for their users, or because it seems like the solution to protect profitability? Is it a real innovation or an optimization disguised as modernity?

And do they do so with full knowledge of the real technical limitations of these systems? The data suggests not. When GPT-4 and GPT-4 Turbo, the most accurate models available, hallucinate 3% of the time; when advanced reasoning models such as o3 and o4-mini hallucinate 33% and 48% of the time, respectively; when OpenAI’s largest and most expensive model needs to be retired after just 4 months; and when courts start holding companies accountable for false information provided by their chatbots, it all suggests that the industry is trying to run before it has learned to walk.

The answer likely varies from company to company, but the deafening silence around customer satisfaction studies, abandonment rates in automated systems, the number of users desperately hunting for the “talk to a human” option, and the real, documented technical failures of LLMs suggests that we may not be asking the right questions.

Technology is a tool, not a goal. And a tool is only useful if it solves the real problems of the real people who use it, and if it works reliably in the real world, not just in lab benchmarks.

As long as decisions about AI in customer service are made in the boardroom by looking at spreadsheets and business presentations from technology vendors, rather than talking to real customers and with an honest understanding of documented technical limitations, we will continue to see implementations that prioritize business efficiency over the human experience.

And perhaps most worryingly, we will continue to see companies surprised when their AI systems fail, when their customers become frustrated, and when they discover how expensive short-term savings can be once weighed against the loss of reputation, customer trust, and loyalty.

The technical reality: LLMs aren’t as brilliant as they seem!

Beyond the marketing hype about the spectacular advances of AI, the technical reality is that large language models (LLMs) have significant limitations that are rarely mentioned in corporate presentations.

Hallucinations: the persistent Achilles’ heel

One of the most serious problems is that LLMs can confidently “hallucinate” information. These systems invent data, quotes, or facts that sound perfectly credible but are completely false. A Vectara study found that the most accurate models, GPT-4 and GPT-4 Turbo, hallucinate about 3% of the time when summarizing texts, while other models achieve error rates as high as 27%.

In customer service, this has real and costly consequences. In February 2024, Air Canada was ordered by a Canadian court to pay compensation to a customer after its chatbot fabricated a bereavement fare policy that didn’t exist. The bot confidently claimed that customers could request retroactive discounts up to 90 days after ticket issuance, which directly contradicted the company’s actual policy.

Other notable cases include DPD, a European logistics company, which had to disable part of its chatbot after it started insulting customers and describing the company as “the worst delivery service in the world”; Virgin Money, which was forced to apologize after its chatbot reprimanded a user for using the word “virgin”; and Cursor, an American tech startup, which had to limit the damage when its chatbot informed customers of a radical, entirely fictitious change to its usage policy.

The paradox of advanced reasoning models

Paradoxically, more advanced reasoning models, which use “chain of thought” approaches to break down complex problems into smaller pieces, appear to hallucinate more often than ordinary LLMs, according to Vectara’s analysis. OpenAI acknowledged in a performance report for its latest reasoning models that o1 hallucinated 16% of the time when synthesizing public information about people, while its newer models o3 and o4-mini hallucinated 33% and 48% of the time, respectively.

Basic mathematics and logical reasoning

Ironically, while companies market these systems as “superintelligences,” LLMs struggle noticeably with tasks any elementary school student could solve. Basic mathematical reasoning remains a weak point, which is problematic when customers ask questions about discounts, warranty dates, or cost calculations.
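
A common mitigation, sketched below under the assumption of a simple tool-calling setup, is to keep money math out of the model entirely: the LLM only extracts the parameters from the customer’s message, and a deterministic function computes the result. The function and parameter names here are illustrative, not any vendor’s actual API.

```python
from decimal import Decimal, ROUND_HALF_UP

def discounted_price(list_price: str, discount_pct: str) -> Decimal:
    """Compute a discounted price deterministically.

    Decimal avoids binary floating-point surprises in money math;
    the LLM should only supply the inputs, never do the arithmetic.
    """
    price = Decimal(list_price)
    pct = Decimal(discount_pct)
    result = price * (Decimal("100") - pct) / Decimal("100")
    return result.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Hypothetical flow: the model parses "what is 15% off 49.90?" into
# parameters and calls this tool instead of guessing at the number.
print(discounted_price("49.90", "15"))  # 42.42 (49.90 * 0.85 = 42.415)
```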

How can we manage this risk and have complete confidence in our AI-powered tools?

We have identified precautions to take and methods to follow in order to get the most out of AI capabilities (for customer service, as in all areas that handle critical information), and we will share them in the last of the five articles we are publishing on this subject.

Stay tuned!