Applause Report: AI Growth Outpaces Digital Quality

Applause has released its fourth annual State of Digital Quality in Testing AI report, showing that while AI adoption is rapidly increasing across enterprise and consumer markets, the quality of those experiences is struggling to keep up.

Based on a survey of more than 1,000 developers and QA professionals, and over 4,000 consumers, the report found that 55% of organizations have released AI-powered applications and features. However, more than half of AI initiatives still fail to reach full production, often due to integration challenges, cost constraints and quality risks. This tension is also reflected in user sentiment: while 40% say AI tools boost productivity by more than 75%, reported quality issues — including hallucinations, misunderstood prompts and unreliable outputs — are rising after a steady decline in recent years.

As organizations accelerate the adoption of AI testing techniques, evaluation by humans remains the most widely used approach, with 61% of organizations relying on human input to evaluate AI performance. Meanwhile, 33% use LLM-as-judge methods, where multiple models assess AI outputs in parallel to uncover blind spots. Despite this mix of approaches, testing strategies are still struggling to keep pace with the speed and complexity of AI development — leaving critical gaps in how these systems are validated at scale. The disconnect could threaten retention, revenue and reputation for businesses.
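The "LLM-as-judge" approach described above can be sketched in a few lines. This is a minimal, hypothetical illustration only, not Applause's methodology: the three judge functions stand in for separate judge models (in practice each would be a call to a different LLM), and the disagreement threshold is an invented parameter. The key idea is that several judges score the same output independently, and high variance between them flags a potential blind spot for human review.

```python
# Minimal sketch of an LLM-as-judge ensemble: several judges score the
# same AI output independently; disagreement between them is surfaced as
# a potential blind spot that a human should review.
from statistics import mean, pstdev

def judge_strict(output: str) -> float:
    """Hypothetical judge that penalizes short answers."""
    return 1.0 if len(output) > 40 else 0.3

def judge_lenient(output: str) -> float:
    """Hypothetical judge that rewards any non-empty answer."""
    return 0.9 if output.strip() else 0.0

def judge_keyword(output: str) -> float:
    """Hypothetical judge that checks for a required term."""
    return 1.0 if "refund" in output.lower() else 0.2

JUDGES = [judge_strict, judge_lenient, judge_keyword]

def evaluate(output: str, disagreement_threshold: float = 0.3) -> dict:
    """Aggregate judge scores; flag high disagreement for human review."""
    scores = [judge(output) for judge in JUDGES]
    return {
        "mean_score": round(mean(scores), 2),
        "needs_human_review": pstdev(scores) > disagreement_threshold,
    }

result = evaluate("Your refund has been processed and will arrive in 3-5 days.")
print(result)  # prints {'mean_score': 0.97, 'needs_human_review': False}
```

The routing step at the end mirrors the hybrid model the report describes: automated judges handle the easy consensus cases, and humans are pulled in only where the judges disagree.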

“AI development isn’t slowing down, and quality is falling behind,” said Chris Sheehan, EVP of High Tech and AI at Applause. “Teams are pushing AI into production before they’ve figured out how to properly test it. That’s why we’re seeing more failures and more risk reaching users. AI adds speed and scale, but human evaluation is what earns trust — you need both. The companies getting it right combine AI and domain expertise to evaluate and fine-tune their systems, ensuring outputs are more relevant, accurate and inclusive.”

AI moves to production — but many initiatives stall

Scaling AI initiatives, including the two most common (chatbots and customer service tools), remains a challenge. More than half of respondents said fewer than half of their AI projects make it from proof of concept to full production, citing integration complexity, cost constraints and quality risks. To close the gap, teams are adopting a mix of AI-driven and human-led testing approaches: fine-tuning with synthetic data (29%) and human-generated data (54%); human-led (39%) and automated (23%) red teaming; AI-first testing agents (30%); and human-in-the-loop monitoring (31%).

Quality issues rise as users embrace AI

Despite strong adoption and generally positive sentiment, users are encountering more issues with AI: 40% experienced hallucinations this year, up from 32% in 2025. Additionally, 46% said AI misunderstood their prompts, now the most commonly reported issue, while 41% said responses lacked sufficient detail.

Multimodal AI raises new testing challenges

As AI capabilities expand, user expectations are evolving rapidly. 84% of generative AI users say multimodal functionality — the ability to process and generate text, images, audio and video — is critical. This shift is placing new pressure on QA teams to test across a broader range of outputs and edge cases at enterprise scale.

“Testing AI isn’t just about accuracy — it’s about evaluating complex, multimodal outputs at scale,” said Chris Munroe, VP of AI Programs, Applause. “LLM-as-judge systems are becoming an important part of that process, but they can’t operate in isolation. Without human oversight, you risk reinforcing the same blind spots you’re trying to detect. In addition to human-led evals and fine-tuning, structured red teaming by both domain experts and generalists is essential. So is ensuring evaluation rigor — without it, organizations risk scaling systems they don’t fully understand or control.”

A new testing model is required: AI + human evaluation

The report highlights a fundamental shift: AI is forcing organizations to rethink how quality is defined and validated. Unlike traditional software, AI is probabilistic and non-deterministic, so conventional testing methods alone are no longer sufficient. AI testing tools alone will miss what only humans can catch.

Organizations are increasingly adopting hybrid testing models that combine AI-driven evaluation, automation and human validation to bridge these gaps and help ensure reliability and safety. A key benefit of this approach is the creation of “golden datasets” — reusable, high-quality benchmarks that support ongoing regression testing and continuous improvement.
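A golden-dataset regression check, as described above, can be as simple as replaying a curated set of prompts with vetted reference answers after every change. The sketch below is an illustrative assumption, not the report's implementation: the canned `model_under_test`, the `SequenceMatcher` similarity metric and the 0.8 threshold are all stand-ins for a real model endpoint and a real evaluation metric.

```python
# Sketch of a golden-dataset regression check: curated prompts with
# human-vetted reference answers are replayed against the system, and
# any output that drifts below a similarity threshold fails the run.
from difflib import SequenceMatcher

GOLDEN_DATASET = [
    {"prompt": "What is your return window?",
     "reference": "Items can be returned within 30 days of delivery."},
    {"prompt": "Do you ship internationally?",
     "reference": "Yes, we ship to over 50 countries worldwide."},
]

def model_under_test(prompt: str) -> str:
    """Stand-in for the AI system being validated."""
    canned = {
        "What is your return window?":
            "Items can be returned within 30 days of delivery.",
        "Do you ship internationally?":
            "Yes, we ship to over 50 countries worldwide.",
    }
    return canned.get(prompt, "")

def regression_check(threshold: float = 0.8) -> list[str]:
    """Return the prompts whose outputs drifted below the threshold."""
    failures = []
    for case in GOLDEN_DATASET:
        output = model_under_test(case["prompt"])
        similarity = SequenceMatcher(None, output, case["reference"]).ratio()
        if similarity < threshold:
            failures.append(case["prompt"])
    return failures

print(regression_check())  # prints [] when no outputs have regressed
```

Because the dataset is reusable, the same check supports the continuous-improvement loop the report describes: each release is scored against the same high-quality benchmark rather than ad-hoc spot checks.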

Human insight remains central to the AI QA process. Nearly half of organizations (46%) reported that human sentiment and usability are the primary factors in determining whether an AI feature is ready for production — far outweighing purely technical benchmarks.

At the same time, organizations are investing in accessibility and inclusive testing practices. Nearly three-quarters of AI developers incorporate crowdtesting for accessibility, alongside automated tools and AI agents. However, gaps remain, with 10% of organizations not testing AI systems for accessibility at all.

This shift reflects a broader reality: as AI systems become more complex and non-deterministic, quality can no longer be validated through automation alone — it requires a combination of AI, automation and real-world human insight.

About the report

The 2026 State of Digital Quality in Testing AI report provides guidance on how organizations investing in AI and other technologies can gain the most value, based on in-depth analysis of testing platform data, survey results and interviews with Applause customers and internal experts. The full report is available at: https://stateofdigitalquality.com/



About Author

Leigh Porter's first love is people. She began her career as a neonatal RN, an obvious choice, until life threw a curveball and she embarked on a new career in IT. Pursuing this fresh path came easily given her resilient and steadfast character. Outside of the office, Leigh faithfully gives much of her time as a nationally awarded volunteer leader for an organization dear to her heart.