Applause Annual State of Digital Quality in AI Survey Released

Applause unveiled findings from its third annual State of Digital Quality in AI Survey, revealing a major gap between rising investments in generative AI (Gen AI) and the adoption of crucial quality assurance (QA) practices in the software development lifecycle (SDLC). As Gen AI applications and agentic AI—capable of autonomous decision-making and execution—continue to expand globally, rigorous crowdtesting throughout the SDLC is essential to managing growing risks. The survey gathered insights from over 4,400 software developers, QA professionals, and consumers worldwide and examined AI use cases, tools, challenges, and user experiences.

“The results of our annual AI survey underscore the need to raise the bar on how we test and roll out new generative AI models and applications,” said Chris Sheehan, EVP of High Tech & AI, Applause. “Given massive investment in the technology, we’d like to see more developers incorporate AI-powered productivity tools throughout the SDLC, and bolster reliability and safety through rigorous end-to-end testing. Agentic AI is ramping up at a speed and scale we could hardly have imagined, so the risks are now amplified. Our global clients are already ahead of the curve by baking broad AI testing measures into development earlier, from training models with diverse, high-quality datasets to employing testing best practices like red teaming.”

Key findings of the AI Survey:

Embedding AI throughout development delivers powerful competitive advantages, but many organizations are slow to adopt.

  • Over half of the software professionals surveyed believe Gen AI tools improve productivity significantly, with 25% estimating a boost of 25-49% and another 27% seeing increases of 50-74%.
  • Yet, 23% of software professionals say their integrated development environment (IDE) lacks embedded Gen AI tools (e.g., GitHub Copilot, OpenAI Codex), 16% aren’t sure if the tools are integrated with their IDE, and 5% have no IDE.
  • While red teaming, or adversarial testing, is a best practice to help mitigate risks of inaccuracy, bias, toxicity and worse, only 33% of respondents reported using this technique.
  • The top AI testing activities involving humans include prompt and response grading (61%), UX testing (57%) and accessibility testing (54%). Humans are also essential in training industry-specific or niche models; 41% of developers and QA professionals lean on domain experts for AI training.


Businesses are investing heavily in AI to enhance customer experiences and reduce operational costs – but flaws are still reaching users.

  • Over 70% of developers and QA professionals who responded said their organization is developing AI applications and features. Chatbots and customer support tools are the top AI-powered solutions being built (55%), and just over 19% have started to build AI agents.
  • Within the past three months, 65% of users reported encountering problems with Gen AI, including responses that lacked detail (40%), misunderstood prompts (38%), showed bias (35%), contained hallucinations (32%), were clearly incorrect (23%) or included offensive content (17%). Only 6% fewer respondents reported hallucinations than in last year’s survey.
  • Gen AI users are fickle, as 30% have swapped one service for another, and 34% prefer different Gen AI services for different tasks.


Additional insights:

  • Consumer demand for multimodal capabilities has increased.
    78% of consumers say multimodal functionality, or the ability to interpret multiple types of media, is important to them in a Gen AI tool, compared with 62% last year.
  • GitHub Copilot (37%) and OpenAI Codex (34%) are still the AI-powered coding tools of choice.
    They were the favorites in 2024, too, but the gap in usage between them is closing. Last year, GitHub Copilot was preferred by 41% of respondents and OpenAI Codex by just 24%.
  • QA professionals are turning to AI for basic support of the testing process.
    The top three use cases are test case generation (66%), text generation for test data (59%) and test reporting (58%).


Sheehan continued, “Enterprises best positioned to capture value with customer-facing generative AI applications understand the important role human intelligence can play. While every generative AI use case requires a custom approach to quality, human intelligence can be applied to many parts of the development process including model data, model evaluation and comprehensive testing in the real world. As AI seeps into every part of our existence, we need to ensure these solutions provide the exceptional experiences users demand while mitigating the risks that are inherent to the technology.”

The AI Survey is part of the State of Digital Quality content series from Applause. The annual State of Digital Quality Report draws on Applause’s experience serving global enterprises and technology leaders for more than 15 years, including many AI innovators. Based on in-depth analysis of testing platform data, survey results and interviews with customers and internal experts, the report provides guidance on how organizations investing in AI and other technologies can gain the most value. To review the Applause Annual State of Digital Quality in AI Survey, visit the Applause website.

