In 2025, nearly 3 in 10 security professionals thought that fully autonomous AI systems could satisfy their companies’ security-testing needs. But after a year of testing and experimentation, that optimism has largely faded.
Instead, chief information security officers (CISOs) and other security practitioners now have more realistic expectations of AI-based systems, which often have significant blind spots, are prone to false positives, and can blow through AI budgets, according to a June 25 report released by Cobalt, a penetration-testing-as-a-service firm. The share of organizations willing to rely on AI-powered penetration testing for their security needs fell to 9% in 2026, down from 29% a year earlier. The vast majority of companies preferred a hybrid, human-in-the-loop approach or relegating only non-critical tasks to automation.
Security practitioners are experimenting to find the sweet spot of what can be automated reliably and responsibly, says Gunter Ollmann, chief technology officer for Cobalt.
“CISOs in particular have been, for at least the last two years, under immense pressure by their leadership team, by their board, to use more AI, and autonomous pentesting fits that bill,” he says. “Many of them now have a year under their belt of rolling out AI systems, as well as experimenting with AI pen testing tools, and generally … their confidence in the security and the efficacy of these tools has dropped.”
Whether LLMs and AI systems will solve security problems or just present more challenges in the near and medium term remains a major open question for security practitioners. Vulnerabilities are being reported at a 46% higher rate than forecast from last year’s data, according to an analysis from the Forum of Incident Response and Security Teams (FIRST). In another example of the challenges, Microsoft patched 206 unique CVEs in its June 2026 Patch Tuesday updates, a record driven by AI discovery of flaws.
Human verification of AI-discovered flaws will be the bottleneck in the future, FIRST analysts Jerry Gamblin and Eireann Leverett wrote in their analysis.
“In an era where AI can find significantly more flaws than human analysts, the constraint is no longer discovery; it is the human capacity to verify, coordinate, and patch,” they wrote. “We also believe a crucial bottleneck will be in writing detection signatures for exploitation. The issue often comes down to the difference between identification and true risk detection.”
CISOs Face Massive Increase in Vulnerabilities
A fundamental problem for organizations is that AI-augmented programmers are producing more code, and that translates to more vulnerabilities, even if the code is somewhat higher in quality. Meanwhile, security practitioners are focused on increasing the number of security assessments, with 77% committed to regular security assessments and pen testing, according to Cobalt’s report.
While that will require more automation, AI systems and large language models (LLMs) have shown weaknesses. Even as they find more vulnerabilities, AI systems still miss high- and critical-severity issues. More than three-quarters of companies (78%) have had automated systems miss significant vulnerabilities, known as “false negatives,” which has dampened security professionals’ enthusiasm for full automation, according to Cobalt’s report.

The amount of data produced by AI security assessments also makes it difficult for humans to keep up, requiring more human oversight and better systems, says Derek Rush, managing senior consultant at Bishop Fox, an offensive security services firm.
“These systems generate an enormous volume of data, and it takes an experienced mind to shape the context the LLM produces,” he says. “A human expert is needed to decide whether a lead is worth pursuing, and if it is, to work out what the full, validated attack chain looks like. That judgment is exactly what gets missed when you take the human out, which is why buyers are running into the gaps.”
HackerOne paused its Internet Bug Bounty program because of the growing volume of submissions that needed validation. AI systems could close the gap, but false negatives and false positives continue to be problems, says Sandeep Singh, vice president of product strategy at HackerOne.
“False positives are the more familiar problem — they could be noisy and expensive in triage and validation time,” he says. “This is where the need for stronger validation becomes more important. However, AI approaches could be used for validation as well so that it doesn’t become a bottleneck for humans.”
AI Security-Assessment Tools Will Get Better
The lesson for CISOs: Don’t expect AI penetration-testing and vulnerability-finding tools to replace human penetration testers anytime soon. In the short term, experts are required to get the most out of these systems, but AI systems will become more capable, says HackerOne’s Singh.
“We’d call it a bump [in the road] because the market briefly conflated ‘AI can assist and amplify pentesting’ with ‘AI can replace the pentester,’ and is now correcting,” he says. “The durable model is agent-augmented human testing, meaning the agent does the relentless breadth continuously and first pass, and the human does the depth and judgment periodically.”
However, improvements in AI models will likely continue to emerge quickly, and “long term the trajectory is toward more autonomy,” he says.
For CISOs, the decision whether to use automation will come down to return on investment. Unfortunately, the costs of AI-powered penetration-testing services are difficult to predict, and after seeing other business processes run up AI-service fees, security practitioners have become a bit gun-shy, says Cobalt’s Ollmann.
“The cost piece is a major concern,” Ollmann says, but he sees a light, albeit a distant one, at the end of the tunnel. “As a vendor in this space, we all expect … AI is going to become cheaper and overall the net is we win.”