Who is supposed to test frontier AI (and who, exactly, is doing the testing now)?

Jun 25, 2026 | blog

Cybersecurity does not stand still. Even as teams continue the day-to-day work of managing vulnerabilities and responding to vendor updates, broader shifts in the threat and regulatory landscape can start to take shape in the background.

One of those shifts is now happening around advanced AI systems. What started as standards work and early testing efforts has quickly expanded into a much wider conversation involving federal agencies, lawmakers, industry, and other stakeholders. Questions about how these systems should be evaluated, who is responsible for oversight, and where authority should sit have surfaced. Government testing agreements have been announced, an Executive Order has been signed, Congress has introduced a draft framework, and opposition is already forming across multiple groups.

Together, these developments point to a fast-moving and still unsettled approach to how frontier AI will be governed.

May 2026: CAISI Expands Pre-Deployment Testing to Major AI Labs

That broader debate did not begin in Congress or at the White House but started more quietly at NIST. NIST’s Center for AI Standards and Innovation (CAISI) announced voluntary agreements with Google, Microsoft, and xAI that would allow government evaluators to test frontier AI models before deployment. The testing would include cybersecurity-related evaluations and was presented as part of a broader effort to better understand the risks associated with increasingly capable AI systems.

The announcement was also significant because it highlighted CAISI’s increasingly visible role within the emerging AI-governance landscape. CAISI and its predecessor, the U.S. AI Safety Institute, had already been conducting frontier-model evaluations, including assessments under earlier agreements with OpenAI and Anthropic. What changed in May was the expansion of those arrangements to additional major developers. The new agreements arrived amid growing uncertainty about which institutions should oversee advanced AI systems. As subsequent events would show, the focus quickly shifted beyond the evaluations themselves to questions of who should perform them and where authority for AI oversight should reside.

The expansion of CAISI’s testing agreements follows a pattern familiar to anyone who has watched cybersecurity standards evolve. What tends to start as voluntary participation by a few major players can, over time, become the de facto expectation for the broader industry. Whether that happens with frontier AI evaluation remains to be seen, but the May announcement suggests the groundwork is being laid.

May 11: A Missing Webpage Raises Questions

Less than a week later, Reuters reported that the government webpage describing the testing arrangements had been removed. Visitors attempting to access the page were met with a “page not found” message, and neither the Commerce Department nor the White House offered an explanation for the change.

The disappearance of a government webpage would not normally be newsworthy, but what drew attention in this case was the timing: the page vanished almost immediately after an announcement involving some of the largest AI developers in the world, and the lack of any explanation from officials quickly created an information vacuum that invited speculation.

Additional reporting published around the same time described disagreements within the administration regarding AI oversight responsibilities. The accounts characterized the situation differently, but many pointed to tensions between agencies and stakeholders with competing views on how advanced AI systems should be governed and who should be responsible for evaluating them.

Whether the removed webpage was connected to those discussions, we do not know. In the days that followed, reporting around the testing arrangements continued to sit alongside wider policy activity, as attention moved toward how frontier AI oversight would be handled at the federal level.

June 2: The Executive Order

The next major development arrived on June 2 when President Trump signed the Executive Order on Promoting Advanced Artificial Intelligence Innovation and Security.

The order kept industry participation largely voluntary and put more weight on national security institutions. That is not especially surprising given that many of the worries around frontier AI systems sit in areas that are already national security territory, including cyber operations, intelligence work, critical infrastructure protection, and broader strategic competition.

Commentary at the time noted that the final order looked softer than some of the drafts reportedly discussed in May. Stronger measures were said to have been on the table, but the details of those internal debates are hard to confirm from outside government.

What the Executive Order did provide was a clearer indication of where the administration currently sees AI oversight fitting within the federal landscape. Rather than creating an entirely new regulatory structure, the order largely relied on existing national security and cybersecurity institutions.

June 4: Congress Enters the Conversation

Two days later, lawmakers released the discussion draft of the Great American AI Act, introducing a noticeably different vision for how AI oversight might develop.

The proposal would establish a statutory framework centered around the Center for AI Standards and Innovation, expand CAISI’s responsibilities, and provide significant funding for AI standards and evaluation activities. The discussion draft also included requirements for certain frontier-model developers to submit risk management plans and report significant safety incidents prior to releasing advanced systems.

The proposal also drew attention from industry observers. Roger Rademacher, who oversees Security R&D Engineering at Foxguard, highlighted it in his June 5 Daily AI Brief newsletter, noting its implications for organizations evaluating AI vendors, particularly around safety reporting, release processes, and documented risk controls.

One reason the proposal attracted immediate attention was its treatment of state-level AI legislation. The discussion draft included a three-year preemption provision affecting certain state AI development laws, arriving at a time when states were already pursuing their own approaches. Colorado’s AI Act, originally scheduled to take effect on June 30, 2026, is now tied up in litigation after xAI sued to block it and the Justice Department intervened; enforcement has been suspended, even as California and several other states continue to push forward with their own laws. The proposal pulled longstanding arguments back into view about how authority should be divided between federal and state governments, how much weight standards organizations should carry, and how far AI rules should be consolidated into a single federal framework.

Put the June 2 Executive Order next to what Congress is proposing, and the differences are clear. The Executive Order leans on the existing machinery: executive branch agencies, national security authorities, the usual channels. The congressional bill takes a more structural approach, building out a formal statutory regime around Commerce, standards development, reporting requirements, and CAISI at NIST. Both are aimed at the same underlying problem of how to evaluate and place constraints on powerful AI systems, but they reflect very different views on where oversight should sit. The EO relies primarily on national security institutions, while Congress is looking toward standards bodies, mandated disclosures, and statutory oversight.

June 4–6: Opposition Arrives Quickly

Reaction to the discussion draft was swift.

Labor organizations including the AFL-CIO, the American Federation of Teachers, and the Association of Flight Attendants publicly criticized the proposal’s preemption language, arguing that it would interfere with state efforts already underway. Similar concerns appeared elsewhere, as members of the House Democratic AI Commission reportedly argued that the discussion draft could not serve as the basis for productive dialogue in its current form.

None of this should be read as a sign that federal AI legislation is dead on arrival. Major bills almost always catch flak from multiple sides in the early rounds. What stood out here was how fast the conversation moved past the technical details of what these systems do and turned into a wider fight about who gets to call the shots, who answers to whom, and where the boundaries of authority actually lie.

Those debates are likely to continue regardless of what form future legislation takes. For now, the only safe assumption is that the governance story is still being written.

June 12: A Government Directive Tests the Boundaries of Authority

The debate over who oversees frontier AI shifted from policy proposals to direct action on June 12. Anthropic announced it had received a directive from the U.S. government, citing national security authorities, to suspend all access to its Fable 5 and Mythos 5 models for all users. The order, issued as an export control measure, effectively required Anthropic to disable the models immediately to ensure compliance.

According to a statement from Anthropic, the government did not provide specific details of its national security concern but indicated it had become aware of a potential method of bypassing, or “jailbreaking,” Fable 5. Anthropic said it reviewed a demonstration of the technique and found that it identified a small number of previously known, minor vulnerabilities, capabilities it argued were widely available from other public models, including OpenAI’s GPT-5.5.

Anthropic publicly disagreed with the action, stating that it did not believe a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people. The company further warned that if this standard were applied across the industry, it “would essentially halt all new model deployments for all frontier model providers.” While complying with the legal directive, Anthropic characterized the process as lacking transparency and called for a statutory framework that is “transparent, fair, clear, and grounded in technical facts.”

This incident brings several of the tensions described above into sharp relief. It pits a company’s defense-in-depth safety strategy and pre-deployment testing against an opaque, national-security-based directive from the executive branch. It also highlights the very concerns raised by critics of the current approach: the lack of a clear statutory process for government intervention. For security practitioners evaluating AI vendors, the incident raised practical questions about model availability, reliance on specific AI capabilities, and the potential for similar directives to disrupt operators with little notice.

Trust, Verification, and Visibility

Most of our readers are not building frontier AI models. Even so, the underlying debate should feel familiar. Security teams routinely make decisions based on assessments produced by parties other than the technology developer. Vulnerability disclosures, third-party testing, certifications, vendor advisories, government alerts, and industry standards all exist because organizations need ways to evaluate claims they cannot independently verify.

That is ultimately the question now emerging around frontier AI. Not whether advanced systems should be evaluated, but who should perform those evaluations, what methodologies should be used, and which institutions should be trusted to communicate the results. The debate currently playing out among CAISI, Congress, the White House, states, and industry is, in many ways, a debate about building the trust infrastructure that security programs already rely on every day.

Of course, trust is only useful if organizations can actually find and consume the information being produced. One challenge shared by both traditional cybersecurity and emerging AI governance is simply seeing the right information at the right time.

Security teams already spend considerable time tracking advisories, vulnerability disclosures, standards updates, regulatory developments, threat intelligence, vendor bulletins, and product notices from an ever-growing list of sources. As new AI standards, testing programs, legislative proposals, and governance initiatives appear, that landscape only gets more crowded.

The problem is familiar to anyone responsible for vulnerability management. The difficulty is rarely a lack of information; it is finding something authoritative quickly enough to use it. As that landscape becomes more complex, organizations increasingly need centralized, trusted approaches to managing advisories, prioritizing risk, and turning intelligence into action.

Looking Ahead

A month ago, the conversation around frontier AI oversight was still mostly about standards work, policy proposals, and theoretical governance models. That has changed. Since then, we have seen testing agreements announced and publicized, an Executive Order signed, a congressional framework released, opposition rolling in from multiple directions, and, most significantly, a direct government directive that forced a major AI developer to disable commercial models. The debate is no longer just about who should have authority, but about who is exercising it, and on what legal basis.

None of this settles who ultimately oversees frontier AI systems in the United States. What is becoming clearer, though, is that the debate is moving from broad principles to the kind of concrete questions that security programs actually run on: testing methodologies, reporting requirements, oversight structures, evaluation criteria, and institutional responsibilities. Those may not grab headlines the way a new model release does, but they are likely to be where AI governance first shows up in the work security teams actually do.
