This week, two stories belong in the same sentence. Six Western intelligence agencies published their first joint warning on agentic AI: agents already deployed in critical infrastructure are over-privileged, under-monitored, and actively being targeted. The same week, an independent government body confirmed that GPT-5.5 completes a 32-step corporate network attack simulation in 2 of 10 attempts, with a 20-hour human expert baseline for the same scenario. One story describes the exposure; the other measures the capability.
The Big Stories
Six Intelligence Agencies Issue First Joint Warning on Agentic AI Deployments
On May 1, CISA, NSA, and partner agencies from the UK, Australia, Canada, and New Zealand jointly published “Careful Adoption of Agentic AI Services,” the first multi-government guidance document specifically targeting agentic AI systems. The Register covered the advisory in detail. The 30-page document identifies 23 distinct risks across five categories: privilege (agents with excessive access), design flaws, behavioral risks including goal misalignment, structural risks from interconnected agent components, and accountability failures from opaque audit logs. The central directive: prioritize resilience, reversibility, and risk containment over efficiency gains. The guidance specifically flags prompt injection as an active attack vector, describing scenarios where a compromised low-risk tool inherits an agent’s elevated credentials, then modifies contracts, approves unauthorized payments, and fakes the audit trail.
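That scenario maps to a concrete design question: does each tool carry its own narrowly scoped grant, or does it ride on the agent’s session credentials? Here’s a minimal Python sketch of the first pattern; the class names, tool names, and scopes are hypothetical illustrations, not anything prescribed by the advisory:

```python
# Hypothetical sketch: tools get their own scoped grants instead of
# inheriting the agent's session. Names and scopes are illustrative only.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ToolGrant:
    tool: str
    scopes: frozenset  # the only permissions this tool may exercise

@dataclass
class AgentSession:
    grants: dict = field(default_factory=dict)  # tool name -> ToolGrant

    def register(self, tool: str, scopes: set) -> None:
        self.grants[tool] = ToolGrant(tool, frozenset(scopes))

    def invoke(self, tool: str, action: str, scope: str) -> None:
        grant = self.grants.get(tool)
        if grant is None or scope not in grant.scopes:
            # A compromised low-risk tool asking for "payments:approve"
            # fails closed instead of inheriting elevated credentials.
            raise PermissionError(f"{tool} lacks scope {scope!r}")
        print(f"audit: {tool} -> {action} [{scope}]")  # append-only log in practice

session = AgentSession()
session.register("pdf_summarizer", {"documents:read"})
session.invoke("pdf_summarizer", "summarize contract", "documents:read")  # allowed
try:
    session.invoke("pdf_summarizer", "approve invoice", "payments:approve")
except PermissionError as e:
    print("blocked:", e)
```

The point is the failure mode: in the advisory’s scenario, the absence of exactly this boundary is what lets a hijacked document tool touch payments.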
Why it matters: Intelligence agencies publish joint advisories when they have visibility into active threat behavior, not theoretical scenarios. The five risk categories aren’t speculative; they map to attack patterns already observed on real infrastructure. Any agentic system targeting enterprise or government deployment now faces this advisory as a hard procurement gate in regulated sectors. The “prioritize resilience over efficiency” framing is worth holding onto: it’s the first official articulation of what responsible agentic deployment should optimize for, and it directly contradicts the default product-side framing of “automate everything faster.”
GPT-5.5 and Mythos Both Clear a 32-Step Corporate Network Attack Simulation
The UK’s AI Security Institute published its evaluation of OpenAI’s GPT-5.5 for offensive cyber capabilities. GPT-5.5 scored 71.4% on expert-level capture-the-flag tasks (versus Claude Mythos Preview’s 68.6%) and completed “The Last Ones” in 2 of 10 attempts. The Last Ones is a 32-step corporate network attack chain spanning four subnets, running from reconnaissance through credential theft, lateral movement across Active Directory forests, and a CI/CD supply-chain pivot to database exfiltration. AISI estimates a human expert needs roughly 20 hours for the same scenario. Both labs are now restricting access to these specialized models: Anthropic capped Mythos Preview to around 50 organizations after White House objections; OpenAI is doing the same with GPT-5.5 Cyber.
Why it matters: AISI’s explicit conclusion is that this isn’t a single-model breakthrough; it’s a frontier-wide capability shift. A second major lab, evaluated by an independent government body, reached similar offensive cyber performance. Two data points from different organizations confirming the same capability threshold is how you know a trend is real. If you’re building AI tooling that touches security-adjacent workflows, this is the baseline you’re operating against now, not some future risk horizon.
Sierra Raises $950M at $15.8B as Enterprise AI Moves from Experiment to Infrastructure
Sierra raised $950M in a Series E led by Tiger Global and GV, reaching a $15.8B valuation. The company, co-founded by Bret Taylor and Clay Bavor, builds enterprise AI agents for customer experience: mortgage refinancing, insurance claims, returns processing, nonprofit fundraising. It hit $150M ARR in eight quarters and claims over 40% of the Fortune 50 as customers, including Prudential, Cigna, and Blue Cross Blue Shield. Separately, Anthropic is finalizing a reported $50B round at a $900B+ valuation, which would more than double its February valuation of $380B and make it the world’s most valuable AI startup. Anthropic confirmed its annual revenue run rate has surpassed $30B.
Why it matters: Sierra’s metrics are the more important signal. Reaching $150M ARR in eight quarters is fast for any software company; doing it in customer service, where deployment is least experimental and most directly measurable against headcount costs, suggests real enterprise adoption rather than pilot-stage enthusiasm. Bret Taylor’s estimate of a $400B addressable market in customer service isn’t speculative; it’s the existing annual spend that AI agents are starting to displace. The Anthropic round, if it closes near $900B, changes how enterprise buyers should frame vendor longevity: treat Claude-family models as institutional infrastructure, not point tools.
Under the Radar
[Expert-first] The Paper That Explains Why Long-Running AI Agents Drift and Fail
A January 2026 preprint, AT²PO: Agentic Turn-based Policy Optimization via Tree Search, identifies three structural failure modes in multi-turn agentic systems trained with reinforcement learning. First, exploration diversity collapses over extended task chains, causing agents to repeat patterns rather than try genuinely different approaches. Second, sparse rewards delivered only at task completion can’t attribute which specific turn caused a success or failure, making training unstable. Third, standard RL optimizes at the token level, while agentic tasks actually unfold as turn-by-turn decisions over long horizons. The paper’s fixes: entropy-guided tree expansion to force diverse exploration, turn-wise credit assignment to propagate rewards back through the task chain, and a turn-level policy algorithm that matches the optimization target to the task structure. Tested across seven benchmarks, it outperforms prior methods by up to 1.84 percentage points. No mainstream AI media coverage.
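To make “turn-wise credit assignment” concrete, here is a toy sketch under my own simplifying assumptions: a plain discounted backward pass over per-turn rewards. The paper’s actual mechanism is tied to its tree search and is more involved; this only illustrates why per-turn returns beat one flat end-of-task signal:

```python
# Toy illustration (not AT2PO's algorithm): propagate a sparse end-of-task
# reward backwards so each turn receives its own discounted return.
def turn_returns(turn_rewards, gamma=0.95):
    """turn_rewards: one reward per turn, typically zero until the last."""
    returns, running = [], 0.0
    for r in reversed(turn_rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

# A six-turn task that only pays off at completion:
print([round(v, 3) for v in turn_returns([0, 0, 0, 0, 0, 1.0])])
# roughly [0.774, 0.815, 0.857, 0.903, 0.95, 1.0]: early turns get weaker
# but distinct credit, instead of every token sharing one sparse signal.
```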
Why you should care: If you’ve watched a well-prompted agent degrade or drift into repetitive behavior 20+ turns into a long task, this is the architectural explanation. It’s usually diagnosed as a context window problem or a prompt quality problem. It’s often neither. It’s RL training instability from misaligned optimization. AT²PO isn’t in any production framework yet, but reading the abstract before your next long-horizon agent debugging session will change what you look for.
[No mainstream coverage] NHS England Named Anthropic’s Mythos in a Security Policy Document
NHS England ordered technology leaders to make hundreds of public GitHub repositories private by May 11, citing rapid advancements in AI models capable of large-scale code ingestion and explicitly naming Anthropic’s Mythos as the example capability driving the decision. Most of the affected repos, by NHS’s own account, contain documentation, architecture diagrams, and internal tooling code with minimal sensitive content. An open letter with 74 signatories called the decision security theater; NHS’s former head of open technology called it ineffective, since the code was already ingested for training years ago. Teams needing an exception must apply by May 6; all repos go private May 11.
Why you should care: The policy’s technical merit is genuinely debatable. But that’s not the signal. This is the first time a major public institution has cited a specific frontier AI model by name inside a security compliance document. Healthcare moves slowly. When NHS writes Mythos into a security policy, similar language is forming right now in the security reviews of other regulated-sector IT teams. If any of your clients operate in healthcare, finance, or critical infrastructure, this framing is what their security leadership is reading and templating from.
Quick Hits
Musk admits xAI distilled OpenAI models under oath - In week 1 of the Musk v. Altman trial, Musk confirmed on the stand that xAI used OpenAI model outputs to train Grok through distillation, while simultaneously suing OpenAI for commercial mission drift. MIT Technology Review
Pentagon clears 8 AI firms for classified military networks - SpaceX, OpenAI, Google, NVIDIA, Microsoft, AWS, Oracle, and Reflection authorized for Impact Level 6/7 deployment. Anthropic excluded after refusing autonomous weapons use. TechCrunch
Nature retracts influential ChatGPT-in-education meta-analysis - The paper, which synthesized 51 studies on ChatGPT’s impact on student learning, was retracted over discrepancies in the analysis that undermined confidence in its conclusions. 404 Media
Minnesota becomes first state to ban AI nudification apps - Fines up to $500K per violation. The bill passed the Senate 65-0 after clearing the House the previous week and now awaits the governor’s signature. Truthout
GitHub Copilot deprecating GPT-5.2 and GPT-5.2-Codex on June 1 - GPT-5.5 replaces GPT-5.2; GPT-5.3-Codex replaces GPT-5.2-Codex. Both retiring models will be removed from all Copilot experiences except Code Review, so admins need to update model policies before the cutover. GitHub Changelog
What to Watch
Cross-lab AI capability evaluations as an emerging transparency norm. AISI evaluated both Anthropic’s Mythos Preview and OpenAI’s GPT-5.5 with published methodology, independent of both companies’ self-reporting. Both labs accepted the external assessment. If this becomes a regular practice across frontier labs, it creates a publicly auditable record of capability milestones that doesn’t depend on company press releases. Watch whether the EU’s AI safety institutes, or other national bodies, establish similar bilateral evaluation arrangements in the next two quarters. The precedent is set; the question is whether it generalizes.
If someone forwarded this to you, subscribe here to get it weekly.