Anthropic had two security incidents in five days. A draft blog post leaked the existence of an unreleased model called Mythos, described as “a step change” over Opus 4.6 with major cybersecurity capability gains. Then the entire Claude Code source map shipped on npm. Anthropic has confirmed Mythos and said it is “being deliberate about how we release it,” making this one of the first publicly documented cases of a frontier model withheld on safety grounds rather than for lack of readiness. Meanwhile, Google released Gemma 4 under Apache 2.0, and the top of the open-model tier now looks genuinely crowded.

The Big Stories

Google Releases Gemma 4 Under Apache 2.0; Its 31B Model Ties the World’s Best Open Models

Google dropped Gemma 4 this week: four vision-capable open models at a range of sizes, now under the Apache 2.0 license instead of the custom Gemma license that frustrated developers and blocked enterprise adoption of prior versions. The 31B dense variant ties Kimi K2.5 (a 744B-parameter MoE) and GLM-5 (a 1T-parameter MoE) on open model rankings, which means Google’s 31 billion parameters are competitive with models roughly 24 to 32 times larger. NVIDIA is providing on-device inference support for edge deployment. (Google DeepMind Blog, Ars Technica, Interconnects)

Why it matters: The license change is arguably more significant than the benchmark numbers. Gemma 3 was good; enterprises still avoided it because of compliance overhead. Apache 2.0 removes that friction entirely. For teams running local inference, Gemma 4’s 26B MoE and 31B dense variants are now the strongest arguments for self-hosted deployments. Nathan Lambert at Interconnects points out that the open model space is genuinely crowded at the top now, with Qwen 3.5 (27 million downloads) and Kimi K2.5 both world-class, but Gemma 4’s efficiency-to-quality ratio and license terms give it a real enterprise edge.

Claude Code Source Leaks; Anthropic’s ‘Mythos’ Frontier Model Confirmed via Earlier Leak

The two incidents landed within five days of each other. First, on March 26, Fortune reported that Anthropic had inadvertently left close to 3,000 files in a publicly searchable data store, including a draft blog post describing an unreleased model called Mythos (internally also referred to as Capybara). The draft called it “by far the most powerful AI model we’ve ever developed,” more capable than Opus 4.6 across coding, academic reasoning, and cybersecurity. Anthropic confirmed the model exists and said the company is “being deliberate about how we release it.” Then on March 31, the official npm package for Claude Code (@anthropic-ai/claude-code v2.1.88) shipped with an exposed source map file: roughly 57MB, mapping 512,000 lines of code across 1,900 files. The full codebase was publicly readable before Anthropic pulled it. Code analysis surfaced a roadmap of unshipped, disabled features and corroborated the Capybara/Mythos tier from the prior leak, with internal notes describing “a step change in cyber capabilities.” Anthropic’s DMCA takedown effort also briefly removed legitimate public forks of an unrelated open-source repository before being reversed. The same week, AMD’s director of AI, Stella Laurenzo, publicly filed a GitHub issue stating Claude Code “cannot be trusted to perform complex engineering tasks,” citing an analysis of 6,852 sessions in which the Read:Edit ratio dropped from 6.6 to 2.0 since February. (Fortune (Mythos leak), Ars Technica (source leak), Ars Technica (analysis), Zvi Mowshowitz, AMD GitHub issue, The Register)

Why it matters: Three things are worth tracking separately. First, Mythos is confirmed: Anthropic publicly acknowledged the model and described it as a “step change” over Opus 4.6, with notable advances in cybersecurity. Anthropic also disclosed that a Chinese state-sponsored group ran a coordinated campaign using Claude Code to infiltrate roughly 30 organizations before being detected, which is exactly the dual-use evidence that justifies holding capability back. Mythos is one of the first publicly documented cases of a frontier model withheld on safety grounds rather than for lack of readiness. Second, the AMD complaint has enterprise credibility; Laurenzo’s analysis of 6,852 sessions and 234,760 tool calls is a different category of criticism from anonymous user complaints. Third, the source-map vector is a real security pattern: if any of your projects publish compiled JavaScript with source maps to npm, you may be exposing more than you realize.
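
Auditing this locally takes a few lines. The sketch below (the `find_exposed_source_maps` helper is illustrative, not a published tool) walks an unpacked package tarball and flags any `.map` file whose `sourcesContent` field embeds original source, which is what made the Claude Code leak possible:

```python
import json
import os

def find_exposed_source_maps(root):
    """Walk an unpacked npm package and flag source maps whose
    sourcesContent field embeds the original source files."""
    hits = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".map"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    data = json.load(f)
            except (OSError, json.JSONDecodeError):
                continue  # not a JSON source map; skip it
            if data.get("sourcesContent"):
                hits.append((path, len(data.get("sources", []))))
    return hits
```

Run it against the output of `npm pack` (extracted) before publishing; a nonempty result means readable source is going out with the package.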

Anthropic Cuts OpenClaw Off from Claude Subscriptions

Effective April 4, Anthropic stopped letting Claude subscription usage cover OpenClaw and other third-party tools. Users who ran heavy agentic workflows through OpenClaw on a Max subscription are now looking at usage-based billing that can run 50 times higher per month. Anthropic’s stated reason: third-party harnesses bypass the prompt-cache optimizations built into Claude Code, consuming significantly more compute per session. Anthropic has promised one-time credits equal to the monthly subscription price to ease the transition. A separate Ars Technica piece this week covers a recently patched security vulnerability in OpenClaw, documenting an exploit path that abuses the tool’s full-system-access model. OpenClaw’s creator Peter Steinberger, who joined OpenAI in February, called the timing suspicious: Anthropic “copied some popular features into their closed harness, then locked out open source.” Steinberger and OpenClaw board member Dave Morin tried to talk Anthropic out of the change and managed only to delay it by a week. (TechCrunch, The Register, Ars Technica (security))

Why it matters: If you use OpenClaw, budget for API pricing rather than subscription inclusion from here on out. The security write-up is worth reading before any new deployment: OpenClaw’s full computer-control model requires the same trust calibration as giving a contractor root access, and the exploit path is documented in detail. Make sure you’re on the patched version before deploying.

Under the Radar

[Expert-first] An AI Safety Researcher Just Doubled Their Probability Estimate for Full AI R&D Automation by 2028

A post on the AI Alignment Forum documents a significant personal forecast revision: the author now puts “a bit below 30% probability” on full AI R&D automation by end of 2028, roughly double their prior estimate. The trigger isn’t a new benchmark but a qualitative observation: AI systems can now “often do massive easy-to-verify software engineering tasks,” and the author is extrapolating from that capability shift. No mainstream outlet has picked this up. (AI Alignment Forum)

Why you should care: The safety and alignment research community is consistently ahead of the mainstream news cycle on capability assessments. When researchers start revising near-term R&D automation probabilities upward, and grounding those revisions in capability observations rather than hype, the shift tends to show up in broader discourse a few weeks later. A roughly 30% probability of full AI R&D automation within two years is not a fringe view; if it spreads, it will reshape investment timelines, hiring decisions, and safety conversations considerably. Track follow-up posts from other alignment researchers to see whether estimates are converging.

[Expert-first] The Safety Technique Everyone Relies On May Break at Larger Model Scales

A LessWrong post published this week argues that single-vector activation steering (one of the primary mechanistic interpretability tools used to influence and constrain model behavior) is likely to become unreliable as models scale up. The author shows that steering already degrades output coherence even when the steered model gets the right answers, and that the degradation pattern suggests the technique won’t hold at larger scales. More sophisticated alternatives exist (SAE feature ablation, causal crosscoders), but they’re less tested and harder to deploy. No AI media outlet has covered this. (LessWrong)
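
For readers unfamiliar with the technique: single-vector steering amounts to adding one fixed direction to a layer’s activations at inference time. A minimal NumPy sketch (the function name and scale factor are illustrative, not taken from the post):

```python
import numpy as np

def steer(hidden, direction, alpha=4.0):
    """Single-vector activation steering: nudge a layer's residual-stream
    activations along one fixed direction (e.g. a vector contrasting two
    behaviors). Every token position gets the same global shift, which is
    the core of the scaling concern: the shift can push activations
    off-distribution and degrade coherence."""
    v = direction / np.linalg.norm(direction)  # unit-normalize the steering vector
    return hidden + alpha * v                  # broadcast across all positions
```

Here `hidden` is a (positions × d_model) activation matrix; applying the same `alpha * v` everywhere, regardless of context, is exactly the blunt-instrument property the post argues stops working as models grow.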

Why you should care: Activation steering is frequently cited as a concrete safety mitigation available now, before more comprehensive alignment solutions exist. If it doesn’t scale, a key item in the current safety toolbox needs to be replaced. This finding changes which safety techniques deserve investment and which are false confidence. It’s the kind of thing that matters a lot when people are arguing that we have “sufficient” safety tools to proceed with more capable systems.

Quick Hits

  • GitHub is on pace for 14 billion commits this year - GitHub COO Kyle Daigle reported 275 million commits per week; the platform saw 1 billion total commits in all of 2025. AI-assisted coding is measurably changing production volume at scale. Simon Willison

  • Karpathy drops LLM Wiki concept - A GitHub Gist describes a pattern where an LLM compiles raw documents into a living, interlinked markdown wiki that grows with every new source and every query, replacing static RAG for personal knowledge bases. Implementations are already appearing in the community. GitHub Gist

  • OpenAI closes $122B round at $24B ARR - The largest private funding round in history, at an $852B post-money valuation. OpenAI raised $3B from individual investors via bank channels for the first time, an unusual move that brings retail exposure ahead of a formal listing. OpenAI Blog, TechCrunch

  • Iran threatens OpenAI’s Stargate Abu Dhabi facility - The IRGC published a video explicitly targeting the data center under construction in the UAE if the US attacks Iranian power infrastructure. This follows a March 1 Iranian Shahed drone strike on AWS data centers in the UAE that took two ME-CENTRAL-1 availability zones offline for over 24 hours, so the threat pattern is established, not hypothetical. Geopolitical risk is now concrete and named for OpenAI’s Gulf compute expansion. The Verge

  • Medvi: $1.8B in telehealth revenue, built on AI deepfake ads - Gary Marcus documents the telehealth startup’s use of AI-generated patient photos, deepfaked before-and-after images, and fake doctor headshots. The FDA had already issued a warning letter. The New York Times profiled the company positively before the practices became public. Marcus on AI

  • axios npm package backdoored via maintainer social engineering - On March 31, attackers tied to North Korea’s Sapphire Sleet (UNC1069) published two trojanized versions of axios (1.14.1 and 0.30.4) for roughly three hours, delivering a cross-platform RAT to anyone who pulled them. The vector was a fake company impersonation, fake Slack workspace, and fake Microsoft Teams meeting that compromised the maintainer’s credentials. Different threat actor from the TeamPCP campaign that hit LiteLLM, same playbook. axios postmortem, Microsoft Security
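
If axios is anywhere in your dependency tree, checking your lockfile takes a few lines. A Python sketch, assuming the npm lockfile v2/v3 layout and hard-coding the two versions named in the postmortem:

```python
import json

# Compromised versions named in the axios postmortem
KNOWN_BAD = {"axios": {"1.14.1", "0.30.4"}}

def scan_lockfile(lock_path):
    """Flag any locked dependency (direct or transitive) whose resolved
    version appears on the known-compromised list. Assumes the npm
    lockfile v2/v3 'packages' layout, keyed by node_modules path."""
    with open(lock_path, encoding="utf-8") as f:
        lock = json.load(f)
    flagged = []
    for path, meta in lock.get("packages", {}).items():
        # "" is the root project; otherwise take the name after the last
        # node_modules/ segment (handles nesting and @scoped packages)
        name = path.rsplit("node_modules/", 1)[-1] if path else lock.get("name", "")
        if meta.get("version") in KNOWN_BAD.get(name, ()):
            flagged.append((name, meta["version"]))
    return flagged
```

Point it at `package-lock.json`; an empty list means neither trojaned version is pinned anywhere in the tree.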

  • OpenAI, Anthropic, and Google team up against Chinese model distillation - The three labs are sharing information through the Frontier Model Forum to detect adversarial distillation by Chinese competitors. Bloomberg reports they have already flagged DeepSeek, Moonshot, and MiniMax as engaging in illicit capability extraction. Bloomberg

What to Watch

The LLM Wiki pattern. Karpathy’s GitHub Gist describes something more interesting than another RAG alternative. The core idea: an LLM acting as a compiler reads raw sources and writes a structured, interlinked wiki that grows with every query and every new document ingested. Knowledge compounds instead of being rediscovered on each retrieval call. Implementations are already circulating on GitHub. This approach is directly applicable to any team managing a large document corpus, internal knowledge base, or research archive. Watch for production case studies in the next few weeks; if they hold up, it’s a meaningful shift in how to think about LLM memory architecture.
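
Absent a canonical implementation, the compile step can be sketched in a few lines. Everything here is an assumption for illustration, not Karpathy’s design: the `summarize` callable stands in for a real LLM call, pages live in an in-memory dict, and `[[WikiLink]]` is the assumed interlinking convention:

```python
import re

def compile_into_wiki(wiki, source_text, summarize):
    """One ingestion step of the LLM-wiki pattern: `summarize` (an LLM
    call in a real system; any (title, body) callable here) compiles a
    raw document into a markdown page, and every [[WikiLink]] in the
    output becomes a stub page, so the link graph grows with each source."""
    title, body = summarize(source_text)
    # Append to an existing page rather than overwrite: knowledge compounds
    wiki[title] = (wiki.get(title, f"# {title}") + "\n\n" + body).strip()
    for link in re.findall(r"\[\[([^\]]+)\]\]", body):
        wiki.setdefault(link, f"# {link}\n\n(stub: not yet compiled)")
    return wiki
```

The design choice that distinguishes this from RAG is the append-and-stub behavior: retrieval rediscovers facts on every call, while the wiki accumulates them and leaves explicit gaps for the next ingestion pass to fill.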

If someone forwarded this to you, subscribe here to get it weekly.
