OpenAI shut down Sora this week because the economics stopped working: $1 million a day in compute costs, under 500,000 active users, and a $1 billion Disney deal that died before it closed. The same week, attackers used an AI agent to automate a supply chain compromise that touched 36% of cloud environments. Both stories say something about where AI actually stands right now. The technology works; the cost curves and threat models are still catching up.
The Big Stories
OpenAI Shuts Down Sora as Video AI Economics Break Down
OpenAI announced on March 24 that Sora is shutting down in two stages: the consumer app closes April 26, the API shuts down September 24. After a splashy launch, user count peaked at roughly 1 million and dropped below 500,000, while daily compute costs held at around $1 million. The $1 billion Disney content licensing deal, often cited as proof of enterprise interest, never closed and is now officially dead. ByteDance had been preparing a global launch of Seedance 2.0; after the announcement, it quietly delayed. The Sora research team is being redirected to “world simulation research” for robotics, so the underlying capability isn’t being abandoned, just defunded as a consumer product. (TechCrunch, TechCrunch analysis)
Why it matters: This is a cost-curve story, not a technology failure. Video generation works. What doesn’t work yet is running it at consumer scale without burning through GPU budget faster than revenue arrives. The AI video market attracted over $25 billion in VC investment on the assumption that Sora proved demand. What it actually proved was that demand exists at a price point current infrastructure can’t support profitably. The window for second movers is open, but only for teams that can wait out a 10x inference cost reduction before launching.
LiteLLM Was Backdoored; If It Touched Your Environment, Rotate Everything
On March 24, threat actor TeamPCP published two compromised versions of LiteLLM to PyPI (1.82.7 and 1.82.8) after stealing PyPI credentials through LiteLLM’s CI pipeline. The attack chain ran through a previously compromised Trivy security scanner, then Checkmarx’s CI actions, then LiteLLM’s build system. The malicious payload harvested SSH keys, cloud credentials, Kubernetes secrets, and .env files. On machines running Kubernetes, it also attempted lateral movement and installed a persistent systemd backdoor. LiteLLM downloads 3.4 million times per day and is present in 36% of cloud environments as a transitive dependency; affected downstream packages include DSPy, MLflow, OpenHands, CrewAI, and Arize Phoenix. The malicious versions were live for about three hours. Detection was accidental. CVE-2026-33634, CVSS 9.4. (LiteLLM Security Advisory, BleepingComputer, Wiz Blog)
Why it matters: Run pip show litellm | grep Version in any environment that saw package updates on March 24. If you see 1.82.7 or 1.82.8, uninstall immediately, rotate every credential accessible from that environment, and check Kubernetes for unauthorized pods in the kube-system namespace. The safe version is 1.82.6. This is part of an ongoing coordinated campaign; the Telnyx package was hit three days later using the same playbook. Standard security tooling missed this one. That’s the part that should keep teams up at night.
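The manual check above can be scripted for fleets of environments. A minimal sketch (the helper names are mine; the version strings come from the advisory):

```python
# Flag the compromised LiteLLM releases named in the advisory
# (1.82.7, 1.82.8); 1.82.6 is the last safe version.
from importlib import metadata

COMPROMISED = {"1.82.7", "1.82.8"}

def litellm_status(version: str) -> str:
    """Classify a LiteLLM version string against the advisory."""
    if version in COMPROMISED:
        return "COMPROMISED: uninstall, rotate all reachable credentials, audit kube-system"
    return "not in the compromised range"

def check_installed() -> str:
    """Check the LiteLLM version installed in this Python environment."""
    try:
        return litellm_status(metadata.version("litellm"))
    except metadata.PackageNotFoundError:
        return "litellm not installed"
```

A version match is necessary but not sufficient: LiteLLM is present in many environments as a transitive dependency, so run the check inside every virtualenv and container image that updated packages on March 24, not just top-level projects.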
Claude Code Now Controls Your Desktop, Not Just Your Terminal
Anthropic shipped three substantial upgrades to Claude Code on March 24. Full computer use arrived: desktop mouse and keyboard control, not just terminal and code execution. Auto mode followed, replacing constant approval prompts with model-based classifiers that let the agent operate without interrupts for longer stretches. And mobile dispatch landed, letting you trigger and supervise Claude Code agents from Telegram, Discord, or iOS. Anthropic also opened a plugin architecture that includes OpenAI Codex integration. OpenAI responded within days with competing Codex feature announcements. (Anthropic Blog, Auto Mode Technical Post)
Why it matters: The “coding agent” framing is giving way to something broader. With full computer use, Claude Code can now interact with Salesforce, Excel, legacy applications, and any UI that doesn’t have an API. Enterprise automation tasks that were previously out of scope are now tractable. The product race between Anthropic and OpenAI on agentic tools is no longer slow-moving; major capabilities are shipping week to week. The evaluation gap for teams trying to keep up is widening faster than most organizations can absorb.
Under the Radar
[Expert-first] KV Cache Quantization Reverses at Large Contexts
Practitioners on r/LocalLLaMA ran systematic benchmarks of KV cache quantization on a DGX Spark (GB10, 128GB unified memory) running Nemotron 3 Nano 30B at 128K context. At 64K context, q4_0 collapsed prompt throughput from 282.7 tokens/second (f16) to 21.3 tokens/second, a 92.5% slowdown. The q4_0 configuration also used more memory than f16 at those context lengths; dequantization overhead exceeds the savings from lower precision when the KV cache grows large. q8_0 degraded less severely but also showed meaningful regression. No lab has published on this; no mainstream publication has covered it. (NVIDIA Developer Forums)
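The headline regression number follows directly from the two throughput figures:

```python
# Sanity-check the reported slowdown: f16 vs q4_0 prompt throughput
# at 64K context, from the DGX Spark benchmarks cited above.
f16_tps = 282.7  # tokens/second, f16 KV cache
q4_tps = 21.3    # tokens/second, q4_0 KV cache

slowdown = (f16_tps - q4_tps) / f16_tps
print(f"{slowdown:.1%}")  # 92.5%
```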
Why you should care: The default recommendation for inference optimization is to quantize your KV cache. That advice holds at short contexts on traditional GPU architectures. On hardware with fast unified memory, specifically Apple Silicon and Nvidia GB series chips, the math inverts past roughly 32K context. If you’re deploying long-context inference pipelines and relying on q4_0 to stay within memory budgets, don’t trust that assumption. Benchmark on your actual hardware before production. This finding exists only in community benchmarks right now; official documentation will catch up, but not before teams have already shipped configurations based on the wrong defaults.
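As a starting point for your own benchmarking, the finding can be encoded as a rough selection heuristic. This is a sketch of the community result, not vendor guidance; the ~32K crossover is an assumption to validate on your own hardware:

```python
# Hypothetical heuristic from the community benchmarks: on fast
# unified-memory hardware (Apple Silicon, Nvidia GB series), prefer
# f16 KV cache past ~32K context, where dequantization overhead
# reportedly swamps the memory savings. The threshold is an
# assumption; benchmark before trusting it in production.
CROSSOVER_CTX = 32_768

def kv_cache_type(context_len: int, unified_memory: bool) -> str:
    if unified_memory and context_len > CROSSOVER_CTX:
        return "f16"   # dequant overhead dominates at long contexts
    return "q4_0"      # quantization still wins at short contexts
```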
[Expert-first] An AI Agent Automated Part of the LiteLLM Supply Chain Attack
Security post-mortems on the TeamPCP campaign documented that the attackers deployed an AI agent called “openclaw” as part of their operational pipeline. It’s the first confirmed case of an AI agent used operationally in a software supply chain attack, deployed not for initial vulnerability research but for automating execution steps at scale. Multiple security firms documented the finding. No AI-focused outlet has covered the openclaw detail specifically; it got buried under the larger breach story. (Datadog Security Labs, ReversingLabs)
Why you should care: The same agent capabilities that make developer tools useful are now being used to attack developer infrastructure. The full scope of what openclaw automated isn’t publicly documented yet, but its presence in a live campaign means manual coordination steps in supply chain attacks can be offloaded to an agent. This happened two weeks ago in a real campaign that hit 36% of cloud environments. If your threat model for agentic systems didn’t include “used by attackers,” update it now.
Quick Hits
Judge blocks Pentagon’s Anthropic ban - Federal Judge Rita Lin granted a preliminary injunction citing First Amendment retaliation; the Pentagon’s CTO responded that the designation is still in effect under a separate statute. CNBC
Mistral raises €830M in debt - Seven banks financed a Paris-area data center with 13,800 Nvidia GB300 GPUs; operations start by end of June. European AI compute sovereignty has entered the debt market. TechCrunch
JetBrains retires Code With Me, launches Central - Pair programming plugin removed in 2026.1; replacement is Central, an agentic development platform with agent governance and shared context across repos. Early access in Q2 2026. The Register
Qwen3.5-Omni released - Alibaba’s multimodal model supports 256K context, 113 languages for speech recognition, and audio-visual coding from screen recordings without text prompts. 215 SOTA claims from Alibaba’s own report; treat the numbers as unverified until independent evals arrive. MarkTechPost
Shield AI raises $2B at $12.7B valuation - Defense drone and autonomous aircraft company; valuation doubled in 12 months, driven partly by the U.S. Air Force CCA contract. Agile Robots is also deploying DeepMind foundation models on humanoids in a confirmed production pilot. Robotics capital is now serious. The Robot Report
Documentation poisoning demonstrated at scale - Context Hub proof-of-concept: an attacker submits a pull request with malicious API docs, AI coding agents ingest the instructions as authoritative, and compromised code or stolen credentials result. No executable required. The Register
Apple ML Research: tool use extends context past training limits - “Tool-Use Unlocks Length Generalization in State Space Models” shows that giving state space models tools lets them generalize to longer contexts than their training window included. Directly relevant to agent system design. Apple ML Research
What to Watch
MCP standardization - Model Context Protocol is emerging as the default interface for CLI-to-agent tool integration across Claude Code plugins, OpenAI Codex, and independent tool builders. If it becomes the standard for agent-to-tool communication the way REST became the standard for web APIs, the abstraction layer matters as much as the underlying model. Watch for: cross-vendor adoption announcements, enterprise tooling built on MCP rather than custom integrations, and any major vendor explicitly publishing compatibility or incompatibility. The adoption velocity over the next 60 days will tell you whether this is a real standard or just Anthropic’s preferred format.
If someone forwarded this to you, subscribe here to get it weekly.