The Agent Stack

Anthropic's Fable 5 changes the game for agent builders

The Agent Stack #044 — Wednesday Stack Anthropic just dropped Claude Fable 5, and this isn’t another incremental model update. This is the first public Mythos-class model that actually works for building production agents. I’ve been testing Fable 5 against Claude 3.5 Sonnet for the past 48 hours across three different agent workflows. The results are striking. Fable 5 consistently handles multi-step reasoning tasks that would trip up previous models, particularly when dealing with ambiguous instructions or error recovery. ...

Building Prompt Injection Defense with Lockdown Mode

The Agent Stack #043 — Monday Build OpenAI’s new Lockdown Mode landed Saturday with zero fanfare and maximum impact. This isn’t another GPT wrapper announcement. It’s the first practical defence against prompt injection attacks that actually works in production. What Lockdown Mode Actually Does Prompt injection is the SQL injection of AI. Users slip malicious instructions into legitimate prompts to extract training data, bypass safety filters, or exfiltrate sensitive context. Standard defences like input sanitisation fail because LLMs are designed to follow instructions embedded anywhere in text. ...

Apple approves first AI agent for Messages for Business

The Agent Stack #042 — Friday Signal Apple just cracked open its walled garden for AI agents. For the first time ever. The Big Move Apple approved Poke as the first AI agent on its Messages for Business platform. This matters because Apple has been notoriously restrictive about AI integration across its ecosystem. Poke lets users interact with AI agents through simple text messages. Think of it as ChatGPT, but instead of opening an app, you just text a business number. The AI handles customer service, bookings, orders - whatever that business needs. ...

Microsoft's ASSERT testing framework is production-ready

The Agent Stack #041 — Wednesday Stack Microsoft just dropped something that every agent builder needs: a testing framework that doesn’t suck. ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing) lets you write AI behaviour tests in plain English, then automatically generates the evaluation logic. I’ve been running ASSERT against our customer service agents for two weeks. The results are impressive, but there are sharp edges you need to know about. ...

# NVIDIA Cosmos 3 Drops: Build Physical AI Agents That Actually Work

The Agent Stack #040 — Monday Build NVIDIA just released Cosmos 3, and it’s the first open model that can reason about physics and plan actions in the real world. This isn’t another chatbot that hallucinates physics. It’s a foundation model trained on millions of videos that understands how objects move, collide, and interact. Why This Changes Everything for Builders Most AI agents live in pure text or image land. They can write code and analyse data, but ask them to help a robot stack boxes or navigate a room? Disaster. Cosmos 3 bridges that gap by understanding physical cause and effect. ...

Cloud giants rebuild the internet for agents

The Agent Stack #039 — Friday Signal AWS, Cloudflare, and Google are quietly redesigning cloud infrastructure for a future where machines generate most internet traffic. This isn’t about serving humans anymore. The shift is already happening. Cloudflare launched a unified data platform this week specifically designed for AI agents to query and act on real-time data. AWS rolled out new edge computing services optimised for agent-to-agent communication. The old request-response web model breaks down when you have thousands of agents making millions of API calls per second. ...

BadHost vulnerability exposes agent infrastructure gaps

The Agent Stack #038 — Wednesday Stack The security incident everyone’s been waiting for just happened. A critical vulnerability called “BadHost” was discovered in Starlette, the Python web framework that powers millions of AI agents through FastAPI. The BadHost Reality Check Starlette processes 325 million weekly downloads. That’s not a typo. When security researchers found CVE-2026-37284 (the formal designation for BadHost), they effectively identified a pathway into the majority of production agent deployments. ...

Nvidia's diffusion models hit production speed

The Agent Stack #037 — Monday Build Nvidia just dropped something that changes how we think about LLM inference. Their Nemotron-Labs diffusion language models generate text at what they’re calling “speed-of-light” performance. This isn’t marketing fluff. Traditional autoregressive models generate one token at a time. Diffusion models generate entire sequences in parallel. Think going from dial-up to fibre, but for text generation. Building with Diffusion Language Models The core insight is architectural. Instead of predicting the next token given all previous tokens, diffusion models start with noise and iteratively refine it into coherent text. This parallelisation is why they’re fast. ...

Spotify's AI assault signals platform shift

The Agent Stack #036 — Friday Signal Spotify just launched four separate AI products in one day. This isn’t feature creep—it’s a complete platform transformation that shows where consumer AI is heading. The Swedish streaming giant unveiled AI-powered remixes (with Universal Music Group), personalised podcast generation (Studio by Spotify Labs), audiobook creation (powered by ElevenLabs), and enhanced podcast Q&A features. Each product targets a different creator economy: musicians, podcasters, authors, and listeners. ...

Capframe vs Enforra — agent security tooling lands

The Agent Stack #035 — Wednesday Stack Two competing agent security frameworks dropped on HN this week. Both tackle the same critical problem: how to safely grant AI agents permission to actually do things. Neither is production-ready yet, but they’re worth testing now. The timing isn’t coincidental. Google’s I/O showcased agents everywhere — Gemini Spark handling your calendar, AI agents monitoring your inbox, even Volvo’s EX60 using Gemini to read parking signs through external cameras. More capability means more attack surface. ...