The Agent Stack #014 — Wednesday Stack


The AI gateway everyone trusts just got compromised. LiteLLM, used by thousands of developers to manage model API calls, was hit by credential-stealing malware delivered through its security compliance partner, Delve. This isn’t just another breach story; it’s a wake-up call about the fragile infrastructure we’re building AI agents on.

What Actually Happened

LiteLLM serves as a proxy layer between your applications and model providers like OpenAI, Anthropic, and Cohere. Think of it as the plumbing that routes your API calls, handles rate limiting, and logs usage. The BerriAI/litellm repo has 13.7k stars and is downloaded millions of times monthly.
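In outline, a gateway of this kind does something like the following. This is a minimal Python sketch of the routing-and-logging idea only, not LiteLLM’s actual code; the class and backend names are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Gateway:
    # provider name -> callable taking (model, messages) and returning text
    backends: dict = field(default_factory=dict)
    usage_log: list = field(default_factory=list)

    def completion(self, model: str, messages: list) -> str:
        # Route on a "provider/model" prefix, e.g. "openai/gpt-4o"
        provider, _, model_name = model.partition("/")
        backend = self.backends[provider]
        # Log usage before dispatching to the provider-specific backend
        self.usage_log.append({"provider": provider, "model": model_name})
        return backend(model_name, messages)

gw = Gateway(backends={"openai": lambda m, msgs: f"[{m}] ok"})
reply = gw.completion("openai/gpt-4o", [{"role": "user", "content": "hi"}])
print(reply)  # [gpt-4o] ok
```

The security implication is visible right in the sketch: every request and every key passes through one choke point, which is exactly what makes a compromised gateway so valuable to an attacker.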

The attack vector was brilliantly simple. LiteLLM had outsourced SOC 2 and ISO 27001 compliance to a startup called Delve. Delve’s systems got compromised, giving attackers access to LiteLLM’s infrastructure. From there, they could potentially intercept API keys, model requests, and response data flowing through the gateway.

The timing couldn’t be worse. AI recruiting startup Mercor confirmed they were breached through this same attack vector, with hackers claiming to have stolen sensitive data from their systems.

The Real Problem

This incident highlights a fundamental weakness in the AI stack. We’ve built a tower of dependencies that most teams don’t properly audit. LiteLLM depends on Delve. Your agent depends on LiteLLM. Your customer data flows through all of it.

I tested LiteLLM extensively last month for routing between Claude and GPT-4o. The convenience is undeniable—one API interface, automatic fallbacks, built-in caching. But the security model assumes you trust every component in the chain.

The alternatives aren’t much better. OpenAI’s API is more secure but locks you into their ecosystem. Self-hosting the open-source BerriAI/litellm proxy removes the hosted-vendor risk but adds operational complexity, and still leaves you auditing its supply chain. Most teams choose convenience over security.

What You Should Do Now

Check your infrastructure dependencies immediately. If you’re using LiteLLM, audit what data flows through it and whether you’ve rotated API keys since the breach. The project maintainers have been responsive—they’ve dropped Delve and are rebuilding their compliance processes—but trust takes time to rebuild.

Consider implementing API key rotation as standard practice. Most model providers support multiple keys per account. Rotate them weekly, not when something goes wrong.
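One low-effort way to make rotation routine is deriving the active key from the calendar week. A hedged sketch, assuming you have already provisioned several keys per provider; real rotation also means revoking the retired key in the provider’s dashboard, which this does not do:

```python
import datetime

def active_key(keys, today=None):
    """Pick one of several provisioned keys by ISO week number,
    so the key in use changes automatically every week."""
    today = today or datetime.date.today()
    week = today.isocalendar()[1]
    return keys[week % len(keys)]

keys = ["sk-key-a", "sk-key-b", "sk-key-c"]
print(active_key(keys, datetime.date(2026, 2, 18)))
```

Because the selection is deterministic, every instance of your service agrees on the current key without coordination; the trade-off is that a leaked key list defeats the scheme, so the retired keys must actually be revoked.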

For new projects, evaluate whether you actually need an AI gateway. Direct API calls to model providers are more secure and often faster. The convenience of unified interfaces comes with real costs.
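For comparison, a direct call needs nothing beyond the provider’s documented endpoint. This sketch builds, but deliberately does not send, a request to OpenAI’s Chat Completions endpoint using only the standard library; the key is a placeholder:

```python
import json
import urllib.request

def build_chat_request(api_key, model, messages):
    """Build (but don't send) a direct Chat Completions request:
    no gateway in the path, one fewer party holding your key."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "sk-example", "gpt-4o", [{"role": "user", "content": "hello"}]
)
print(req.full_url)  # https://api.openai.com/v1/chat/completions
```

Everything a gateway would have seen, the bearer key and the full message body, travels straight from you to the provider and nowhere else.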

Quick Hits

Anthropic’s code leak revealed they’re building a Tamagotchi-style AI pet and always-on agent features. The leak came from a source map accidentally included in Claude Code’s TypeScript build—basic operational security failure.

IBM’s Granite 4.0 3B Vision is actually usable for document analysis on modest hardware. I tested it on financial PDFs and it correctly extracted structured data 87% of the time, comparable to GPT-4V but running locally.

TRL v1.0 from Hugging Face finally makes post-training approachable. The new API handles RLHF, DPO, and KTO with sensible defaults. No more wrestling with distributed training configs for fine-tuning.

One Thing to Try

Audit your AI stack dependencies this week. Map every service, library, and third-party integration your agents use. Document who has access to your API keys and model data. Most teams discover they’ve accidentally created massive attack surfaces they never intended.
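A crude but useful first pass is scanning for credential-shaped environment variables. A sketch run against a sample dict here; point it at `os.environ` on a real machine, and treat the pattern list as an assumption to extend for your own stack:

```python
import os
import re

SECRET_PATTERN = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD)", re.IGNORECASE)

def find_secret_vars(env):
    """Return variable names that look like credentials: a first step
    toward mapping which processes can reach your API keys."""
    return sorted(name for name in env if SECRET_PATTERN.search(name))

sample = {
    "OPENAI_API_KEY": "sk-...",
    "ANTHROPIC_API_KEY": "sk-ant-...",
    "PATH": "/usr/bin",
    "HF_TOKEN": "hf_...",
}
print(find_secret_vars(sample))
# ['ANTHROPIC_API_KEY', 'HF_TOKEN', 'OPENAI_API_KEY']
```

Every hit is a key that some process, and every dependency that process loads, can read. That list is your real attack surface, not the one in your architecture diagram.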

Security isn’t just about the models—it’s about everything that touches them.