OpenAI adds real-time voice to APIs

The Agent Stack #030 — Friday Signal

OpenAI just handed every agent builder a microphone. The new voice intelligence features in their API mean your agents can now hear, speak, and respond in real-time conversations.

This isn’t just text-to-speech with extra steps. The API now includes real-time voice streaming, conversation detection, and audio processing that works across different accents and languages. OpenAI is specifically targeting customer service systems, education platforms, and creator tools.

The timing matters. Perplexity launched its Personal Computer agent for Mac users this week, an AI that can control your desktop through natural conversation. Meanwhile, Mozilla found 271 vulnerabilities using Anthropic’s Mythos tool with “almost no false positives,” a sign that AI agents are becoming genuinely useful for complex tasks.

Voice changes everything for agent adoption. Written chat feels like using a computer. Voice conversations feel like talking to a person. That psychological shift will make agents accessible to users who’d never touch a command line or form interface.

The infrastructure is finally catching up to the hype. Real-time voice processing was prohibitively expensive six months ago. Now it’s an API call. Your customer service agent can handle phone calls. Your educational agent can conduct verbal assessments. Your personal assistant can actually assist through speech.

But here’s what OpenAI didn’t announce: pricing. Real-time voice processing burns through tokens faster than text. Expect higher costs per interaction, especially for longer conversations. Budget accordingly.
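
Until real numbers land, a back-of-envelope model can at least show how costs scale with call length and volume. The sketch below is a rough estimator, not OpenAI pricing: the class name, the tokens-per-minute figure, and the price per million tokens are all placeholder assumptions you would replace once rates are published. It also treats input and output audio as one blended rate, which a real bill may not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceCostAssumptions:
    """Placeholder inputs. None of these are published OpenAI figures."""
    audio_tokens_per_minute: float   # assumed tokens consumed per minute of audio
    price_per_million_tokens: float  # assumed blended price per 1M audio tokens


def estimate_call_cost(minutes: float, a: VoiceCostAssumptions) -> float:
    """Estimated cost of a single call of the given length."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    tokens = minutes * a.audio_tokens_per_minute
    return tokens * a.price_per_million_tokens / 1_000_000


def estimate_monthly_cost(calls_per_day: int, avg_minutes: float,
                          a: VoiceCostAssumptions, days: int = 30) -> float:
    """Estimated monthly spend for a steady daily call volume."""
    return calls_per_day * days * estimate_call_cost(avg_minutes, a)


if __name__ == "__main__":
    # Hypothetical numbers purely for illustration.
    guess = VoiceCostAssumptions(audio_tokens_per_minute=1000,
                                 price_per_million_tokens=50.0)
    print(f"per call:  {estimate_call_cost(4, guess):.2f}")
    print(f"per month: {estimate_monthly_cost(100, 4, guess):.2f}")
```

Cost grows linearly with both conversation length and call volume in this model, so the average call duration is the number worth measuring first.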

Quick Hits:

• China’s Moonshot AI raised £1.5B at £15B valuation - Their open-source approach hit £160M ARR in April, showing demand for non-US AI alternatives is exploding

• SpaceX plans £40B chip plant in Texas - Musk’s building his own silicon foundry to avoid NVIDIA dependence, signalling serious infrastructure investment in AI hardware

• Snap ended £300M Perplexity deal “amicably” - The planned AI search integration was cancelled, suggesting even big tech partnerships aren’t guaranteed in this fast-moving market

One Thing to Try:

Test OpenAI’s voice API with a simple agent that can take restaurant reservations over the phone. Build it to handle interruptions, background noise, and unclear speech. This will teach you the edge cases that kill voice agents in production. Start with the basic streaming endpoint and add conversation detection once you understand the latency patterns.
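
Before wiring up real audio, it can help to model the turn-taking logic on its own. The sketch below is a simulation only: it does not call OpenAI’s API, and the class, method names, and the 0.6 confidence threshold are hypothetical. It shows one way to handle two of the edge cases above, a caller talking over the agent (barge-in) and a transcript too unclear to act on.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    SPEAKING = auto()


class TurnManager:
    """Tracks who has the floor in a call. All event names are hypothetical."""

    def __init__(self, min_confidence: float = 0.6):
        self.state = State.LISTENING
        self.min_confidence = min_confidence
        self.actions = []  # ordered record of what the agent decided to do

    def say(self, text: str) -> None:
        """Agent starts speaking."""
        self.state = State.SPEAKING
        self.actions.append(("say", text))

    def playback_finished(self) -> None:
        """Agent audio finished without interruption."""
        self.state = State.LISTENING

    def on_user_speech_start(self) -> None:
        """Barge-in: if the caller talks over the agent, stop playback and listen."""
        if self.state is State.SPEAKING:
            self.actions.append(("cancel_playback", ""))
            self.state = State.LISTENING

    def on_user_transcript(self, text: str, confidence: float) -> None:
        """Act on clear speech; ask the caller to repeat unclear speech."""
        if confidence < self.min_confidence:
            self.say("Sorry, I didn't catch that. Could you repeat it?")
        else:
            self.actions.append(("handle", text))


if __name__ == "__main__":
    tm = TurnManager()
    tm.say("What time would you like your table?")
    tm.on_user_speech_start()              # caller interrupts mid-sentence
    tm.on_user_transcript("seven pm", 0.9)
    for action in tm.actions:
        print(action)
```

Keeping this logic separate from the audio transport means you can replay recorded event sequences against it and test interruptions without placing a single call.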

Voice agents aren’t coming. They’re here. The question is whether you’ll build them before your competitors do.