Add voice to your agent
AI-Generated Summary: This is an automated summary created using AI. For the full details and context, please read the original post.
Adding Voice to Cloudflare Agents
Cloudflare has released an experimental voice pipeline for the Agents SDK, allowing developers to add real-time voice to their existing agent architecture. The @cloudflare/voice package provides a set of tools and APIs for building voice-enabled agents, including speech-to-text (STT) and text-to-speech (TTS) capabilities.
Key Features
- withVoice(Agent) and withVoiceInput(Agent) functions for building full-conversation voice agents and speech-to-text-only use cases, respectively
- useVoiceAgent and useVoiceInput hooks for React apps
- VoiceClient for framework-agnostic clients
- Built-in Workers AI providers for STT and TTS, including Deepgram Flux, Deepgram Nova 3, and Deepgram Aura
How it Works
The voice pipeline extends the existing Agents SDK model, using the same Durable Object, WebSocket connection, and application logic. The flow involves:
- Audio transport: The browser captures microphone audio and streams it over the same WebSocket connection.
- STT session setup: The agent creates a continuous transcriber session when the call starts.
- STT input: The streamed audio is passed to the STT provider for transcription.
- TTS output: The agent's response is converted to speech by the TTS provider and sent to the client.
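The flow above can be sketched as a small, self-contained simulation. This is only an illustration under stated assumptions, not the @cloudflare/voice API: the Transcriber and Synthesizer interfaces, the VoiceSession class, and the onTurn callback are hypothetical names, the "audio" is just character codes with a zero byte marking the end of an utterance, and the step where the agent's own logic produces a reply is assumed to sit between transcription and speech output.

```typescript
// Hypothetical stand-ins for illustration only; NOT the @cloudflare/voice API.
type AudioChunk = number[]; // stand-in for raw audio bytes

interface Transcriber {
  // Accepts streamed audio; returns a transcript once an utterance ends, else null.
  feed(chunk: AudioChunk): string | null;
}

interface Synthesizer {
  // Converts reply text into audio bytes.
  speak(text: string): AudioChunk;
}

// Toy STT: each byte is a character code, and a zero byte ends the utterance.
class ToyTranscriber implements Transcriber {
  private pending: number[] = [];

  feed(chunk: AudioChunk): string | null {
    this.pending = this.pending.concat(chunk);
    const end = this.pending.indexOf(0);
    if (end === -1) return null; // utterance still in progress
    const utterance = this.pending.slice(0, end);
    this.pending = this.pending.slice(end + 1);
    return String.fromCharCode(...utterance);
  }
}

// Toy TTS: maps each character back to its character code.
class ToySynthesizer implements Synthesizer {
  speak(text: string): AudioChunk {
    return text.split("").map((c) => c.charCodeAt(0));
  }
}

// One session per call: the transcriber is created once and fed continuously.
class VoiceSession {
  readonly sent: AudioChunk[] = []; // audio that would go back to the client
  private stt: Transcriber;
  private tts: Synthesizer;
  private onTurn: (transcript: string) => string;

  constructor(stt: Transcriber, tts: Synthesizer, onTurn: (transcript: string) => string) {
    this.stt = stt;
    this.tts = tts;
    this.onTurn = onTurn;
  }

  // Called for every audio chunk arriving from the browser.
  receiveAudio(chunk: AudioChunk): void {
    const transcript = this.stt.feed(chunk);
    if (transcript === null) return;
    const replyText = this.onTurn(transcript); // application logic (assumed step)
    this.sent.push(this.tts.speak(replyText));
  }
}

const toAudio = (text: string): AudioChunk => text.split("").map((c) => c.charCodeAt(0));

const session = new VoiceSession(
  new ToyTranscriber(),
  new ToySynthesizer(),
  (transcript) => `You said: ${transcript}`,
);
session.receiveAudio(toAudio("hel")); // no reply yet
session.receiveAudio(toAudio("lo").concat([0])); // zero byte ends the utterance
const reply = String.fromCharCode(...session.sent[0]);
console.log(reply);
```

The point of the sketch is the shape of the loop: audio arrives in chunks, the transcriber holds state across chunks for the life of the call, and a reply is only synthesized once an utterance is complete.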
Getting Started
On the server side, Cloudflare's minimal pattern for a voice-enabled agent is to import the @cloudflare/voice package and extend the Agent class with the withVoice function. On the client side, the useVoiceAgent hook connects to the agent and starts a voice conversation.
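Extending a class through a function such as withVoice(Agent) is the TypeScript mixin pattern. The sketch below shows that pattern in isolation, using stand-ins so it runs without any packages: the Agent class here is a placeholder for the one in the Agents SDK, and withVoiceSketch, onTurn, and handleTranscript are hypothetical names chosen for illustration. It does not reproduce the real @cloudflare/voice signatures, so consult the original post for those.

```typescript
// Placeholder base class; the real Agent comes from the Agents SDK.
class Agent {
  onConnect(): string {
    return "connected";
  }
}

// Standard TypeScript mixin constructor type.
type Constructor<T = {}> = new (...args: any[]) => T;

// Hypothetical mixin in the spirit of withVoice(Agent): it returns a subclass
// of whatever base it is given, with voice-related hooks added.
function withVoiceSketch<TBase extends Constructor<Agent>>(Base: TBase) {
  return class extends Base {
    // Default behavior: echo the transcript. Subclasses override this.
    onTurn(transcript: string): string {
      return transcript;
    }

    // Entry point the (imagined) pipeline calls with each finished transcript.
    handleTranscript(transcript: string): string {
      return this.onTurn(transcript);
    }
  };
}

// The application's agent keeps its existing base behavior and gains the hooks.
class SupportAgent extends withVoiceSketch(Agent) {
  onTurn(transcript: string): string {
    return `Support heard: ${transcript}`;
  }
}

const agent = new SupportAgent();
console.log(agent.onConnect());
console.log(agent.handleTranscript("my order is late"));
```

Because the mixin returns a subclass of the class it receives, existing agent methods such as onConnect in this sketch keep working unchanged, which matches the summary's point that voice is added on top of the existing agent rather than replacing it.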