Voice Channel
Voice Calls
AI voice agents powered by Pipecat that make phone calls indistinguishable from real humans. Uses Deepgram for speech-to-text, GPT-4o for conversation, and ElevenLabs for text-to-speech.
Voice Pipeline
The voice agent runs a real-time audio pipeline using the Pipecat framework. Audio streams through Twilio WebSocket and is processed in the following stages:
Deepgram STT
Nova-3 speech-to-text
GPT-4o
Via OpenRouter
ElevenLabs TTS
Natural voice synthesis
Twilio
Call transport
Pipeline Architecture
Audio In (Twilio WebSocket)
|
v
Silero VAD (Voice Activity Detection)
|
v
Deepgram Nova-3 STT (Speech-to-Text)
|
v
LLM Context Aggregator
|
v
GPT-4o via OpenRouter (Conversation AI)
|
v
TranscriptCollector (Logs full conversation)
|
v
ElevenLabs TTS (Text-to-Speech)
|
v
Audio Out (Twilio WebSocket)Project Structure
voice-agent/
voice-agent/ ├── bot.py # Main Pipecat pipeline (STT -> LLM -> TTS) ├── server.py # Custom server wrapper adding API routes ├── api.py # FastAPI endpoints (/api/call, /api/config, /api/health) ├── config.py # Agent identity, model settings, greeting templates ├── prompts.py # System prompt generation with personality presets ├── live_config.py # Thread-safe live config store (dashboard sync) ├── requirements.txt └── Dockerfile # Railway deployment (port 7860)
Voice Agent API
POST
/api/callInitiate an outbound call to a phone numberPOST
/api/configSync voice configuration from dashboardGET
/api/configGet current voice configurationGET
/api/healthHealth check endpointConfiguration
Voice settings are managed from the dashboard and synced to the Railway service via the /api/config endpoint.
| Setting | Description |
|---|---|
| Voice ID | ElevenLabs voice identifier for TTS |
| Selected Voices | Array of available voice options |
| Language | uk (Ukrainian), en (English), or multi (multilingual) |
| Personality | Preset: professional, friendly, persuasive, etc. |
| Speed | TTS speaking speed multiplier (default: 1.0) |
Call Logging
When a call disconnects, the voice agent automatically saves a full call log to the call_logs table including:
- Full transcript as JSONB array of
{role, content}pairs - Flat transcription text for search
- Call duration in seconds
- Sentiment analysis (positive, neutral, negative)
- Call score (0-100)
- Recording URL (if enabled)
Call Statuses
completed
failed
no_answer
voicemail
busy
in_progress