Voice Channel

Voice Calls

AI voice agents powered by Pipecat that make phone calls indistinguishable from real humans. Uses Deepgram for speech-to-text, GPT-4o for conversation, and ElevenLabs for text-to-speech.

Voice Pipeline

The voice agent runs a real-time audio pipeline using the Pipecat framework. Audio streams through Twilio WebSocket and is processed in the following stages:

Deepgram STT

Nova-3 speech-to-text

GPT-4o

Via OpenRouter

ElevenLabs TTS

Natural voice synthesis

Twilio

Call transport

Pipeline Architecture
Audio In (Twilio WebSocket)
    |
    v
Silero VAD (Voice Activity Detection)
    |
    v
Deepgram Nova-3 STT (Speech-to-Text)
    |
    v
LLM Context Aggregator
    |
    v
GPT-4o via OpenRouter (Conversation AI)
    |
    v
TranscriptCollector (Logs full conversation)
    |
    v
ElevenLabs TTS (Text-to-Speech)
    |
    v
Audio Out (Twilio WebSocket)

Project Structure

voice-agent/
voice-agent/
├── bot.py               # Main Pipecat pipeline (STT -> LLM -> TTS)
├── server.py            # Custom server wrapper adding API routes
├── api.py               # FastAPI endpoints (/api/call, /api/config, /api/health)
├── config.py            # Agent identity, model settings, greeting templates
├── prompts.py           # System prompt generation with personality presets
├── live_config.py       # Thread-safe live config store (dashboard sync)
├── requirements.txt
└── Dockerfile           # Railway deployment (port 7860)

Voice Agent API

POST/api/callInitiate an outbound call to a phone number
POST/api/configSync voice configuration from dashboard
GET/api/configGet current voice configuration
GET/api/healthHealth check endpoint

Configuration

Voice settings are managed from the dashboard and synced to the Railway service via the /api/config endpoint.

SettingDescription
Voice IDElevenLabs voice identifier for TTS
Selected VoicesArray of available voice options
Languageuk (Ukrainian), en (English), or multi (multilingual)
PersonalityPreset: professional, friendly, persuasive, etc.
SpeedTTS speaking speed multiplier (default: 1.0)

Call Logging

When a call disconnects, the voice agent automatically saves a full call log to the call_logs table including:

  • Full transcript as JSONB array of {role, content} pairs
  • Flat transcription text for search
  • Call duration in seconds
  • Sentiment analysis (positive, neutral, negative)
  • Call score (0-100)
  • Recording URL (if enabled)

Call Statuses

completed
failed
no_answer
voicemail
busy
in_progress