Voice Channel

Voice Calls

AI voice agents powered by Pipecat that make phone calls indistinguishable from real humans. Uses Deepgram for speech-to-text, GPT-4o for conversation, and ElevenLabs for text-to-speech.

Voice Pipeline

The voice agent runs a real-time audio pipeline using the Pipecat framework. Audio streams through Twilio WebSocket and is processed in the following stages:

Deepgram STT

Nova-3 speech-to-text

GPT-4o

Via OpenRouter

ElevenLabs TTS

Natural voice synthesis

Twilio

Call transport

Pipeline Architecture

Audio In (Twilio WebSocket)
    |
    v
Silero VAD (Voice Activity Detection)
    |
    v
Deepgram Nova-3 STT (Speech-to-Text)
    |
    v
LLM Context Aggregator
    |
    v
GPT-4o via OpenRouter (Conversation AI)
    |
    v
TranscriptCollector (Logs full conversation)
    |
    v
ElevenLabs TTS (Text-to-Speech)
    |
    v
Audio Out (Twilio WebSocket)

Project Structure

voice-agent/

voice-agent/
├── bot.py               # Main Pipecat pipeline (STT -> LLM -> TTS)
├── server.py            # Custom server wrapper adding API routes
├── api.py               # FastAPI endpoints (/api/call, /api/config, /api/health)
├── config.py            # Agent identity, model settings, greeting templates
├── prompts.py           # System prompt generation with personality presets
├── live_config.py       # Thread-safe live config store (dashboard sync)
├── requirements.txt
└── Dockerfile           # Railway deployment (port 7860)

Voice Agent API

POST/api/callInitiate an outbound call to a phone number

POST/api/configSync voice configuration from dashboard

GET/api/configGet current voice configuration

GET/api/healthHealth check endpoint

Configuration

Voice settings are managed from the dashboard and synced to the Railway service via the /api/config endpoint.

Setting	Description
Voice ID	ElevenLabs voice identifier for TTS
Selected Voices	Array of available voice options
Language	uk (Ukrainian), en (English), or multi (multilingual)
Personality	Preset: professional, friendly, persuasive, etc.
Speed	TTS speaking speed multiplier (default: 1.0)

Call Logging

When a call disconnects, the voice agent automatically saves a full call log to the call_logs table including:

Full transcript as JSONB array of {role, content} pairs
Flat transcription text for search
Call duration in seconds
Sentiment analysis (positive, neutral, negative)
Call score (0-100)
Recording URL (if enabled)

Call Statuses

completed

failed

no_answer

voicemail

busy

in_progress