When building a conversational voice AI application, developers need to understand the costs of the three core components: speech-to-text (STT), a large language model (LLM), and text-to-speech (TTS). Each provider charges differently—some by tokens, some by characters, others by minutes or subscription tiers—which makes direct comparison tricky.
For simplicity, we’ve converted all pricing in this post into a single metric: cost per minute. This isn’t an exact calculation, but a practical way to compare overall affordability across models for startups and developers. The use case focuses on conversational AI agents, such as those for therapy, voice assistants, education, language study, or consulting.
Table of Contents
- STT Pricing Comparison (Per Minute)
- LLM Pricing Comparison (Per Minute)
- TTS Pricing Comparison (Per Minute)
- Outro and Download
1. STT Pricing Comparison (Per Minute)
STT Model | Price per Minute (USD) | Notes |
Groq Whisper-Large-v3 | $0.00185 | Flat rate (no tiers): $0.111/hour ÷ 60. Starter/free tier via $10 credits (covers ~9,000 minutes). On-demand billing for business usage. |
AssemblyAI Universal-2 | $0.00380 | Weighted: 70% starter/pay-as-you-go pre-recorded ($0.0045/min, $0.27/hour), 30% business streaming ($0.0025/min, $0.15/hour). Free: $50 credits (~3,700 minutes). No enterprise included. |
Deepgram Nova-3 | $0.00420 | Weighted: 70% Pay As You Go/starter ($0.003/min monolingual), 30% Growth/business ($0.0036/min). Free: $200 credits (~46,500 minutes). 60% monolingual/40% multilingual split; no enterprise custom. |
ElevenLabs Scribe | $0.00580 | Weighted: 70% Pay As You Go/starter ($0.0067/min, $0.40/hour), 30% Business ($0.0037/min, $0.22/hour). Free: ~10-15 minutes. No enterprise custom. |
OpenAI Whisper | $0.00600 | Flat rate (no tiers): $0.006/min for large-v3 equivalent. Free: $5-18 credits (~13-50 minutes). Business usage same as starter. |
Notes:
This aggregation assumes typical mixed workloads (e.g., 70% starter for individuals/low-volume, 30% business for regular teams). Actual costs may vary by exact usage. For free tiers, all models offer trial credits (no ongoing free usage beyond limits).
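The two calculations used in the STT table (converting an hourly rate to a per-minute rate, and blending tier prices by weight) can be sketched as below. This is a minimal illustration rather than the exact method behind every row; the function names are hypothetical, and the inputs are the ElevenLabs Scribe and Groq figures quoted in the table.

```python
def hourly_to_per_minute(price_per_hour):
    """Convert an hourly price (USD) to a per-minute price (USD)."""
    return price_per_hour / 60


def weighted_price_per_minute(tiers):
    """Blend per-minute prices across tiers.

    tiers: list of (weight, price_per_minute) pairs; weights must sum to 1.
    """
    total_weight = sum(weight for weight, _ in tiers)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("tier weights must sum to 1")
    return sum(weight * price for weight, price in tiers)


# Groq Whisper-Large-v3: flat $0.111/hour (from the table).
groq_whisper = hourly_to_per_minute(0.111)

# ElevenLabs Scribe: 70% starter at $0.0067/min, 30% business at $0.0037/min
# (weights and prices from the table).
elevenlabs_scribe = weighted_price_per_minute([(0.7, 0.0067), (0.3, 0.0037)])

print(f"Groq Whisper:      ${groq_whisper:.5f}/min")       # $0.00185/min
print(f"ElevenLabs Scribe: ${elevenlabs_scribe:.5f}/min")  # $0.00580/min
```

Changing the weights (for example, 100% starter) shows how sensitive the blended figure is to the assumed workload mix.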
2. LLM Pricing Comparison (Per Minute)
LLM Model | Price per Minute (USD) | Notes |
Groq Llama 3.1 8B Instant | $0.00105 | Input cost = 5k/1M × 0.05 = $0.00025 |
Google Gemini 2.5 Flash-Lite | $0.00450 | Input cost = 5k/1M × 0.10 = $0.00050 |
xAI grok-3-mini | $0.00650 | Input cost = 5k/1M × 0.30 = $0.00150 |
OpenAI GPT-4o-mini | $0.00675 | Input cost = 5k/1M × 0.15 = $0.00075 |
Notes:
The cost per minute is the sum of the billed input and output token costs:
Price/min = (Effective Output Tokens / 1M × Output $ per 1M) + (Effective Input Tokens / 1M × Input $ per 1M),
where Effective Tokens = Tokens × (1 - Cache Hit Rate).
A 50% input cache hit rate, for example, halves the input cost.
The input costs in the table assume 5K input tokens per minute. Adjust token counts (e.g., 10K/min full load or 5K/min light use) to match your own usage, noting that real costs can vary by ±10–20%.
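The formula above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name is hypothetical, the cache hit rate is applied to input tokens only, and the output figures in the second example (10,000 output tokens at $0.08 per 1M) are assumed values chosen so the total matches the Groq row; they are not stated in the table.

```python
def llm_cost_per_minute(input_tokens, output_tokens,
                        input_price_per_m, output_price_per_m,
                        input_cache_hit_rate=0.0):
    """Estimate LLM cost per minute in USD.

    Prices are USD per 1M tokens. The cache hit rate reduces billed
    input tokens only (an assumption of this sketch).
    """
    effective_input = input_tokens * (1 - input_cache_hit_rate)
    input_cost = effective_input / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Input-only cost for Groq Llama 3.1 8B Instant, as in the table:
# 5k/1M x 0.05 = $0.00025.
groq_input_only = llm_cost_per_minute(5_000, 0, 0.05, 0.0)

# Same input with a 50% cache hit rate: input cost is halved.
groq_input_cached = llm_cost_per_minute(5_000, 0, 0.05, 0.0,
                                        input_cache_hit_rate=0.5)

# ASSUMPTION: 10,000 output tokens at $0.08 per 1M (not given in the table).
groq_total = llm_cost_per_minute(5_000, 10_000, 0.05, 0.08)

print(f"{groq_input_only:.5f}")    # 0.00025
print(f"{groq_input_cached:.6f}")  # 0.000125
print(f"{groq_total:.5f}")         # 0.00105
```

Swapping in your own token counts and the provider's current prices gives a quick estimate for a different workload.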
3. TTS Pricing Comparison (Per Minute)
TTS Model | Price per Minute (USD) | Notes |
Google Chirp 3 HD TTS | $0.02700 | Based on $30 per 1M characters; 900 characters/min. Free tier: 1M characters/month. |
Deepgram Aura-2 | $0.02700 | $0.030 per 1,000 characters ($0.00003/char); 900 characters/min. $200 free credits initially. |
Cartesia Sonic-2 | $0.03000 | Direct $0.03 per minute of audio output; input negligible. Plan-based with included credits. |
OpenAI GPT-4o-mini TTS | $0.03000 | ~$0.015/min input text, ~$0.015/min output audio |
ElevenLabs Flash v2.5 | $0.06750 | $0.000075 per character (0.5-1 credit discount for Flash); 900 characters/min. Plan-based; amortized weighted ~$0.067 across tiers. |
Notes:
These figures assume equal weighting across plans and use amortized rates for typical usage within plan limits.
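For the character-priced TTS models, the per-minute figure is the per-character price multiplied by the assumed speaking rate of 900 characters per minute. A minimal sketch of that conversion follows; the function and constant names are hypothetical, and the prices are the ones quoted in the table.

```python
CHARS_PER_MINUTE = 900  # speaking rate assumed in the table


def tts_cost_per_minute(price_per_char, chars_per_minute=CHARS_PER_MINUTE):
    """Convert a per-character TTS price (USD) to a per-minute price (USD)."""
    return price_per_char * chars_per_minute


# Google Chirp 3 HD TTS: $30 per 1M characters.
google_chirp = tts_cost_per_minute(30 / 1_000_000)

# Deepgram Aura-2: $0.030 per 1,000 characters.
deepgram_aura = tts_cost_per_minute(0.030 / 1_000)

# ElevenLabs Flash v2.5: $0.000075 per character.
elevenlabs_flash = tts_cost_per_minute(0.000075)

print(f"{google_chirp:.5f}")      # 0.02700
print(f"{deepgram_aura:.5f}")     # 0.02700
print(f"{elevenlabs_flash:.5f}")  # 0.06750
```

If your agent speaks faster or slower than 900 characters per minute, pass a different `chars_per_minute` to see how the estimate shifts.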
Outro and Download
The comparison tables above provide a clear baseline, but we recommend testing each option to ensure it meets your needs for accuracy, latency, and quality. For more guidance on choosing STT and TTS, explore our detailed comparisons of speech-to-text and text-to-speech models, which cover pros, cons, pricing, API access, and documentation, to help you select the best fit for your application.

Build Voice AI for free or low cost (YouTube)





