A Cheat Sheet for Comparing AI Agent API Pricing

When building a conversational voice AI application, developers need to understand the costs of the three core components: speech-to-text (STT), a large language model (LLM), and text-to-speech (TTS). Each provider charges differently—some by tokens, some by characters, others by minutes or subscription tiers—which makes direct comparison tricky.

For simplicity, we’ve converted all pricing in this post into a single metric: cost per minute. This isn’t an exact calculation, but it is a practical way for startups and developers to compare overall affordability across models. The use case is conversational AI agents, such as those for therapy, voice assistance, education, language study, or consulting.

Table of Contents

  1. STT Pricing Comparison (per minute)
  2. LLM Pricing Comparison (per minute)
  3. TTS Pricing Comparison (per minute)
  4. Outro and Download

1. STT Pricing Comparison (per minute)

| STT Model | Price per Minute (USD) | Notes |
| --- | --- | --- |
| Groq Whisper-Large-v3 | $0.00185 | Flat rate (no tiers): $0.111/hour ÷ 60. Starter/free tier via $10 credits (covers ~5,400 minutes, or about 90 hours). On-demand billing for business usage. |
| AssemblyAI Universal-2 | $0.00380 | Weighted: 70% starter/pay-as-you-go pre-recorded ($0.0045/min, $0.27/hour), 30% business streaming ($0.0025/min, $0.15/hour). Free: $50 credits (~3,700 minutes). No enterprise included. |
| Deepgram Nova-3 | $0.00420 | Weighted: 70% Pay As You Go/starter ($0.003/min monolingual), 30% Growth/business ($0.0036/min). Free: $200 credits (~46,500 minutes). 60% monolingual/40% multilingual split; no enterprise custom. |
| ElevenLabs Scribe | $0.00580 | Weighted: 70% Pay As You Go/starter ($0.0067/min, $0.40/hour), 30% Business ($0.0037/min, $0.22/hour). Free: ~10-15 minutes. No enterprise custom. |
| OpenAI Whisper | $0.00600 | Flat rate (no tiers): $0.006/min for large-v3 equivalent. Free: $5-18 credits (~13-50 hours). Business usage same as starter. |

Notes:

This aggregation assumes typical mixed workloads (e.g., 70% starter for individuals/low-volume, 30% business for regular teams). Actual costs may vary by exact usage. For free tiers, all models offer trial credits (no ongoing free usage beyond limits).
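The 70/30 blend described in these notes is a plain weighted average. The snippet below is a minimal sketch of that arithmetic, using the ElevenLabs Scribe and Groq figures from the table; the function name `blended_rate` is purely illustrative and not part of any provider's SDK.

```python
def blended_rate(tiers):
    """Return the weighted average price per minute.

    `tiers` is a list of (weight, price_per_minute) pairs.
    The weights are expected to sum to 1 (e.g. 0.7 starter, 0.3 business).
    """
    total_weight = sum(weight for weight, _ in tiers)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("tier weights must sum to 1")
    return sum(weight * price for weight, price in tiers)


# ElevenLabs Scribe: 70% at $0.0067/min, 30% at $0.0037/min (table values).
scribe = blended_rate([(0.7, 0.0067), (0.3, 0.0037)])
print(f"ElevenLabs Scribe blended: ${scribe:.5f}/min")

# Groq is a flat rate, so only the hour-to-minute conversion applies.
groq = 0.111 / 60
print(f"Groq Whisper-Large-v3: ${groq:.5f}/min")
```

Changing the weights (for example, 50/50) shows how sensitive each ranking is to the assumed mix of starter and business usage.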

2. LLM Pricing Comparison (per minute)

| LLM Model | Price per Minute (USD) | Notes |
| --- | --- | --- |
| Groq Llama 3.1 8B Instant | $0.00105 | Input cost = 5K/1M × $0.05 = $0.00025; output cost = 10K/1M × $0.08 = $0.00080; total = $0.00105/minute. |
| Google Gemini 2.5 Flash-Lite | $0.00450 | Input cost = 5K/1M × $0.10 = $0.00050; output cost = 10K/1M × $0.40 = $0.00400; total = $0.00450/minute. |
| xAI grok-3-mini | $0.00650 | Input cost = 5K/1M × $0.30 = $0.00150; output cost = 10K/1M × $0.50 = $0.00500; total = $0.00650/minute. |
| OpenAI GPT-4o-mini | $0.00675 | Input cost = 5K/1M × $0.15 = $0.00075; output cost = 10K/1M × $0.60 = $0.00600; total = $0.00675/minute. |

Notes:

The cost per minute is the sum of the billed input and output tokens:
Price/min = (Effective Input Tokens ÷ 1M × Input price per 1M tokens) + (Effective Output Tokens ÷ 1M × Output price per 1M tokens),
where Effective Tokens = Tokens × (1 - Cache Hit Rate).
A 50% input cache hit rate, for example, halves the input cost.
The table assumes 5K input tokens and 10K output tokens per minute. Adjust the token counts to match your usage (e.g., 10K/min for full load or 5K/min for light use), noting that real costs can vary by ±10–20%.
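The formula in these notes can be sketched in a few lines of Python. This is an illustrative helper, not a provider API: the function name and parameters are hypothetical, cached tokens are treated as free (as the formula implies), and the cache hit rate is applied to input tokens only, which is a simplifying assumption.

```python
def llm_cost_per_minute(input_tokens, output_tokens,
                        input_price_per_million, output_price_per_million,
                        input_cache_hit_rate=0.0):
    """Estimate LLM cost in USD for one minute of conversation.

    Prices are in USD per 1M tokens. Cached input tokens are assumed
    to cost nothing, which is a simplification of real cache pricing.
    """
    effective_input = input_tokens * (1 - input_cache_hit_rate)
    input_cost = effective_input / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# GPT-4o-mini row from the table: 5K input, 10K output tokens per minute.
no_cache = llm_cost_per_minute(5_000, 10_000, 0.15, 0.60)
print(f"No cache:        ${no_cache:.5f}/min")

# Same workload with a 50% input cache hit rate: input cost is halved.
half_cache = llm_cost_per_minute(5_000, 10_000, 0.15, 0.60,
                                 input_cache_hit_rate=0.5)
print(f"50% input cache: ${half_cache:.6f}/min")
```

Because output tokens dominate at these rates, even a high input cache hit rate moves the per-minute total only slightly.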

3. TTS Pricing Comparison (per minute)

| TTS Model | Price per Minute (USD) | Notes |
| --- | --- | --- |
| Google Chirp 3 HD TTS | $0.02700 | Based on $30 per 1M characters; 900 characters/min. Free tier: 1M characters/month. |
| Deepgram Aura-2 | $0.02700 | $0.030 per 1,000 characters ($0.00003/char); 900 characters/min. $200 free credits initially. |
| Cartesia Sonic-2 | $0.03000 | Direct $0.03 per minute of audio output; input negligible. Plan-based with included credits. |
| OpenAI GPT-4o-mini TTS | $0.03000 | ~$0.015/min for input text plus ~$0.015/min for output audio. |
| ElevenLabs Flash v2.5 | $0.06750 | $0.000075 per character (Flash is discounted to 0.5-1 credit per character); 900 characters/min. Plan-based; amortized weighted ~$0.067 across tiers. |

Notes:

These figures assume equal weighting across plans and use amortized rates for typical usage within plan limits.
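For the character-billed providers, the per-minute figure is the per-character price multiplied by the assumed speaking rate of 900 characters per minute. A minimal sketch of that conversion follows; the function name is illustrative, and the 900 characters/min rate is the assumption used throughout the table.

```python
CHARS_PER_MINUTE = 900  # speaking rate assumed in the TTS table


def tts_cost_per_minute(price_per_million_chars,
                        chars_per_minute=CHARS_PER_MINUTE):
    """Convert a per-character TTS price into USD per minute of speech."""
    return price_per_million_chars * chars_per_minute / 1_000_000


# Prices from the table, expressed in USD per 1M characters.
prices = {
    "Google Chirp 3 HD TTS": 30.0,   # $30 per 1M characters
    "Deepgram Aura-2": 30.0,         # $0.030 per 1,000 characters
    "ElevenLabs Flash v2.5": 75.0,   # $0.000075 per character
}

for model, price in prices.items():
    print(f"{model}: ${tts_cost_per_minute(price):.5f}/min")
```

If your agent speaks faster or slower than 900 characters per minute, passing a different `chars_per_minute` scales every character-billed row proportionally.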

4. Outro and Download

The comparison table provides a clear baseline, but we recommend testing each option to ensure it meets your needs for accuracy, latency, and quality. For more guidance on choosing STT and TTS, explore our detailed comparisons of speech-to-text and text-to-speech models—covering pros, cons, pricing, API access, and documentation—to help you select the best fit for your application.
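To get a rough end-to-end figure, the per-minute prices from the three tables can be added together. The sketch below does this for the cheapest option in each table. It assumes all three components are billed for every minute of conversation, which overstates TTS and LLM usage when the user does most of the talking, and the 10,000 minutes per month is a hypothetical volume chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceStack:
    """Per-minute prices (USD) for one STT + LLM + TTS combination."""
    stt: float
    llm: float
    tts: float

    def per_minute(self) -> float:
        # Simplification: every minute is billed by all three components.
        return self.stt + self.llm + self.tts

    def monthly(self, minutes: float) -> float:
        return self.per_minute() * minutes


# Cheapest row from each table in this post.
cheapest = VoiceStack(
    stt=0.00185,  # Groq Whisper-Large-v3
    llm=0.00105,  # Groq Llama 3.1 8B Instant
    tts=0.02700,  # Google Chirp 3 HD TTS
)

print(f"Per minute: ${cheapest.per_minute():.4f}")
print(f"Per month at 10,000 minutes (hypothetical): ${cheapest.monthly(10_000):.2f}")
```

In this combination TTS accounts for roughly 90% of the total, so the choice of TTS provider has the largest effect on the overall bill.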

Download Cheat Sheet of AI API Pricing
