tacho - LLM Speed Test
Measure and compare LLM inference speed from the command line
$ tacho gpt-4.1 gemini/gemini-2.5-pro vertex_ai/claude-sonnet-4@20250514
✓ gpt-4.1
✓ vertex_ai/claude-sonnet-4@20250514
✓ gemini/gemini-2.5-pro
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━┓
┃ Model                              ┃ Avg t/s ┃ Min t/s ┃ Max t/s ┃ Time  ┃ Tokens ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━┩
│ gemini/gemini-2.5-pro              │    80.0 │    56.7 │   128.4 │ 13.5s │    998 │
│ vertex_ai/claude-sonnet-4@20250514 │    48.9 │    44.9 │    51.6 │ 10.2s │    500 │
│ gpt-4.1                            │    41.5 │    35.1 │    49.9 │ 12.3s │    500 │
└────────────────────────────────────┴─────────┴─────────┴─────────┴───────┴────────┘
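The t/s columns above are tokens per second: completion tokens divided by the wall-clock time of the call. A minimal sketch of that calculation (illustrative only, not tacho's actual code; the numbers are example values):

```python
def tokens_per_second(completion_tokens: int, elapsed_seconds: float) -> float:
    """Throughput of a single inference call, as shown in the t/s columns."""
    return completion_tokens / elapsed_seconds

# e.g. 500 completion tokens generated in 12.05 s of wall-clock time
print(round(tokens_per_second(500, 12.05), 1))  # → 41.5
```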
Quick Start
Run tacho with uv, no installation required:
uvx tacho gpt-4.1-nano gemini/gemini-2.0-flash
Or install globally:
uv tool install tacho
Features
⚡ Parallel Testing
Concurrent calls for faster results
💻 Open Source
Free and open source on GitHub
🔌 Multi-Provider
Works with all providers supported by LiteLLM
🎯 Zero Config
Just set your API keys and run
🔒 100% Private
No telemetry or data sent to our servers
🧠 Reasoning Support
Accurately accounts for thinking tokens in throughput results
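The parallel-testing feature comes down to launching every model's calls concurrently and awaiting them together, so total wall time tracks the slowest model rather than the sum. A rough sketch of the pattern with asyncio (the stub coroutine and delays stand in for real LiteLLM calls; names are illustrative, not tacho's):

```python
import asyncio
import time

async def bench_model(model: str, delay: float) -> tuple[str, float]:
    """Stand-in for one timed inference call against `model`."""
    start = time.perf_counter()
    await asyncio.sleep(delay)  # pretend this is the API round-trip
    return model, time.perf_counter() - start

async def main() -> list[tuple[str, float]]:
    # Both calls run concurrently: total time ≈ max(delays), not their sum.
    return await asyncio.gather(
        bench_model("gpt-4.1", 0.3),
        bench_model("gemini/gemini-2.5-pro", 0.2),
    )

for model, elapsed in asyncio.run(main()):
    print(f"{model}: {elapsed:.1f}s")
```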
Usage
Set your API keys:
export OPENAI_API_KEY=<your-key>
export GEMINI_API_KEY=<your-key>
Run benchmarks with custom settings:
tacho gpt-4.1-nano claude-3.5-haiku --runs 3 --tokens 1000
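With --runs 3, each model is benchmarked three times and the per-run throughputs are reduced to the Avg/Min/Max columns of the results table. A sketch of that aggregation (the run values below are hypothetical):

```python
from statistics import mean

def summarize(tps_per_run: list[float]) -> dict[str, float]:
    """Collapse per-run tokens/sec into the table's summary columns."""
    return {
        "avg": round(mean(tps_per_run), 1),
        "min": round(min(tps_per_run), 1),
        "max": round(max(tps_per_run), 1),
    }

# Three hypothetical runs of one model
print(summarize([35.1, 39.6, 49.9]))  # → {'avg': 41.5, 'min': 35.1, 'max': 49.9}
```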