tacho - LLM Speed Test
Measure and compare LLM inference speed from the command line
$ tacho gpt-4.1 gemini/gemini-2.5-pro vertex_ai/claude-sonnet-4@20250514
✓ gpt-4.1
✓ vertex_ai/claude-sonnet-4@20250514
✓ gemini/gemini-2.5-pro
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━┓
┃ Model                              ┃ Avg t/s ┃ Min t/s ┃ Max t/s ┃ Time  ┃ Tokens ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━┩
│ gemini/gemini-2.5-pro              │    80.0 │    56.7 │   128.4 │ 13.5s │    998 │
│ vertex_ai/claude-sonnet-4@20250514 │    48.9 │    44.9 │    51.6 │ 10.2s │    500 │
│ gpt-4.1                            │    41.5 │    35.1 │    49.9 │ 12.3s │    500 │
└────────────────────────────────────┴─────────┴─────────┴─────────┴───────┴────────┘
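The t/s columns above are tokens per second: completion tokens divided by the wall-clock time of the call. A minimal sketch of that calculation (illustrative only, not tacho's actual code; the numbers are example values):

```python
def tokens_per_second(completion_tokens: int, elapsed_seconds: float) -> float:
    """Throughput of a single inference call, as shown in the t/s columns."""
    return completion_tokens / elapsed_seconds

# e.g. 500 completion tokens generated in 12.05 s of wall-clock time
print(round(tokens_per_second(500, 12.05), 1))  # → 41.5
```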
Quick Start
Run tacho with uv, no installation required:
uvx tacho gpt-4.1-nano gemini/gemini-2.0-flash
Or install globally:
uv tool install tacho
Features
⚡ Parallel Testing
Concurrent calls for faster results
💻 Open Source
Free and open source on GitHub
🔌 Multi-Provider
Works with all providers supported by LiteLLM
🎯 Zero Config
Just set your API keys and run
🔒 100% Private
No telemetry or data sent to our servers
🧠 Reasoning Support
Accurately accounts for thinking tokens in throughput results
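The parallel-testing feature comes down to launching every model's calls concurrently and awaiting them together, so total wall time tracks the slowest model rather than the sum. A rough sketch of the pattern with asyncio (the stub coroutine and delays stand in for real LiteLLM calls; names are illustrative, not tacho's):

```python
import asyncio
import time

async def bench_model(model: str, delay: float) -> tuple[str, float]:
    """Stand-in for one timed inference call against `model`."""
    start = time.perf_counter()
    await asyncio.sleep(delay)  # pretend this is the API round-trip
    return model, time.perf_counter() - start

async def main() -> list[tuple[str, float]]:
    # Both calls run concurrently: total time ≈ max(delays), not their sum.
    return await asyncio.gather(
        bench_model("gpt-4.1", 0.3),
        bench_model("gemini/gemini-2.5-pro", 0.2),
    )

for model, elapsed in asyncio.run(main()):
    print(f"{model}: {elapsed:.1f}s")
```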
Usage
Set your API keys:
export OPENAI_API_KEY=<your-key>
export GEMINI_API_KEY=<your-key>
Run benchmarks with custom settings:
tacho gpt-4.1-nano claude-3.5-haiku --runs 3 --tokens 1000
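With --runs 3, each model is benchmarked three times and the per-run throughputs are reduced to the Avg/Min/Max columns of the results table. A sketch of that aggregation (the run values below are hypothetical):

```python
from statistics import mean

def summarize(tps_per_run: list[float]) -> dict[str, float]:
    """Collapse per-run tokens/sec into the table's summary columns."""
    return {
        "avg": round(mean(tps_per_run), 1),
        "min": round(min(tps_per_run), 1),
        "max": round(max(tps_per_run), 1),
    }

# Three hypothetical runs of one model
print(summarize([35.1, 39.6, 49.9]))  # → {'avg': 41.5, 'min': 35.1, 'max': 49.9}
```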