Before building your agent, it’s important to understand how fast your models run, because raw speed determines which tasks an agent can handle interactively. Every model behaves differently on your hardware: some are lightning‑fast but lightweight, while others deliver deeper reasoning at the cost of memory and speed. Benchmarking gives you a clear, data‑driven picture of how each model performs on your system, a crucial step before building agents that need to think, plan, and act efficiently.
Ollama reports real‑world performance through the `--verbose` flag on `ollama run`, which measures several key metrics:

- **Eval rate**: generation speed, in tokens per second
- **Prompt eval rate**: how quickly the prompt itself is processed
- **Load duration**: how long the model takes to load into memory
- **Total duration**: end‑to‑end time for the whole request
These numbers help you choose the right model for your agent — whether you’re optimizing for speed, reasoning, or resource efficiency.
Benchmark any installed model with a single command. For example, to test Llama 3:
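Assuming Ollama is installed and `llama3` has already been pulled, passing `--verbose` to `ollama run` prints timing statistics after the response (the prompt here is just an example):

```shell
# Ask a short question; --verbose prints timing metrics
# (total duration, load duration, prompt eval rate, eval rate) after the reply.
ollama run llama3 --verbose "Summarize what a hash table is in two sentences."
```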
Try benchmarking a few different models to compare performance:
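A simple loop makes side‑by‑side comparison easy. The model names below match the table later in this section; adjust the list to whatever you have pulled locally:

```shell
# Run the same prompt against several models and print each one's metrics.
# Each model must already be pulled (e.g., `ollama pull mistral`).
for model in llama3 qwen mistral phi; do
  echo "=== $model ==="
  ollama run "$model" --verbose "Explain recursion in one sentence."
done
```

Using an identical prompt for every model keeps the comparison fair, since prompt length affects both prompt‑eval time and total duration.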
Each run reports timing metrics for whatever prompt you supply, measured on your own hardware; using the same prompt across models gives you a consistent basis for comparison.
Your results will vary depending on your CPU, GPU, RAM, and background processes. If you want to compare your numbers with the broader community, the Open LLM Leaderboard provides helpful context for model capabilities.
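If you want to log results over time, you can extract the numbers from the verbose timing block programmatically. This is a minimal sketch; the sample string below imitates the format of Ollama’s `--verbose` output, so verify it against what your version actually prints:

```python
import re

# Sample of the timing block that `ollama run --verbose` emits after a reply.
SAMPLE = """\
total duration:       4.2s
load duration:        1.1s
prompt eval count:    26 token(s)
prompt eval duration: 120ms
prompt eval rate:     216.67 tokens/s
eval count:           180 token(s)
eval duration:        3.0s
eval rate:            60.00 tokens/s
"""

def parse_metrics(text: str) -> dict:
    """Extract the generation rate and load time from verbose output."""
    metrics = {}
    # Anchored at line start, so this skips the "prompt eval rate" line.
    m = re.search(r"^eval rate:\s+([\d.]+) tokens/s", text, re.MULTILINE)
    if m:
        metrics["tokens_per_sec"] = float(m.group(1))
    m = re.search(r"^load duration:\s+(\S+)", text, re.MULTILINE)
    if m:
        metrics["load_duration"] = m.group(1)
    return metrics

print(parse_metrics(SAMPLE))
# → {'tokens_per_sec': 60.0, 'load_duration': '1.1s'}
```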
Keeping a simple benchmark table helps you understand which models feel best for your workflow. Fill in your results below:
| Model | Tokens/sec | Latency | Memory |
|---|---|---|---|
| llama3 | — | — | — |
| qwen | — | — | — |
| mistral | — | — | — |
| phi | — | — | — |
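If you would rather fill the table in programmatically as you benchmark, a small helper can render your recorded results as markdown rows. The values passed in below are placeholders, not real measurements:

```python
def to_markdown(results: dict) -> str:
    """Render {model: (tokens_per_sec, latency, memory)} as a markdown table."""
    lines = ["| Model | Tokens/sec | Latency | Memory |", "|---|---|---|---|"]
    for model, (tps, latency, memory) in results.items():
        lines.append(f"| {model} | {tps} | {latency} | {memory} |")
    return "\n".join(lines)

# Placeholder entries; replace with your own measurements.
print(to_markdown({"llama3": ("—", "—", "—"), "phi": ("—", "—", "—")}))
```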
As a rule of thumb: faster models are better for interactive tasks, while larger models excel at reasoning, planning, and multi‑step problem solving.
Your benchmark results will guide you, but here’s a quick reference based on common goals:
- **If you want speed:** smaller models such as `phi` load quickly and generate tokens fast, which suits interactive, latency‑sensitive agents.
- **If you want balanced performance:** mid‑size models like `mistral` or `llama3` trade a little speed for noticeably better output quality.
- **If you want maximum reasoning:** larger models (or larger variants of the same family) handle planning and multi‑step problems best, provided your hardware has the memory for them.
There’s no single “best” model — only the best model for your hardware and your use case.
- **Benchmark feels slow:** the first run includes model load time (and possibly a download). Run the same prompt again once the model is warm and compare.
- **High memory usage:** switch to a smaller model or a more heavily quantized variant, and close other memory‑hungry applications before benchmarking.
- **GPU not used:** run `ollama ps` while a model is loaded to see whether it is running on GPU or CPU, and check that your GPU drivers are correctly installed.
Next Step
Install Node.js →