Run OpenClaw on Ollama — free and local.
Local models are good enough for 60–80% of OpenClaw's daily traffic — heartbeats, simple lookups, summarization. Ollama is the easiest way to run them. The trick is knowing which model fits your hardware and how to keep cloud models in the loop only for the work that needs them.
Quick answers
Can I run OpenClaw without an API key?
Yes — use Ollama for local inference. Install Ollama, pull a model (ollama pull qwen2.5:7b), then point OpenClaw at http://localhost:11434/v1. Zero API cost, fully offline.
Can OpenClaw run fully offline?
Yes, with Ollama. The agent loop and skills run locally; LLM inference is local too. The only network needs come from skills that visit URLs (browser, web search). For a fully air-gapped setup, restrict the tool allowlist accordingly.
Which Ollama model is best for OpenClaw?
Match RAM first. 4 GB → Llama 3.2 3B (chat only). 8 GB → Qwen 2.5 7B (best 7B for tool use). 16 GB → Qwen 2.5 14B (browser-capable). 48 GB+ → Llama 3.3 70B (cloud-tier quality).
How much money does Ollama save vs Anthropic?
Hybrid setups (local for 60–80% of traffic, cloud for browser + complex reasoning) drop monthly bills 60–90% versus all-cloud. Real audits: $32/mo personal → $8/mo. $580/mo multi-agent → $95/mo.
Is local Ollama as good as Claude Sonnet?
On chat and summarization, 80–90% as good. On multi-step tool use and complex reasoning, 60–70% — the gap is real. Hybrid routing solves it: route easy work to Ollama, complex work to Sonnet.
Strategy
Why local + cloud
The trick to running OpenClaw cheaply is recognizing that 60–80% of its requests don't need a flagship model. Heartbeats, status checks, simple lookups, summarization — all of these run great on a local 7B-class model. Reserve cloud calls for the work that actually needs cloud quality.
The pattern is called model routing: Ollama handles the high-volume, low-stakes traffic; Claude Sonnet or GPT-4o handles complex reasoning and the tasks where you'd actually notice quality. We'll wire that up below.
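Conceptually, the router is just a dispatch on request attributes. This toy helper is illustrative only; the task labels and model names here are assumptions for the sketch, not OpenClaw's actual routing keys:

```shell
# Toy dispatcher: map a task label to the model tier that should serve it.
# High-volume, low-stakes work goes local; hard work goes to the cloud tier.
route() {
  case "$1" in
    heartbeat|lookup|summarize) echo "ollama/qwen2.5:7b" ;;
    browser|long-form)          echo "anthropic/claude-sonnet" ;;
    *)                          echo "ollama/qwen2.5:7b" ;;  # default: local
  esac
}

route heartbeat   # → ollama/qwen2.5:7b
route browser     # → anthropic/claude-sonnet
```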
Step 1
Install Ollama
Ollama is a one-line install. It runs as a local server on port 11434 and exposes an OpenAI-compatible API.
```shell
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Verify
ollama --version
curl http://localhost:11434/api/tags
```
Run as a service
On Linux, the installer registers a systemd unit so Ollama starts at boot. On macOS, the menu-bar app does the same.
Step 2
Pick a model
Use the picker — it'll narrow to the models that actually fit your RAM and use case. Don't just download the biggest one that fits; bigger isn't strictly better, and a model that doesn't tool-call reliably is useless to OpenClaw.
Ollama model picker
Qwen 2.5 14B (recommended)
Noticeable quality jump. Browser tasks become viable.
ollama pull qwen2.5:14b

Qwen 2.5 7B
Strong all-rounder. Reliable tool calling for an open model.
ollama pull qwen2.5:7b

Qwen 2.5 Coder 7B
Code-tuned. Pairs well with the exec/edit tools.
ollama pull qwen2.5-coder:7b

Llama 3.1 8B
Solid general model. Slightly weaker tool-calling than Qwen.
ollama pull llama3.1:8b
| Model | RAM needed | Tool use | Notes |
|---|---|---|---|
| Llama 3.2 3B | 4 GB | Shaky | Chat only. Good baseline. |
| Qwen 2.5 7B | 8 GB | Reliable | Best 7B for OpenClaw. |
| Qwen 2.5 Coder 7B | 8 GB | Reliable | Code-tuned, pairs with exec/edit. |
| Qwen 2.5 14B | 16 GB | Strong | Browser-capable. |
| Llama 3.3 70B | 48 GB | Cloud-tier | Approaches Sonnet quality. |
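The table's RAM-to-model mapping can be sketched as a small shell helper, assuming the thresholds and model tags shown above:

```shell
# Map available RAM (GB) to a pull command, per the table above.
pick_model() {
  mem_gb=$1
  if   [ "$mem_gb" -ge 48 ]; then echo "ollama pull llama3.3:70b"
  elif [ "$mem_gb" -ge 16 ]; then echo "ollama pull qwen2.5:14b"
  elif [ "$mem_gb" -ge 8 ];  then echo "ollama pull qwen2.5:7b"
  else                            echo "ollama pull llama3.2:3b"
  fi
}

pick_model 16   # → ollama pull qwen2.5:14b
```

Swap in the Coder variant at the 8 GB tier if your agent mostly writes code.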
Step 3
Wire it into OpenClaw
OpenClaw treats Ollama as just another provider — same config shape, different base URL.
```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "apiKey": "ollama"
    }
  },
  "models": {
    "default": {
      "provider": "ollama",
      "model": "qwen2.5:7b"
    }
  }
}
```
Restart the gateway and test:
```shell
openclaw gateway restart
openclaw chat
> hello
```
You should see a response. If not, check openclaw logs --follow while you send a message — the most common issue is Ollama not yet having the model loaded into memory (warm-up takes 5–10s the first time).
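One way to avoid that warm-up is to preload the model before chatting. This sketch uses Ollama's /api/generate endpoint, whose keep_alive field controls how long the weights stay resident in memory (the 30m value is an arbitrary choice):

```shell
# A request with no prompt loads the model; keep_alive holds it in memory.
# Falls back to a notice if Ollama isn't running yet.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen2.5:7b", "keep_alive": "30m"}' \
  || echo "Ollama is not reachable on :11434"
```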
Step 4
The routing pattern
The biggest cost win comes from routing traffic by complexity. OpenClaw lets you assign different models to different tasks.
```json
{
  "providers": {
    "ollama": { "baseUrl": "http://localhost:11434/v1", "apiKey": "ollama" },
    "anthropic": { "apiKey": "sk-ant-..." }
  },
  "models": {
    "default": { "provider": "ollama", "model": "qwen2.5:7b" },
    "complex": { "provider": "anthropic", "model": "claude-sonnet-4-6" },
    "heartbeat": { "provider": "ollama", "model": "llama3.2:3b" }
  },
  "routing": {
    "rules": [
      { "match": { "tool": "browser" }, "model": "complex" },
      { "match": { "intent": "long-form" }, "model": "complex" },
      { "match": { "type": "heartbeat" }, "model": "heartbeat" }
    ]
  }
}
```
- Heartbeats → smallest local model. They run every few minutes and are 80% of token volume. Make them free.
- Browser tasks → cloud. Page snapshots are long and the planning is non-trivial. Worth the cost.
- Long-form drafting → cloud. The quality gap shows up most in extended writing.
- Everything else → local. Most chat, simple tool use, lookups.
Honest take
Where local falls short
We've benchmarked Qwen 2.5 7B against Claude Sonnet on real OpenClaw workloads. The gap is real and predictable.
| Task | Qwen 2.5 7B | Claude Sonnet 4.6 | Verdict |
|---|---|---|---|
| Plain chat | 9/10 | 10/10 | Local wins on cost |
| Single tool call | 8/10 | 10/10 | Local fine |
| 3+ tool chain | 5/10 | 9/10 | Cloud |
| Browser navigation | 4/10 | 9/10 | Cloud |
| Long-form writing | 6/10 | 9/10 | Cloud |
| Code generation | 7/10 (Coder variant) | 9/10 | Either |
Don't fight the gap
If a workflow needs reliable multi-step tool use, route it to cloud and stop trying to make local work. The cost savings are still huge because that workflow is a small slice of total traffic.
Numbers
Real savings, real numbers
From actual production setups we've audited:
| Setup | All cloud (Sonnet) | Hybrid (Ollama + Sonnet) | Drop |
|---|---|---|---|
| Personal assistant, 50 msgs/day | $32/mo | $8/mo | 75% |
| Team agent, 200 msgs/day | $95/mo | $24/mo | 75% |
| Browser-heavy research, 500 msgs/day | $340/mo | $110/mo | 68% |
| Multi-agent (5), heartbeats every 3min | $580/mo | $95/mo | 84% |
The bigger your heartbeat overhead, the bigger the savings. Multi-agent setups with frequent check-ins benefit most because that's where the redundant token volume sits.
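The arithmetic behind those rows is worth making explicit. This sketch assumes locally routed messages cost nothing and the remaining cloud messages cost what they did under all-cloud, which is a simplification (real hybrid bills also shift which messages stay on cloud):

```shell
# hybrid_cost ALL_CLOUD_USD LOCAL_SHARE → estimated monthly hybrid bill.
# Only the (1 - LOCAL_SHARE) slice still pays cloud prices.
hybrid_cost() {
  awk -v c="$1" -v s="$2" 'BEGIN { printf "%.0f\n", c * (1 - s) }'
}

hybrid_cost 32 0.75   # personal assistant row: 75% routed local → $8/mo
```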
For the full optimization playbook, read the cost guide.
Want OpenClaw without the ops?
Provision is the managed OpenClaw cloud — agents, channels, browser, and skills, all running. $99/mo. 48-hour free trial.