Optimize guide · 11 min · Updated 2026-05-06

Run OpenClaw on Ollama — free and local.

Local models are good enough for 60–80% of OpenClaw's daily traffic — heartbeats, simple lookups, summarization. Ollama is the easiest way to run them. The trick is knowing which model fits your hardware and how to keep cloud models in the loop only for the work that needs them.

Quick answers

  • Can I run OpenClaw without an API key?

    Yes — use Ollama for local inference. Install Ollama, pull a model (ollama pull qwen2.5:7b), then point OpenClaw at http://localhost:11434/v1. Zero API cost, fully offline.
  • Can OpenClaw run fully offline?

    Yes with Ollama. The agent loop and skills run locally; LLM inference is local too. The only network needs come from skills that visit URLs (browser, web search). For a fully air-gapped setup, restrict the tool allowlist accordingly.
  • Which Ollama model is best for OpenClaw?

    Match RAM first. 4 GB → Llama 3.2 3B (chat only). 8 GB → Qwen 2.5 7B (best 7B for tool use). 16 GB → Qwen 2.5 14B (browser-capable). 48 GB+ → Llama 3.3 70B (cloud-tier quality).
  • How much money does Ollama save vs Anthropic?

    Hybrid setups (local for 60–80% of traffic, cloud for browser + complex reasoning) drop monthly bills 60–90% versus all-cloud. Real audits: $32/mo personal → $8/mo. $580/mo multi-agent → $95/mo.
  • Is local Ollama as good as Claude Sonnet?

    On chat and summarization, 80–90% as good. On multi-step tool use and complex reasoning, 60–70% — the gap is real. Hybrid routing solves it: route easy work to Ollama, complex work to Sonnet.

Strategy

Why local + cloud

The trick to running OpenClaw cheaply is recognizing that 60–80% of its requests don't need a flagship model. Heartbeats, status checks, simple lookups, summarization — all of these run great on a local 7B-class model. Reserve cloud calls for the work that actually needs cloud quality.

  • $0 per-token cost for local inference
  • 60–80% of traffic fits a local model
  • 8 GB minimum RAM for tool use
  • $45 → $11 typical monthly drop

The pattern is called model routing: Ollama handles the high-volume, low-stakes traffic; Claude Sonnet or GPT-4o handles complex reasoning and the tasks where you'd actually notice quality. We'll wire that up below.

Step 1

Install Ollama

Ollama is a one-line install. It runs as a local server on port 11434 and exposes an OpenAI-compatible API.

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Verify
ollama --version
curl http://localhost:11434/api/tags

Run as a service

On Linux, the installer registers a systemd unit so Ollama starts at boot. On macOS, the menu-bar app does the same.

Step 2

Pick a model

Match the table below to your RAM and use case. Don't just download the biggest model that fits; bigger isn't strictly better, and a model that doesn't tool-call reliably is useless to OpenClaw.

  • Qwen 2.5 14B (recommended at 16 GB)

    Noticeable quality jump. Browser tasks become viable.

    ollama pull qwen2.5:14b
  • Qwen 2.5 7B

    Strong all-rounder. Reliable tool calling for an open model.

    ollama pull qwen2.5:7b
  • Qwen 2.5 Coder 7B

    Code-tuned. Pairs well with the exec/edit tools.

    ollama pull qwen2.5-coder:7b
  • Llama 3.1 8B

    Solid general model. Slightly weaker tool-calling than Qwen.

    ollama pull llama3.1:8b
| Model | RAM needed | Tool use | Notes |
| --- | --- | --- | --- |
| Llama 3.2 3B | 4 GB | Shaky | Chat only. Good baseline. |
| Qwen 2.5 7B | 8 GB | Reliable | Best 7B for OpenClaw. |
| Qwen 2.5 Coder 7B | 8 GB | Reliable | Code-tuned, pairs with exec/edit. |
| Qwen 2.5 14B | 16 GB | Strong | Browser-capable. |
| Llama 3.3 70B | 48 GB | Cloud-tier | Approaches Sonnet quality. |
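The RAM column above reduces to a simple threshold check. A minimal sketch of that logic (a hypothetical helper, not part of OpenClaw or Ollama):

```python
# Map an available-RAM budget to a model tag, following the table above.
# Thresholds and tags are illustrative, taken from this guide's table.
def pick_model(ram_gb: int, needs_tools: bool = True) -> str:
    """Return an Ollama model tag that fits the given RAM budget."""
    if ram_gb >= 48:
        return "llama3.3:70b"   # cloud-tier quality
    if ram_gb >= 16:
        return "qwen2.5:14b"    # browser-capable
    if ram_gb >= 8:
        return "qwen2.5:7b"     # best 7B for tool use
    if needs_tools:
        # Below 8 GB, tool calling is too shaky to rely on
        raise ValueError("tool use needs at least 8 GB of RAM")
    return "llama3.2:3b"        # chat only

print(pick_model(16))  # qwen2.5:14b
```

Note the hard error below 8 GB when tools are required: per the table, a 3B model's tool calling is shaky enough that failing fast beats silent misbehavior.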

Step 3

Wire it into OpenClaw

OpenClaw treats Ollama as just another provider — same config shape, different base URL.

~/.openclaw/agents/main/agent/models.json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "apiKey": "ollama"
    }
  },
  "models": {
    "default": {
      "provider": "ollama",
      "model": "qwen2.5:7b"
    }
  }
}

Restart the gateway and test:

openclaw gateway restart
openclaw chat
> hello

You should see a response. If not, check openclaw logs --follow while you send a message — the most common issue is Ollama not yet having the model loaded into memory (warm-up takes 5–10s the first time).

Step 4

The routing pattern

The biggest cost win comes from routing traffic by complexity. OpenClaw lets you assign different models to different tasks.

~/.openclaw/agents/main/agent/models.json
{
  "providers": {
    "ollama": { "baseUrl": "http://localhost:11434/v1", "apiKey": "ollama" },
    "anthropic": { "apiKey": "sk-ant-..." }
  },
  "models": {
    "default": { "provider": "ollama", "model": "qwen2.5:7b" },
    "complex": { "provider": "anthropic", "model": "claude-sonnet-4-6" },
    "heartbeat": { "provider": "ollama", "model": "llama3.2:3b" }
  },
  "routing": {
    "rules": [
      { "match": { "tool": "browser" }, "model": "complex" },
      { "match": { "intent": "long-form" }, "model": "complex" },
      { "match": { "type": "heartbeat" }, "model": "heartbeat" }
    ]
  }
}
  • Heartbeats → smallest local model. They run every few minutes and are 80% of token volume. Make them free.
  • Browser tasks → cloud. Page snapshots are long and the planning is non-trivial. Worth the cost.
  • Long-form drafting → cloud. The quality gap shows up most in extended writing.
  • Everything else → local. Most chat, simple tool use, lookups.
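The routing rules above resolve in order, first match wins, falling through to the default. A sketch of that resolution logic (hypothetical, mirroring the config shape rather than OpenClaw's actual internals):

```python
# First-match routing: each rule pairs match criteria with a model alias.
# Rules and aliases mirror the example config above.
RULES = [
    ({"tool": "browser"}, "complex"),
    ({"intent": "long-form"}, "complex"),
    ({"type": "heartbeat"}, "heartbeat"),
]

def route(request: dict, default: str = "default") -> str:
    """Return the model alias for the first rule whose keys all match."""
    for match, alias in RULES:
        if all(request.get(k) == v for k, v in match.items()):
            return alias
    return default  # everything else stays on the cheap local model

print(route({"tool": "browser"}))    # complex
print(route({"type": "heartbeat"}))  # heartbeat
print(route({"tool": "exec"}))       # default
```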

Honest take

Where local falls short

We've benchmarked Qwen 2.5 7B against Claude Sonnet on real OpenClaw workloads. The gap is real and predictable.

| Task | Qwen 2.5 7B | Claude Sonnet 4.6 | Verdict |
| --- | --- | --- | --- |
| Plain chat | 9/10 | 10/10 | Local wins on cost |
| Single tool call | 8/10 | 10/10 | Local fine |
| 3+ tool chain | 5/10 | 9/10 | Cloud |
| Browser navigation | 4/10 | 9/10 | Cloud |
| Long-form writing | 6/10 | 9/10 | Cloud |
| Code generation | 7/10 (Coder variant) | 9/10 | Either |

Don't fight the gap

If a workflow needs reliable multi-step tool use, route it to cloud and stop trying to make local work. The cost savings are still huge because that workflow is a small slice of total traffic.

Numbers

Real savings, real numbers

From actual production setups we've audited:

| Setup | All cloud (Sonnet) | Hybrid (Ollama + Sonnet) | Drop |
| --- | --- | --- | --- |
| Personal assistant, 50 msgs/day | $32/mo | $8/mo | 75% |
| Team agent, 200 msgs/day | $95/mo | $24/mo | 75% |
| Browser-heavy research, 500 msgs/day | $340/mo | $110/mo | 68% |
| Multi-agent (5), heartbeats every 3 min | $580/mo | $95/mo | 84% |

The bigger your heartbeat overhead, the bigger the savings. Multi-agent setups with frequent check-ins benefit most because that's where the redundant token volume sits.
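The "Drop" column is just the percentage saved going from all-cloud to hybrid; a one-liner to sanity-check the figures:

```python
# Percentage saved moving from an all-cloud bill to a hybrid one.
def drop_pct(cloud_monthly: float, hybrid_monthly: float) -> int:
    return round(100 * (1 - hybrid_monthly / cloud_monthly))

print(drop_pct(32, 8))    # 75 — personal assistant
print(drop_pct(580, 95))  # 84 — multi-agent with heartbeats
```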

For the full optimization playbook, read the cost guide.

FAQ

Can I run OpenClaw fully offline with Ollama?
Yes. Set Ollama as the only provider, disable cloud fallbacks, and OpenClaw runs entirely on your machine. The browser still needs network access for the URLs it visits, but no data leaves to LLM providers.
Which Ollama model should I pick?
Match RAM first: 4 GB → Llama 3.2 3B (chat only), 8 GB → Qwen 2.5 7B or Llama 3.1 8B (good tool use), 16 GB → Qwen 2.5 14B (browser-capable), 48 GB+ → Llama 3.3 70B (cloud-quality). See the table in Step 2 for details.
How much quality do I lose vs Claude?
On chat and summarization, very little — small/mid local models are 80–90% of Sonnet on those tasks. On multi-step tool use and complex reasoning, the gap is real: expect 60–70% of cloud quality from a 7B model. Hybrid routing solves this.
Does Ollama work for browser automation?
It works, but the model needs to be capable enough to plan browser actions. 7B models often hallucinate selectors. 14B+ is where browser automation becomes reliably useful with local models. For browser-heavy work, route those calls to a cloud model.
What about token speed?
On an M-series Mac or a recent Nvidia GPU, expect 30–80 tokens/sec on a 7B model — comparable to cloud streaming. CPU-only is 5–15 tokens/sec, which is workable for chat but painful for long generations.
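Those speed ranges translate directly into wall-clock latency. A quick back-of-envelope sketch (hypothetical helper; 500 tokens is roughly a few paragraphs of output):

```python
# Seconds of wall-clock time to generate a response at a given decode speed.
def gen_seconds(tokens: int, tokens_per_sec: float) -> float:
    return round(tokens / tokens_per_sec, 1)

print(gen_seconds(500, 40))  # 12.5 s on GPU-class hardware
print(gen_seconds(500, 10))  # 50.0 s on CPU only
```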
Can I quantize models further to fit smaller hardware?
Yes — Ollama defaults to Q4_K_M which is already a good quality/size tradeoff. You can pull Q3 or Q2 variants for tighter fits, but quality drops noticeably below Q4 for tool use. Don't go below Q4 if the agent needs to call skills reliably.
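As a rough sanity check on what fits, model file size scales with bits per weight. Assuming an average of ~4.5 bits/weight for Q4_K_M (an approximation; the format mixes quantization types, and real files add overhead for embeddings and metadata, so treat this as a floor):

```python
# Approximate on-disk size: parameter count times average bits per weight.
# Bits-per-weight values here are rough assumptions, not exact format specs.
def approx_size_gb(params_billion: float, bits_per_weight: float) -> float:
    return round(params_billion * bits_per_weight / 8, 1)

print(approx_size_gb(7, 4.5))  # ~3.9 GB at Q4_K_M
print(approx_size_gb(7, 3.0))  # ~2.6 GB at Q3
```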

Want OpenClaw without the ops?

Provision is the managed OpenClaw cloud — agents, channels, browser, and skills, all running. $99/mo. 48-hour free trial.