Running lac-cli Fully Offline with Ollama

I got tired of burning API credits on routine stuff

Most of what I use lac agent for during a normal workday isn't complex reasoning — it's tedious stuff. Renaming a batch of files to match a convention. Wiring up a new route to an existing controller. Writing a test for a function that already has a clear signature. None of that needs a frontier model. It just needs something that can read code and follow instructions.

So I started routing that work through Ollama. I've been running lac-cli against local models for a few months now, and the setup is genuinely painless. Here's exactly how I have it configured.

What Ollama gives you

Ollama runs open-source models locally — things like Llama 3, Mistral, Qwen, Phi-3, Gemma, DeepSeek Coder — as a local HTTP server. Once it's running, it exposes an OpenAI-compatible API on http://localhost:11434. That's the important part. lac-cli has an ollama provider that points there by default.

No API key. No data leaving your machine. Works on a plane, in a hotel, in a basement with no Wi-Fi. If that matters for your work — and for a lot of us it does — this is worth setting up once.

Getting Ollama running

Download it from ollama.com and install it for your OS. Then pull a model:

ollama pull llama3

For coding tasks specifically, I've had good results with deepseek-coder and qwen2.5-coder. Pull whichever you want:

ollama pull deepseek-coder
ollama pull qwen2.5-coder:7b

Start the server (it auto-starts on most installs, but you can also run it manually):

ollama serve

That's it for Ollama. It's sitting on localhost:11434 ready to go.

Pointing lac-cli at it

lac-cli stores its config at ~/.lac/config.json. Open it and set your provider and model:

{
  "provider": "ollama",
  "model": "deepseek-coder",
  "ollama_base_url": "http://localhost:11434"
}

If you installed lac-cli via pip (pip install lac-cli) or the install script, the config file will already exist after your first run. Just update those three fields.

Now everything — lac shell, lac agent, lac mind, lac gendoc — routes through your local model. No cloud, no key, no cost per token.

What actually works well offline

I want to be straight with you: not everything is equal across model sizes. Here's what I've found in practice.

lac shell — great

lac shell is the AI-powered interactive shell where you type plain English and get a real command back before running it. This is where local models shine. "Find all Python files modified in the last 3 days" or "compress this directory and name it with today's date" — a 7B coder model handles these without breaking a sweat. Ghost text autocomplete stays snappy because the requests are small and latency is just a disk read away.

lac agent — good, with caveats

lac agent reads and writes your actual project files. It loads .lac-memory.json to remember your project context across sessions, which helps a lot because you're not re-explaining the same codebase every time.

For local models, I'd recommend keeping tasks scoped. "Add input validation to this controller" works great. "Refactor my entire auth system" — that's where a 7B model starts to drift and you'll need to course-correct more often. Use PlanMode (/plan inside the agent) before any bigger task; it forces the model to lay out steps first, which dramatically improves quality on smaller models.

Undo is still there if it goes sideways — /undo reverts file changes with a diff preview before anything is committed. That's not model-specific, it's just how the agent works, and it matters more when you're running a smaller local model.

lac mind — interesting, not always faster

lac mind runs a multi-model debate where models challenge each other's answers across rounds. Running this entirely locally means you're running multiple inference passes on your GPU. On a MacBook Pro with an M-series chip it's usable. On a machine without a decent GPU, it'll be slow. I mostly use lac mind with mixed providers — local Ollama for one debater and a cloud model for the other — which you can configure per-model in the config.

lac gendoc — works fine

lac gendoc scans your codebase, detects the framework (Laravel, Django, FastAPI, Flask, Express, Rails), and generates interactive HTML API docs. This is mostly parsing work with some summarization on top. Local models handle it well. I've generated full docs for a mid-size FastAPI project on qwen2.5-coder:7b and the output was clean enough to ship.

Switching between providers on the fly

You don't have to commit to one provider forever. If you want to use Ollama for quick tasks but drop back to Claude for something complex, you can pass the provider inline:

lac agent --provider claude --model claude-3-5-sonnet-20241022

Or flip the config, do the heavy task, and flip it back. It takes five seconds. The project memory in .lac-memory.json carries over regardless of which provider you use — that file is provider-agnostic.

Model recommendations by task

lac shell (everyday commands): llama3:8b or phi3:mini — fast, low memory, accurate enough for shell tasks
lac agent (code changes): deepseek-coder:6.7b or qwen2.5-coder:7b — trained on code, better instruction following
lac gendoc: anything in the 7B range works; mistral:7b is solid and widely cached
lac mind (debates): bigger is better here — llama3:70b if your machine can handle it, otherwise mix with a cloud provider

One practical tip before you go

If you're on a machine with limited RAM, set Ollama to keep only one model loaded at a time. Add this to your shell profile:

export OLLAMA_MAX_LOADED_MODELS=1

Without it, Ollama will try to keep the last few models resident in memory, which is great for switching but rough if you're tight on RAM. One model at a time still loads fast enough that you won't notice the difference in normal use.

The whole point of the lac-cli provider system is that you're not locked in. Cloud when you need the horsepower, local when you want speed, privacy, or just to stop paying per token for things that don't need it. Ollama makes the local side of that equation actually usable — and once it's set up, you'll forget it's even there.