Running lac-cli Completely Offline with Ollama

The case for running local

Most of the time I reach for Claude or GPT when using lac-cli because the output quality is just better for complex tasks. But there are real situations where sending every shell command and code file to an external API is the wrong call — client work under an NDA, an air-gapped dev machine, or just wanting to prototype something without burning tokens.

The good news is that lac shell and lac agent both treat Ollama as a first-class provider. You flip the config once and everything keeps working exactly the same way. Ghost text, PlanMode, project memory — all of it, running entirely on your hardware.

Get Ollama running first

If you don't have Ollama installed yet, grab it from ollama.com. It runs a local server on http://localhost:11434 and handles model downloads, inference, and the OpenAI-compatible API endpoint that lac-cli talks to.

Pull a model before you do anything else. For shell work, I find llama3 or mistral responsive enough on an M-series Mac. For coding tasks with lac agent, deepseek-coder or codellama are worth the extra download size:

ollama pull llama3
ollama pull deepseek-coder

Once that finishes, make sure the server is actually running:

ollama serve

Leave that terminal open, or set Ollama to start on login — it needs to be up for lac-cli to reach it.

Pointing lac-cli at Ollama

If you already have lac-cli installed (pip install lac-cli or the install script), run the setup wizard again:

lac shell --setup

The wizard will ask which provider you want to use. Pick ollama. It'll then ask for the model name — type whatever you pulled, like llama3 or deepseek-coder. No API key prompt, because there isn't one. The config gets written to ~/.lac/config.json and that's it.

If you want to inspect or edit the config directly, it looks like this:

{
  "provider": "ollama",
  "model": "deepseek-coder",
  "ollama_base_url": "http://localhost:11434"
}

The ollama_base_url field is there if you're running Ollama on a different machine on your local network — a beefy desktop you want to offload inference to, for example. Just swap in that machine's IP.

Using lac shell offline

Once the config is set, you can also pass --offline explicitly when launching the shell if you want to make sure it never tries to reach an external provider:

lac shell --offline

From there it works identically to the cloud-backed version. Type a plain English description of what you want to do and lac shell figures out the command. Ghost text autocomplete fills in as you type. Tab accepts it. You confirm before anything runs.

The latency is higher than a cloud API on a fast connection, especially for the first token. On my M2 MacBook Pro with llama3, there's roughly a one second pause before ghost text starts appearing. Annoying once, fine after that. On deepseek-coder it's a bit slower but the output is noticeably sharper for anything code-adjacent.

lac agent with a local model

This is where it gets more interesting. lac agent reads and writes files, tracks tasks, manages project memory in .lac-memory.json, and runs an HTTP request runner — none of that involves the model provider directly. The model just handles reasoning. So switching to Ollama doesn't break any of the agentic machinery.

Start it the same way you always would:

lac agent

PlanMode still works. Give it a task, it'll think through steps before touching any files. You can still undo changes and preview diffs before accepting them. The HTTP runner still fires requests and shows you the response. Project memory still loads context from .lac-memory.json at the start of every session.

Where you'll notice the difference is on longer, more open-ended tasks. A local 7B or 8B model is not going to match GPT-4 on architecture decisions or subtle refactoring. For well-scoped tasks — "add input validation to this function", "write tests for this module", "rename this variable across the project" — it does fine. For "redesign this entire auth flow", you'll want a bigger model or a cloud provider.

That said, I've been pleasantly surprised by deepseek-coder on focused coding tasks. It stays on topic and doesn't hallucinate file paths the way some models do.

Switching back and forth

You don't have to commit. The config is just a JSON file, and running lac shell --setup at any point re-runs the provider wizard. Switching from Ollama back to Claude or OpenAI takes about ten seconds. I keep Ollama set as default on the work machine and switch to Claude when I hit something that needs heavier reasoning.

You can also maintain separate configs for separate projects by placing a .lac/config.json inside the project directory — lac-cli checks the local config first before falling back to ~/.lac/config.json. So one project can run on deepseek-coder and another can use Claude, without you touching anything between sessions.

What it actually costs

Zero dollars, once you have the hardware. Ollama is free, the models are free to pull and run locally, lac-cli is open source under MIT. If you have a machine with decent RAM (16GB minimum, 32GB preferred for the larger models), the whole setup costs nothing to run indefinitely.

For teams, this is worth thinking about too. You could run Ollama on a shared server, point everyone's ollama_base_url at it, and get a self-hosted AI dev environment with no per-seat API costs and no data leaving your network.

Quick start summary

Install Ollama and run ollama pull llama3 or ollama pull deepseek-coder
Run ollama serve to start the local server
Run lac shell --setup, pick ollama, enter the model name
Use lac shell --offline to launch with the local model explicitly
Edit ~/.lac/config.json directly to set a custom ollama_base_url for a remote machine
Drop a .lac/config.json inside any project folder to override the global provider just for that project

If you haven't installed lac-cli yet: pip install lac-cli or grab the install script from lacai.io/lac-cli. The whole setup from scratch takes about ten minutes, most of which is the model download.