Local model

You don’t need an external assistant to get AI replies during a call. StageWhisper can download a model and run it on your own Mac, so your calls get answered on-device. No account, no API key, and nothing leaves your machine. Reach for it when you want a private option that works offline, or a backup for when the assistant you normally connect to isn’t around.

When the local model answers

As a fallback. If you have an external assistant connected (OpenClaw or Hermes), that assistant stays in charge. The local model only steps in when the assistant is offline or not set up, then hands back once it’s available again.
As your default. Flip one toggle and the local model answers everything, even with an external assistant connected.
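
The two modes above amount to a small decision rule. The Python sketch below is illustrative only: pick_responder and its parameters are hypothetical names, not StageWhisper's actual code or API. It also assumes that a default toggle with no installed model falls through to the external assistant, which this page does not state.

```python
from typing import Optional


def pick_responder(external_available: bool,
                   local_installed: bool,
                   local_is_default: bool) -> Optional[str]:
    """Return "local", "external", or None (hypothetical helper)."""
    # Default mode: the local model answers everything, even when an
    # external assistant is connected.
    if local_is_default and local_installed:
        return "local"
    # Fallback mode: the external assistant stays in charge while reachable.
    if external_available:
        return "external"
    # Assistant offline or not set up: the local model steps in if installed.
    if local_installed:
        return "local"
    # Nothing is available to answer.
    return None
```

With the toggle off, pick_responder(True, True, False) picks the external assistant, and the same call with external_available=False picks the local model, which matches the fallback behavior described above.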

Set it up

Open Settings → Local Model

You’ll see two recommended models, a field for any Hugging Face model, and an option to point at model files you already have.

Get a model onto your machine

Download one of the recommended models, paste a Hugging Face repo id, or choose a folder you already have. The download runs in the background, so you can keep working while it finishes.

Wait for Installed

Once the model shows Installed, it’s ready to answer.

Optional: make it your default

Turn on “Use the local model as my default” if you want it to handle every reply. Leave it off to keep it as a fallback behind your external assistant.

Choosing a model

The recommended models come in two sizes, both ungated, so there’s nothing to sign up for and no license to accept. The smaller one is a light download that runs on a modest machine. The bigger one wants more memory but tends to answer better, so reach for it if your machine has the room.
Any Hugging Face model. Paste a GGUF repo id (for example unsloth/Qwen3.5-4B-GGUF) and download it. For gated or private repos, add an access token in the same panel.
A model you already have. If you’ve already downloaded GGUF files, whether with the Hugging Face CLI, another local app, or an earlier StageWhisper download, choose Use a model from your computer and point at the folder. Nothing downloads and no token is needed.
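
If you would rather fetch GGUF files yourself and then point StageWhisper at the folder, something like the following Python sketch could work. It uses snapshot_download from the huggingface_hub package; the helper names gguf_download_kwargs and fetch_gguf and the destination folder are hypothetical, and this is not how StageWhisper itself downloads models.

```python
from typing import Optional


def gguf_download_kwargs(repo_id: str, dest: str,
                         token: Optional[str] = None) -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    kwargs = {
        "repo_id": repo_id,
        "local_dir": dest,              # folder to point StageWhisper at later
        "allow_patterns": ["*.gguf"],   # skip files that are not GGUF
    }
    if token:
        # Only needed for gated or private repos.
        kwargs["token"] = token
    return kwargs


def fetch_gguf(repo_id: str, dest: str, token: Optional[str] = None) -> str:
    """Download the repo's GGUF files and return the local folder path."""
    # Imported lazily so the helper above works without the package installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    return snapshot_download(**gguf_download_kwargs(repo_id, dest, token))
```

Calling fetch_gguf("unsloth/Qwen3.5-4B-GGUF", "models/qwen") would download every GGUF file in that repo. Repos often ship several variants of the same model, so narrowing allow_patterns to a single file keeps the download small.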

Models must be in GGUF format. Pick a folder that has a .gguf file inside, or a Hugging Face repo that ships GGUF files. Models kept in Ollama’s own format will not work here.
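
To check a folder before pointing StageWhisper at it, a short Python sketch like this can help. It looks for .gguf files at the top level of the folder and confirms each one starts with the four-byte GGUF magic header. The function names are hypothetical, and StageWhisper's own validation may differ.

```python
from pathlib import Path
from typing import List, Union

GGUF_MAGIC = b"GGUF"  # the first four bytes of every GGUF file


def find_gguf_files(folder: Union[str, Path]) -> List[Path]:
    """Return top-level .gguf files in folder that carry the GGUF header."""
    matches = []
    for path in sorted(Path(folder).glob("*.gguf")):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            # Only the header is read; the model itself is never loaded.
            if handle.read(4) == GGUF_MAGIC:
                matches.append(path)
    return matches


def has_usable_model(folder: Union[str, Path]) -> bool:
    """True when the folder holds at least one genuine GGUF file."""
    return len(find_gguf_files(folder)) > 0
```

A folder that only holds models in another format returns an empty list here, which is the case the note above warns about.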

Privacy

When the local model answers, the reply is generated on your machine. Your transcript isn’t sent to us or to any external assistant. The model runs through an open-source on-device engine, so all the work happens locally on your GPU. See the privacy page for the full breakdown of what does and doesn’t leave your computer.

Requirements

macOS 14 (Sonoma) or later, Apple Silicon.
A few GB of free disk space and enough memory for the model you pick. The smaller recommended model is light on both; the bigger one needs more room.
No account or token, and no network once the model is downloaded.
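
As a rough pre-flight check, the Python sketch below tests the operating system and chip requirements and reports free disk space. The function names are hypothetical. Note that a Python build running under Rosetta reports x86_64 even on Apple Silicon, so treat a failure as a hint rather than a verdict.

```python
import platform
import shutil


def meets_requirements(system: str, mac_version: str, machine: str) -> bool:
    """True for macOS 14 or later on Apple Silicon (arm64)."""
    if system != "Darwin" or machine != "arm64":
        return False
    major = mac_version.split(".")[0]
    return major.isdigit() and int(major) >= 14


def free_disk_gb(path: str = ".") -> float:
    """Free space, in GB, on the volume that holds path."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    ok = meets_requirements(platform.system(),
                            platform.mac_ver()[0],
                            platform.machine())
    print("Meets OS and chip requirements:", ok)
    print("Free disk space (GB):", round(free_disk_gb(), 1))
```

How much free space and memory is enough depends on the model you pick, so the sketch reports the number instead of enforcing a threshold.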

Prefer to connect your own assistant?

The local model is just one option. You can also bring a full assistant that already has your memory and tools.

Connect OpenClaw

Pair an OpenClaw assistant and let it act during the conversation.

Connect Hermes

Pair a Hermes agent to follow your call and respond in real time.
