Local inference, full privacy
Load any .gguf file and talk to it. The tokenizer and chat template are read straight from the file's metadata. Nothing is uploaded and nothing is logged to a server: the model lives and runs on your hardware.
- Persistent context with KV-cache reuse for fast multi-turn chat
- Auto-fit context window with non-destructive summarization
- Full sampling controls & saveable prompt presets
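
To illustrate how a tokenizer and chat template can come straight from the file, here is a minimal sketch of reading the metadata block at the start of a GGUF file. It is not this app's actual loader. It assumes a little-endian file in GGUF version 2 or 3 layout, and the function name `read_gguf_metadata` is hypothetical. The keys `tokenizer.chat_template` and `tokenizer.ggml.tokens` are the standard GGUF metadata keys, but not every file includes a chat template.

```python
import struct
from typing import Any, BinaryIO

# Scalar value types in GGUF metadata: type id -> struct format (little-endian assumed).
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read one fixed-size value, failing loudly on a truncated file."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("truncated GGUF file")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    """Read one metadata value of the given type id."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")   # element type id
        count = _read(f, "<Q")       # number of elements
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO) -> dict:
    """Hypothetical helper: return the metadata key/value pairs of a GGUF file."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("this sketch only handles GGUF v2 and later")
    _tensor_count = _read(f, "<Q")   # unused here; tensor info follows the metadata
    kv_count = _read(f, "<Q")
    meta = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        meta[key] = _read_value(f, vtype)
    return meta
```

Opening a model with `open(path, "rb")` and passing the file object to `read_gguf_metadata` would then expose the template via `meta.get("tokenizer.chat_template")`. Only the header and metadata are read in this sketch, so the tensor data is never touched.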