Runs 100% on your machine

Powerful AI that
never phones home.

Chaty loads any .gguf model and runs it locally — chat, a private knowledge base, multi-round Deep Research, and voice. No account, no cloud, no telemetry. Your conversations stay on your disk.

Download for Mac (Apple Silicon · .dmg)
Download for Windows (x64 installer · .exe)

Free & open source · All releases ↗

Chaty · Qwen3.5 · 14B

How does the retrieval pipeline rank chunks?

Thought for 3s

It runs hybrid retrieval: dense bge-m3 vectors plus BM25 keywords, fused with RRF, then MMR-diversified. [1]

Neighbor chunks are pulled in for context, and answers stay grounded in your sources. [2]

notes.pdf · p.4 | spec.md

Knowledge base · Deep Research


100% offline-capable
0 accounts & logins
No telemetry, ever
Open source

Runs Llama 3 · Gemma 3 / 4 · Qwen 3 / 3.5 / 3.6 · plus any GGUF

What's inside

A full AI workstation, contained on your device.

Everything a cloud chatbot does — and several things it can't — without a single byte leaving your computer.

Local inference, full privacy

Load any .gguf and talk to it. The tokenizer and chat template come straight from the file. Nothing is uploaded, nothing is logged to a server — the model lives and runs on your hardware.

Persistent context with KV-cache reuse for fast multi-turn
Auto-fit context window with non-destructive summarization
Full sampling controls & saveable prompt presets
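
One way to picture "auto-fit with non-destructive summarization" is the sketch below. It is not Chaty's actual code: the function name `fit_context`, the `summary_reserve` parameter, and the injected `count_tokens` and `summarize` callables are all assumptions made for illustration. The idea shown is that the stored history is never edited; only the prompt sent to the model swaps the oldest turns for a summary.

```python
def fit_context(messages, budget, count_tokens, summarize, summary_reserve=0):
    """Build a prompt that fits `budget` tokens without modifying `messages`.

    Hypothetical helper: `count_tokens(msg)` and `summarize(list_of_msgs)`
    are supplied by the caller. `summary_reserve` is the token allowance
    set aside for the summary; its real size is not checked here.
    """
    total = sum(count_tokens(m) for m in messages)
    if total <= budget:
        return list(messages)  # everything fits, no summary needed

    kept, used = [], 0
    # Walk backwards so the most recent turns are kept verbatim.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget - summary_reserve:
            break
        kept.append(msg)
        used += cost
    kept.reverse()

    # Older turns are summarized for the prompt only; the originals
    # remain untouched in `messages`.
    older = messages[: len(messages) - len(kept)]
    return [summarize(older)] + kept
```

Because the original list is left intact, a later turn with a larger context window could rebuild the prompt from the full history.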

GPU-accelerated

Cross-vendor Vulkan on Windows and Metal on Apple Silicon — auto-tuned to your VRAM, with a graceful CPU fallback when it won't fit.

Local knowledge base

Index PDFs, docs, code & images into a private store. Hybrid retrieval, strict grounding, and inline citations you can hover to verify.

Deep Research

Give it a topic; it runs multiple rounds of web search interleaved with reasoning, then writes a long, cited report — exportable to PDF or Markdown.

Voice & Live mode

Speak and listen — local STT & TTS with 11 voices, silence auto-send, and a hands-free Live mode. Runs on CPU, never touching the model's VRAM.

A chat UI that respects the work

A foldable <think> panel that follows the reasoning as it streams, KaTeX math, Mermaid diagrams, a playable HTML preview, per-block code copy, full-text search and Markdown/JSON export.

Light · dark · system themes
System tray & global hotkey
NotebookLM-style audio deep-dive
EN · 简体中文

Knowledge base · RAG

Answers grounded in your documents.

Drop in PDFs, Markdown, code, even scanned images — Chaty chunks and embeds them locally with multilingual bge-m3 vectors. When the knowledge base is on, the model answers only from what it retrieved, and says so plainly when something isn't covered.

Hybrid dense + keyword retrieval, fused and diversified
Inline citations with hover-preview of the source passage
Per-document scope — pick exactly which files are searched
Turn a knowledge base into a two-host audio deep-dive
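
For readers curious what "fused and diversified" can mean in practice, here is a minimal sketch of Reciprocal Rank Fusion (RRF) followed by Maximal Marginal Relevance (MMR). It illustrates the two named techniques in general, not Chaty's implementation: the function names, the constant `k=60`, the weight `lam`, and the use of cosine similarity to the query as the relevance signal are all assumptions.

```python
import math


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank) over all lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(candidates, query_vec, vectors, top_n=3, lam=0.7):
    """Greedy MMR: prefer chunks relevant to the query but unlike those already picked."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < top_n:
        def mmr_score(c):
            relevance = cosine(query_vec, vectors[c])
            redundancy = max(
                (cosine(vectors[c], vectors[s]) for s in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

A typical flow under these assumptions would be `fused = rrf_fuse([dense_ids, bm25_ids])` and then `mmr_select(fused, query_vec, vectors)`. A lower `lam` pushes harder toward diversity; a higher one favors raw relevance.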

Deep Research

From a question to a cited report.

Hand Chaty a topic and watch it work: it plans queries, runs several rounds of web search interleaved with its own reasoning about what's still missing, then synthesizes a structured long-form report. The reference list includes only the sources it actually cited, and you can export the whole thing to PDF or Markdown.

Topic-anchored so results never drift off subject
Free, key-less search — resilient multi-provider chain
Live progress: planning → searching → reasoning → writing
One-click export to PDF or Markdown
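
The plan, search, reason, write cycle described above can be sketched as a small loop. This is an illustration under stated assumptions, not the shipped pipeline: `deep_research` and its four injected callables (`plan`, `search`, `find_gaps`, `write`) are hypothetical names, and prefixing each query with the topic is just one simple way to keep searches anchored.

```python
def deep_research(topic, plan, search, find_gaps, write, max_rounds=3):
    """Hypothetical multi-round research loop.

    plan(topic)             -> list of query strings
    search(query)           -> list of (url, snippet) pairs
    find_gaps(topic, notes) -> follow-up queries, empty when nothing is missing
    write(topic, notes)     -> (report_text, list of cited urls)
    """
    notes = []
    queries = plan(topic)
    for _ in range(max_rounds):
        for q in queries:
            # Assumed anchoring strategy: make sure the topic is in every query.
            anchored = q if topic.lower() in q.lower() else f"{topic} {q}"
            notes.extend(search(anchored))
        # Reasoning step: decide what is still missing before the next round.
        queries = find_gaps(topic, notes)
        if not queries:
            break

    body, cited = write(topic, notes)
    # References list only sources the report actually cited, in first-use order.
    references = list(dict.fromkeys(cited))
    return {"body": body, "references": references}
```

Passing the steps in as callables keeps the loop testable without a network connection; each one can be replaced by a stub.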


Privacy is the architecture

Your data never leaves your device.

This isn't a setting you toggle — it's how Chaty is built. The model, your chats, your documents, and your knowledge base all live in local storage on your computer. There's no account to create and no server to trust.

The only time the network is touched is when you ask for it: optional web search or Deep Research, or a one-time download of a model or the optional voice and embedding files. Turn those off and Chaty is fully offline.

Platforms

Native on the hardware you already own.

macOS

Chip: Apple Silicon (M-series)
GPU: Metal · offload-all on unified memory
Package: Signed .dmg

Windows

Arch: x64 · Windows 10 / 11
GPU: Vulkan · cross-vendor, auto-tuned
Package: Per-user installer · no admin

Under the hood

Engine: llama.cpp · Rust
Shell: Tauri 2 · React
Storage: Local SQLite + vector store

Bring the model home.

Free, open source, and yours to run forever. Pick your platform — the download tracks the latest release.

Download for Mac (Apple Silicon · .dmg)
Download for Windows (x64 installer · .exe)

macOS is ad-hoc signed (not notarized). First launch needs one Gatekeeper step — see the FAQ.

Questions

Good to know.

Is Chaty really free?

Yes — Chaty is free and open source. You bring your own GGUF models (downloadable in-app from Hugging Face), and everything runs on your own machine. There's no subscription and no account.

Does it work fully offline?

Once you've downloaded a model, yes. The network is only used when you explicitly enable web search / Deep Research, or to fetch a model or the optional voice and embedding files. Disable those and Chaty never touches the internet.

Which models can I run?

Any .gguf file. There's first-class handling for Llama 3, Gemma 3 / 4, and Qwen 3 / 3.5 / 3.6 (including thinking control), plus a robust template fallback chain so unusual community models still chat.

What are the hardware requirements?

A 64-bit Windows 10/11 PC or an Apple Silicon Mac. Smaller quantized models run comfortably in 8 GB of RAM; larger models want more. Chaty auto-tunes GPU offload to your VRAM and refuses models that can't physically fit, so it won't freeze your system.
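
As a rough mental model of that auto-tuning, the sketch below estimates how many layers could go to the GPU and rejects a model that cannot fit in memory at all. Everything here is an assumption for illustration: the function `plan_offload`, the 512 MiB reserve, and the equal-sized-layers simplification are not taken from Chaty, and a real planner would also account for the KV cache and compute buffers.

```python
GIB = 1024 ** 3


def plan_offload(model_bytes, n_layers, free_vram_bytes, free_ram_bytes,
                 reserve_bytes=512 * 1024 ** 2):
    """Return how many layers to place on the GPU (0 means CPU only).

    Hypothetical estimate: treats all layers as equal in size and ignores
    KV cache and scratch buffers.
    """
    # Refuse up front if the weights cannot fit in VRAM and RAM combined.
    if model_bytes > free_vram_bytes + free_ram_bytes:
        raise MemoryError("model does not fit in available VRAM + RAM")

    per_layer = model_bytes / n_layers
    # Leave some VRAM headroom for the rest of the system.
    usable = max(0, free_vram_bytes - reserve_bytes)
    return min(n_layers, int(usable // per_layer))
```

For example, an 8 GiB model with 32 layers and 4.5 GiB of free VRAM would get 16 layers on the GPU under these assumptions, with the rest staying on the CPU.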

On macOS it says the app "can't be verified." What do I do?

Chaty is ad-hoc signed but not notarized (there's no paid Apple Developer account behind it). Clear the download quarantine once in Terminal:

xattr -dr com.apple.quarantine /Applications/Chaty.app

Then open it normally. Alternatively, skip the Terminal step: right-click the app, choose Open, and if macOS still blocks it, confirm via System Settings → Privacy & Security → Open Anyway.

Where are my conversations stored?

In a local SQLite database in your user app-data folder. Nothing is synced anywhere. Delete the app data and it's gone — it was only ever on your disk.
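
To make "a local SQLite database" concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table layout, column names, and helper functions are hypothetical and are not Chaty's real schema; the point is only that a chat history can live in a single file on disk with no server involved.

```python
import sqlite3


def open_store(path):
    """Open (or create) a local chat store. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY,
               conversation_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn


def add_message(conn, conversation_id, role, content):
    # `with conn` commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO messages (conversation_id, role, content) VALUES (?, ?, ?)",
            (conversation_id, role, content),
        )


def history(conn, conversation_id):
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY id",
        (conversation_id,),
    )
    return rows.fetchall()
```

Passing a file path to `open_store` keeps everything in that one file, so deleting the file deletes the history; passing `":memory:"` is handy for trying the sketch without touching disk.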
