n0x
Zero backend · Zero API keys · 100% private

The full AI stack.
In your browser.

LLM inference, autonomous agents, document RAG, code execution, image generation — running entirely on your GPU. No server. No account. Your data never leaves your machine.

40 open-source models
360MB minimum VRAM
0 API keys required
100% runs in-browser
STREAMING AGENT THOUGHTS

Autonomous ReAct Agent

A full reasoning loop running entirely in your browser. The LLM thinks, picks tools, executes them, reads results, and iterates — with every thought streaming live token-by-token. Watch the model reason in real time. No server. No API. Pure WebGPU autonomy.

Live thought streaming · Multi-tool orchestration · Per-step trace UI · Loop detection + OOM protection
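The think → act → observe cycle above can be sketched in a few lines. This is a hypothetical minimal ReAct loop with a scripted stand-in for the LLM and a toy calculator tool, not the n0x source; it just shows the shape of the loop, including the loop-detection bailout.

```python
# Minimal ReAct-style loop sketch (hypothetical; not the n0x implementation).
# The model proposes an action, a tool runs, the observation is fed back,
# and the cycle repeats until the model emits a final answer.
from typing import Callable

def calculator(expr: str) -> str:
    """Toy tool: evaluate an arithmetic expression."""
    return str(eval(expr, {"__builtins__": {}}))

TOOLS: dict[str, Callable[[str], str]] = {"calculator": calculator}

def fake_llm(transcript: str) -> str:
    """Stand-in for the in-browser LLM: scripted thoughts for the demo."""
    if "Observation:" not in transcript:
        return "Thought: I need math.\nAction: calculator[6 * 7]"
    return "Thought: I have the result.\nFinal Answer: 42"

def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    seen_actions: set[str] = set()            # loop detection
    for _ in range(max_steps):
        reply = fake_llm(transcript)
        transcript += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        # Parse "Action: tool[arg]" and run the tool.
        action = reply.split("Action:", 1)[1].strip()
        if action in seen_actions:            # bail out on a repeated action
            return "stuck in a loop"
        seen_actions.add(action)
        name, arg = action.split("[", 1)
        observation = TOOLS[name](arg.rstrip("]"))
        transcript += f"\nObservation: {observation}"
    return "step budget exhausted"

print(react_loop("What is 6 * 7?"))  # → 42
```

In the real app the `fake_llm` call is a streaming WebGPU inference, which is what makes each thought visible token-by-token as it is generated.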

WebGPU Inference

Direct-to-metal execution via MLC WebLLM. 40 open-source models from 360MB to 70B — downloaded once, cached in your browser forever.

Llama 3.3 70B · DeepSeek R1 70B · Qwen 2.5 32B · Mistral 7B · Qwen 0.5B · +35 more

Zero Tracking

No server processes your data. Prompts, documents, and memory live in IndexedDB on your device. Disable optional search/image hooks for a fully air-gapped runtime.

Document RAG

Drop PDFs, DOCX, CSVs, or text files. Sentence-boundary chunking with 50% overlap, MiniLM embeddings, and MMR reranking for diverse, accurate retrieval — all in a Web Worker.

PDF · DOCX · TXT · MD · CSV · JSON
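The retrieval recipe above can be sketched end to end. This is an illustrative approximation, not the n0x source: sentence windows with 50% overlap, a toy bag-of-words vector standing in for the MiniLM embeddings, and Maximal Marginal Relevance to balance relevance against redundancy.

```python
# Sketch of sentence-boundary chunking + MMR reranking (assumptions, not n0x source).
import math
import re

def chunk_sentences(text: str, size: int = 4) -> list[str]:
    """Split on sentence boundaries; windows of `size` sentences, 50% overlap."""
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    step = max(size // 2, 1)
    return [" ".join(sents[i:i + size])
            for i in range(0, len(sents), step) if sents[i:i + size]]

def embed(text: str) -> dict[str, float]:
    """Toy bag-of-words vector; the real pipeline would use MiniLM here."""
    vec: dict[str, float] = {}
    for w in re.findall(r"\w+", text.lower()):
        vec[w] = vec.get(w, 0.0) + 1.0
    return vec

def cos(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query: str, chunks: list[str], k: int = 2, lam: float = 0.7) -> list[str]:
    """Maximal Marginal Relevance: prefer relevant chunks that aren't redundant."""
    q = embed(query)
    vecs = [embed(c) for c in chunks]
    picked: list[int] = []
    while len(picked) < min(k, len(chunks)):
        best, best_score = 0, float("-inf")
        for i, v in enumerate(vecs):
            if i in picked:
                continue
            redundancy = max((cos(v, vecs[j]) for j in picked), default=0.0)
            score = lam * cos(q, v) - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        picked.append(best)
    return [chunks[i] for i in picked]
```

The 50% overlap means each sentence appears in two chunks, so a fact split across a window boundary is still retrievable; MMR then keeps the top-k results from collapsing into near-duplicates of each other.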

Conversation Branching

NEW

Hover any message and click the branch icon to fork the conversation from that exact point. Explore alternative directions without losing your original thread. Branches are saved automatically.
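One way to model fork-from-any-point is a message tree: each message keeps a parent pointer, a branch is just a new child of an older node, and a visible thread is the root-to-leaf path. This is a hypothetical sketch of that data structure, not the n0x storage schema.

```python
# Conversation branching as a message tree (hypothetical sketch, not n0x's schema).
from dataclasses import dataclass, field
from itertools import count

_ids = count()

@dataclass
class Message:
    role: str
    text: str
    parent: "Message | None" = None
    id: int = field(default_factory=lambda: next(_ids))

def reply(parent: Message, role: str, text: str) -> Message:
    """Append a message under `parent`; branching = replying to an older node."""
    return Message(role, text, parent)

def thread(leaf: Message) -> list[str]:
    """Walk parent pointers to reconstruct the visible conversation."""
    out = []
    node: Message | None = leaf
    while node:
        out.append(f"{node.role}: {node.text}")
        node = node.parent
    return list(reversed(out))

root = Message("user", "Plan a trip")
a1 = reply(root, "assistant", "How about Kyoto?")
fork = reply(root, "assistant", "How about Lisbon?")  # branch from the same point
```

Because the original messages are shared between branches rather than copied, forking is cheap and the original thread is untouched, which is why branches can be saved automatically.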

More Capabilities

Python Sandbox: Pyodide WASM runtime
Deep Search: DDG + Tavily synthesis
Image Gen: Flux / Stable Horde
Voice I/O: native STT + TTS
Persistent Memory: long-term IndexedDB storage
5 Personas: Engineer · Writer · Tutor…

Ready to run AI locally?

No sign-up. No API keys. Just open the app, pick a model, and start.

Launch n0x