The full AI stack.
In your browser.
LLM inference, autonomous agents, document RAG, code execution, image generation — running entirely on your GPU. No server. No account. Your data never leaves your machine.
Autonomous ReAct Agent
A full reasoning loop running entirely in your browser. The LLM thinks, picks tools, executes them, reads results, and iterates — with every thought streaming live token-by-token. Watch the model reason in real time. No server. No API. Pure WebGPU autonomy.
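The loop described above — think, pick a tool, execute, observe, iterate — can be sketched as a few lines of TypeScript. The calculator tool, the `Step` shape, and the pluggable `llm` callback are illustrative assumptions, not n0x internals; in the real app the callback would be a streaming WebGPU inference call.

```typescript
// Minimal ReAct-style loop: the model either requests a tool call or
// returns a final answer; tool output is fed back into the transcript.
type Tool = (input: string) => string;

const tools: Record<string, Tool> = {
  // Hypothetical example tool: evaluate an arithmetic expression.
  calculator: (expr) => String(Function(`"use strict"; return (${expr})`)()),
};

type Step = {
  thought: string;
  action?: { tool: string; input: string }; // tool request
  answer?: string;                          // final answer -> stop
};

function runAgent(
  llm: (transcript: string) => Step, // stand-in for the streaming LLM call
  question: string,
  maxSteps = 5,
): string {
  let transcript = `Question: ${question}`;
  for (let i = 0; i < maxSteps; i++) {
    const step = llm(transcript);
    transcript += `\nThought: ${step.thought}`;
    if (step.answer !== undefined) return step.answer; // model chose to stop
    if (step.action) {
      const tool = tools[step.action.tool];
      const observation = tool ? tool(step.action.input) : "unknown tool";
      transcript +=
        `\nAction: ${step.action.tool}(${step.action.input})` +
        `\nObservation: ${observation}`;
    }
  }
  return "(step limit reached)";
}
```

The transcript accumulates thoughts, actions, and observations, so each LLM call sees the full history of the loop so far.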
WebGPU Inference
Direct-to-metal execution via MLC WebLLM. 40 open-source models, from a 360 MB download up to 70B parameters: fetched once, then served from your browser's cache.
Zero Tracking
No server processes your data. Prompts, documents, and memory live in IndexedDB on your device. Disable optional search/image hooks for a fully air-gapped runtime.
Document RAG
Drop PDFs, DOCX, CSVs, or text files. Sentence-boundary chunking with 50% overlap, MiniLM embeddings, and MMR reranking for diverse, accurate retrieval — all in a Web Worker.
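The MMR reranking step mentioned above can be sketched as a greedy selection over cosine similarities. The `lambda` trade-off value and the raw `number[]` embeddings are illustrative assumptions, not n0x's actual implementation.

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Maximal Marginal Relevance: pick k chunks that balance relevance to the
// query against redundancy with chunks already selected.
function mmr(query: number[], chunks: number[][], k: number, lambda = 0.7): number[] {
  const selected: number[] = [];
  const candidates = new Set(chunks.map((_, i) => i));
  while (selected.length < k && candidates.size > 0) {
    let best = -1;
    let bestScore = -Infinity;
    for (const i of candidates) {
      const relevance = cosine(query, chunks[i]);
      const redundancy = selected.length
        ? Math.max(...selected.map((j) => cosine(chunks[i], chunks[j])))
        : 0;
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        best = i;
      }
    }
    selected.push(best);
    candidates.delete(best);
  }
  return selected; // indices of chosen chunks
}
```

With a low `lambda`, a near-duplicate of an already-selected chunk scores below an unrelated one, which is what keeps the retrieved context diverse rather than repetitive.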
Conversation Branching
Hover any message and click the branch icon to fork the conversation from that exact point. Explore alternative directions without losing your original thread. Branches are saved automatically.
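One way to model forking like this is a message tree: each message points at its parent, and a branch is simply a new child of an earlier message. The types below are an illustrative sketch, not n0x's actual storage schema.

```typescript
// Each message stores its parent; the conversation is a tree, not a list.
interface Msg {
  id: number;
  parentId: number | null;
  text: string;
}

// A thread is the chain of ancestors from a leaf back to the root.
function thread(messages: Msg[], leafId: number): string[] {
  const byId = new Map(messages.map((m) => [m.id, m]));
  const out: string[] = [];
  let cur = byId.get(leafId);
  while (cur) {
    out.unshift(cur.text);
    cur = cur.parentId === null ? undefined : byId.get(cur.parentId);
  }
  return out;
}

// Forking from message `fromId` adds a sibling path; the original
// thread's messages are untouched.
function branch(messages: Msg[], fromId: number, text: string): Msg {
  const msg: Msg = {
    id: Math.max(...messages.map((m) => m.id)) + 1,
    parentId: fromId,
    text,
  };
  messages.push(msg);
  return msg;
}
```

Because a fork only appends a node, every existing thread remains reachable, and persisting branches is just saving the flat message array.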
More Capabilities
Ready to run AI locally?
No sign-up. No API keys. Just open the app, pick a model, and start.
Launch n0x