n0x

WebGPU LLM in the Browser

N0X runs open-source chat models directly in supported browsers through WebGPU and MLC WebLLM. Models download once, cache locally, and stream responses without an account or hosted inference backend.

Private by default
Worker-based inference
Tiny model path
Try WebGPU Chat
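
The flow described above (check that the browser exposes WebGPU, then show tokens as they stream in) can be sketched roughly as follows. This is a minimal illustration, not N0X's actual code: the names `ChatChunk`, `hasWebGPU`, and `collectStream` are hypothetical, and the chunk shape is assumed to follow the OpenAI-style streaming format that WebLLM's chat completions API mirrors.

```typescript
// Assumed minimal shape of one streamed chunk (OpenAI-style delta format).
interface ChatChunk {
  choices: Array<{ delta: { content?: string | null } }>;
}

// Feature-detect WebGPU on a navigator-like object.
// In a browser you would pass the global `navigator`; `navigator.gpu`
// is only defined where WebGPU is available.
function hasWebGPU(nav: unknown): boolean {
  return (
    typeof nav === "object" &&
    nav !== null &&
    (nav as { gpu?: unknown }).gpu != null
  );
}

// Consume a stream of chunks, forwarding each text piece to `onToken`
// (for example, to append it to the chat UI) and returning the full reply.
async function collectStream(
  chunks: AsyncIterable<ChatChunk>,
  onToken: (text: string) => void = () => {},
): Promise<string> {
  let reply = "";
  for await (const chunk of chunks) {
    // Some chunks carry no text (role-only or final chunks); skip those.
    const piece = chunk.choices[0]?.delta?.content ?? "";
    if (piece) {
      reply += piece;
      onToken(piece);
    }
  }
  return reply;
}
```

With WebLLM, the stream passed to `collectStream` would typically come from an engine created with `CreateMLCEngine(modelId)` and a call to `engine.chat.completions.create({ messages, stream: true })`, where `modelId` is whichever prebuilt model the app ships. Keeping the stream consumer independent of the engine makes it easy to exercise without a GPU.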