WebGPU LLM In The Browser

N0X runs open-source chat models directly in supported browsers through WebGPU and MLC WebLLM. Each model downloads once, is cached locally, and then streams its responses with no account and no hosted inference backend.
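As a rough illustration of the flow described above, the sketch below checks for WebGPU support and then assembles a streamed reply from incremental chunks. It is not N0X's actual code: the `StreamChunk` shape assumes OpenAI-style streaming chunks of the kind WebLLM is understood to emit, and the helper names `supportsWebGPU`, `deltaText`, and `collectReply` are hypothetical.

```typescript
// Assumed minimal shape of one OpenAI-style streaming chunk (illustrative only).
interface StreamChunk {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Returns true when the given navigator-like object exposes a WebGPU
// entry point (a non-null `gpu` property). In a page, pass `navigator`.
function supportsWebGPU(nav: unknown): boolean {
  return (
    typeof nav === "object" &&
    nav !== null &&
    (nav as { gpu?: unknown }).gpu != null
  );
}

// Extracts the new text carried by one chunk, or "" if the chunk has none.
function deltaText(chunk: StreamChunk): string {
  return chunk.choices[0]?.delta?.content ?? "";
}

// Consumes a chunk stream, calling `onToken` for each non-empty piece so the
// UI can render it immediately, and resolves to the full reply text.
async function collectReply(
  stream: AsyncIterable<StreamChunk>,
  onToken: (text: string) => void = () => {},
): Promise<string> {
  let reply = "";
  for await (const chunk of stream) {
    const text = deltaText(chunk);
    if (text) {
      reply += text;
      onToken(text);
    }
  }
  return reply;
}
```

In a page, `supportsWebGPU(navigator)` would gate the model download, and `collectReply` would receive the stream returned by the inference engine. Keeping the helpers independent of any one library makes them easy to exercise with a fake stream.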

Private by default

Worker-based inference

Tiny model path

Try WebGPU Chat