LocalAI
Overview
LocalAI is a private single-user chat application built for running large language models entirely on local hardware. It operates in two isolated modes — Local, where GGUF models run via a CUDA-accelerated llama-server process managed by the backend, and Cloud BYOK, where you connect any OpenAI-compatible API using your own keys stored encrypted on the server. The app delivers a polished chat experience with streaming responses, conversation history, per-chat system instructions, model and provider switching, a context token meter, light and dark themes, and Markdown rendering. Built on FastAPI and React 19, with conversations stored as plain JSON files on disk — no database, no accounts, fully self-contained. Designed for daily use on a Windows machine with an NVIDIA GPU.
Features
- Two fully isolated chat modes — Local (GGUF via CUDA llama-server) and Cloud BYOK (any OpenAI-compatible API) — with separate conversation lists, engines, and provider management.
- Backend-managed llama-server lifecycle — one command starts FastAPI and launches the inference process automatically; model switching kills and restarts the process without manual intervention.
- BYOK cloud provider system with encrypted API key storage at rest (Fernet), provider CRUD, connection testing, and per-conversation provider mismatch warnings.
- Per-chat system instructions stored as conversation metadata — not shown as transcript bubbles, included in every API call, editable at any time without breaking conversation history.
- Streaming chat with Stop control, thinking indicator, floating copy pill, scroll-to-bottom button, and character-by-character animation for smooth output rendering.
- Context token meter with color-coded progress bar (local mode), counting instruction tokens alongside message history.
- Conversations saved as JSON on disk — persistent across restarts, with inline rename, delete confirmation, and model/provider mismatch modal on reopen.
- Claude-inspired layout with 750px chat column, light/dark theme via CSS variables and localStorage with no flash on load.