LocalAI

Current stateIn development
StageLate

Overview

LocalAI is a private single-user chat application built for running large language models entirely on local hardware. It operates in two isolated modes — Local, where GGUF models run via a CUDA-accelerated llama-server process managed by the backend, and Cloud BYOK, where you connect any OpenAI-compatible API using your own keys stored encrypted on the server. The app delivers a polished chat experience with streaming responses, conversation history, per-chat system instructions, model and provider switching, a context token meter, light and dark themes, and Markdown rendering. Built on FastAPI and React 19, with conversations stored as plain JSON files on disk — no database, no accounts, fully self-contained. Designed for daily use on a Windows machine with an NVIDIA GPU.

Features

  • Two fully isolated chat modes — Local (GGUF via CUDA llama-server) and Cloud BYOK (any OpenAI-compatible API) — with separate conversation lists, engines, and provider management.
  • Backend-managed llama-server lifecycle — one command starts FastAPI and launches the inference process automatically; model switching kills and restarts the process without manual intervention.
  • BYOK cloud provider system with encrypted API key storage at rest (Fernet), provider CRUD, connection testing, and per-conversation provider mismatch warnings.
  • Per-chat system instructions stored as conversation metadata — not shown as transcript bubbles, included in every API call, editable at any time without breaking conversation history.
  • Streaming chat with Stop control, thinking indicator, floating copy pill, scroll-to-bottom button, and character-by-character animation for smooth output rendering.
  • Context token meter with color-coded progress bar (local mode), counting instruction tokens alongside message history.
  • Conversations saved as JSON on disk — persistent across restarts, with inline rename, delete confirmation, and model/provider mismatch modal on reopen.
  • Claude-inspired layout with 750px chat column, light/dark theme via CSS variables and localStorage with no flash on load.

Tech stack

PythonFastAPIReactVitellama.cppCUDAGGUFTypeScriptFernetWindows