I got small LLMs running nicely on a Raspberry Pi
A wrapper plus web UI around llama.cpp, tuned for single-board computers like the Raspberry Pi. It streams tokens over WebSockets and stays within the Pi's tight memory budget using Q4 quantization and mmap, so a 3B model actually fits in 2 GB. Throughput is low, but for a home assistant that mostly answers short questions it is fast enough to feel responsive. I wanted something that ran entirely on hardware I own: no API keys, no usage caps, and no sending my house's voice commands off to someone else's cloud. Setup is a single script that auto-detects the board. Local Whisper for voice input is the next milestone.
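For anyone wondering how a 3B model squeezes into 2 GB, here is a rough back-of-the-envelope sketch. It assumes llama.cpp's Q4_0 layout (blocks of 32 weights, each stored as 16 bytes of 4-bit values plus a 2-byte fp16 scale, so about 4.5 bits per weight). The post only says "Q4", so the exact variant is an assumption, and the function names and the 256 MiB overhead figure are made up for illustration. Real model files also keep some tensors at higher precision, so treat the result as a lower bound.

```python
# Rough estimate only. Assumes llama.cpp's Q4_0 block layout; the project
# may use a different Q4 variant with a slightly different size.
BLOCK_WEIGHTS = 32       # weights per quantization block
BLOCK_BYTES = 16 + 2     # 32 x 4-bit values (16 bytes) + one fp16 scale (2 bytes)


def q4_0_weight_bytes(n_params: int) -> int:
    """Approximate bytes needed to store n_params weights in Q4_0."""
    blocks = -(-n_params // BLOCK_WEIGHTS)  # ceiling division
    return blocks * BLOCK_BYTES


def fits(n_params: int, budget_bytes: int, overhead_bytes: int) -> bool:
    """True if the weights plus a guessed runtime overhead fit in the budget."""
    return q4_0_weight_bytes(n_params) + overhead_bytes <= budget_bytes


if __name__ == "__main__":
    GIB = 1024 ** 3
    MIB = 1024 ** 2
    params = 3_000_000_000       # "3B" from the post
    budget = 2 * GIB             # "2GB" from the post
    overhead = 256 * MIB         # hypothetical allowance for KV cache and buffers

    weights = q4_0_weight_bytes(params)
    print(f"weights: {weights / GIB:.2f} GiB")
    print(f"headroom before overhead: {(budget - weights) / MIB:.0f} MiB")
    print(f"fits with overhead: {fits(params, budget, overhead)}")
```

The margin is thin, which is where mmap helps: the weights are mapped from the file rather than copied into the process heap, so the kernel can page them in on demand and drop clean pages under memory pressure instead of failing outright.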
Q4 quantization plus mmap got a 3B model running in 2 GB. Throughput is low, but it is responsive enough for voice.
I made a status page and incident timeline you actually own
I made a minimalist forum that is just markdown and links
I made offline meeting transcription that never leaves your laptop
I made self-hosted feature flags without the enterprise pricing