Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
picolm is an open-source C library from RightNow-AI that demonstrates running a 1-billion-parameter language model on a $10 microcontroller-class board with only 256MB of RAM. Released under the MIT license in February 2026, the project has crossed 1,600 GitHub stars and 200 forks in three months, becoming a reference point for what is possible at the extreme low end of LLM inference. It targets ARM, RISC-V, and Raspberry Pi-class hardware where most modern inference stacks simply will not load.

## The Memory Problem

A naive 1B-parameter model in FP16 weights occupies 2GB of memory, eight times the RAM budget picolm aims for. The project closes that gap with aggressive integer quantization, streaming weight loading from flash storage, and a hand-tuned inference loop written in portable C with zero heap allocations during generation. The result is a system that keeps working-set memory below 256MB while still executing a 1B-parameter transformer end to end, with realistic tokens-per-second numbers for embedded scenarios where any local LLM at all is currently impossible.

## Why Portable C Matters

Most open inference engines assume a Linux userland, glibc, and either CUDA or a heavyweight Python runtime. picolm assumes none of those. The codebase is written in portable C with no external dependencies beyond the C standard library, which makes it cross-compilable to bare-metal ARM Cortex-M, microcontrollers like the ESP32, single-board computers in the Raspberry Pi Zero class, and emerging RISC-V dev boards that ship without an operating system at all. For hobbyist and education use cases, this opens up LLM inference on hardware that costs less than a coffee.

## What You Can Build

The target use cases sit at the intersection of edge AI and maker hardware: offline voice assistants on $10 boards, smart sensor nodes that summarize observations into text without a network, classroom kits that let students experiment with transformer inference on hardware they can afford, and air-gapped industrial controllers where any network dependency is forbidden. picolm is also explicitly framed as an educational artifact: the source is small enough to read end to end in an afternoon, which is genuinely rare among modern LLM engines.

## Architecture and Quantization

The inference pipeline uses sub-byte integer quantization for weights, with a small activation cache sized to fit the target board. Matrix-multiply kernels are hand-tuned per architecture, with separate code paths for ARMv6, ARMv7, ARMv8, and RV32. Weights are streamed from flash on demand rather than being fully resident, which is the trick that makes a 1B model fit inside 256MB of RAM (a minimal sketch of the idea appears at the end of this article). The trade-off is throughput: this is not a Mac Studio inference engine and was never trying to be one.

## Limitations

picolm is a proof of concept of what is possible at the extreme edge, not a production inference stack. Tokens per second on a $10 board are measured in single digits, which is fine for short summarization or trigger-word voice tasks but not for interactive chat. Model quality is also bounded by the 1B-parameter ceiling and the aggressive quantization, so expectations should match capability. The project does not yet ship official bindings for popular embedded RTOSes like Zephyr or FreeRTOS, although the dependency-free C code is straightforward to drop into either. As with any sub-byte quantized model, perplexity on benchmark tasks is meaningfully worse than the FP16 baseline.
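To make the quantization-plus-streaming idea concrete, here is a minimal sketch of a sub-byte quantized matrix-vector multiply that loads one packed weight row at a time into a fixed scratch buffer, so the full matrix never has to be resident in RAM and nothing is heap-allocated inside the loop. This is an illustration only: the 4-bit group size, the `packed_row_t` layout, and every function name here are assumptions made for the example, not picolm's actual weight format or API, and real kernels would be hand-tuned per ISA as described above.

```c
/*
 * Hypothetical sketch, not picolm code. Illustrates:
 *   1. 4-bit integer weights with a per-row scale,
 *   2. streaming one packed row at a time into a small fixed buffer,
 *   3. no heap allocations inside the inference loop.
 */
#include <stdint.h>
#include <stdio.h>

#define IN_DIM  8               /* toy dimensions; a real layer is thousands wide */
#define OUT_DIM 4
#define ROW_BYTES (IN_DIM / 2)  /* two 4-bit weights per byte */

/* One packed row as it might be laid out in flash: a scale plus packed nibbles. */
typedef struct {
    float   scale;
    uint8_t packed[ROW_BYTES];
} packed_row_t;

/* Stand-in for fetching one row from flash. On a microcontroller this would
 * copy from external flash into a statically allocated scratch buffer. */
static void load_row(const packed_row_t *flash, int row, packed_row_t *scratch)
{
    *scratch = flash[row];
}

/* Unpack the i-th 4-bit value and shift it to the signed range [-8, 7]. */
static int unpack_nibble(const uint8_t *packed, int i)
{
    uint8_t byte = packed[i / 2];
    uint8_t nib  = (i % 2) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
    return (int)nib - 8;
}

/* y = W * x for a 4-bit quantized W, streamed one row at a time. */
static void matvec_q4(const packed_row_t *flash_weights, const float *x, float *y)
{
    static packed_row_t scratch;              /* fixed buffer, no malloc */
    for (int r = 0; r < OUT_DIM; r++) {
        load_row(flash_weights, r, &scratch);
        float acc = 0.0f;
        for (int c = 0; c < IN_DIM; c++)
            acc += (float)unpack_nibble(scratch.packed, c) * x[c];
        y[r] = acc * scratch.scale;           /* dequantize once per row */
    }
}

int main(void)
{
    /* Toy "flash image": each row stores nibbles 0..7, i.e. values -8..-1. */
    packed_row_t flash[OUT_DIM];
    for (int r = 0; r < OUT_DIM; r++) {
        flash[r].scale = 0.1f * (float)(r + 1);
        for (int b = 0; b < ROW_BYTES; b++)
            flash[r].packed[b] = (uint8_t)((2 * b) | ((2 * b + 1) << 4));
    }

    float x[IN_DIM] = {1, 1, 1, 1, 1, 1, 1, 1};
    float y[OUT_DIM];
    matvec_q4(flash, x, y);

    for (int r = 0; r < OUT_DIM; r++)
        printf("y[%d] = %f\n", r, y[r]);
    return 0;
}
```

The point of the sketch is the memory shape, not the math: peak RAM for the weight side is one packed row plus its scale, regardless of how large the full matrix is, which is why streaming from flash is what lets a 1B-parameter model fit under a 256MB ceiling at the cost of throughput.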