
About Mullama

Mullama is a local LLM runtime built on llama.cpp that runs GGUF models — Llama 3.2, Qwen 2.5, DeepSeek R1, Mistral, Phi 3, Gemma 2, LLaVA, and any other GGUF file from Hugging Face — directly inside your application.

The problem we built it to solve

Local LLM tooling today tends to force you into one of two cul-de-sacs:

  1. Daemon-only. You install a separate process, speak HTTP to it, and pay the round-trip cost — even when "across the network" is just your own machine. Your Node app spawns a Python subprocess just so it can do RAG.
  2. Single-language SDK. The bindings exist only in the language the maintainer happens to write. Everyone else either calls the C library through hand-rolled FFI or gives up and uses the HTTP daemon anyway.

Mullama refuses the choice. It ships one Rust core with idiomatic, first-party bindings for Rust, Python, Node.js, Go, PHP, and C/C++, and it speaks the same HTTP surface as Ollama on port 11434 — plus an Anthropic-compatible API that Ollama doesn't have. Embed it, serve it, or both.
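To make the "serve it" half concrete, here is a minimal sketch of a client talking to that HTTP surface from Python's standard library. It assumes Mullama mirrors Ollama's `/api/generate` endpoint and its non-streaming JSON shape (`model`, `prompt`, `stream` in, `response` out), that the server listens on `localhost`, and that a model named `llama3.2` is available. The endpoint shape is Ollama's, not something confirmed here for Mullama, and the helper names are hypothetical.

```python
import json
import urllib.request

# Port 11434 comes from the text above; the host is an assumption.
BASE_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST for an Ollama-style /api/generate endpoint."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the generated text.

    Assumes the reply is a single JSON object with a "response" field,
    as in Ollama's non-streaming mode.
    """
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # "llama3.2" is a placeholder model name; use whatever you have loaded.
    print(generate("llama3.2", "Say hello in five words."))
```

If the compatibility claim holds, a client already written against Ollama should only need its base URL pointed at the Mullama server.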

What's in the box

What it isn't

Who it's for

Design principles

Project & links