Hosting a SLM Qwen 2.5 on Raspberry Pi 2

December 8, 2025 - By Praveen Nair

This article is a write up of my experience on hosting Qwen 2.5 the 0.5B model in Raspberry Pi 2 using Llama.cpp. Qwen 2.5 is one of the small SLM with 0.5B parameters so a small development board like Raspberry Pi can hold it. RPi 2 Model B comes with 900Mhz speed and only 1GB of memory.

But to be honest, setting up the project might take 1-2 hours, and the prompt execution is only some 1-2 tokens per second. So you need to be patient.

Let us begin!

Step 0: Pick up the Raspberry Pi 2 from the attic. Just kidding. Any version including zero should work!

Step 1: Load the new OS – Raspberry Pi OS (previously Raspbian) using the official Raspberry Pi Imager.

Step 2: Make sure the board boots up smooth with the new OS

Ok all set!.

Step 3: Do some pre-requisites installations and updates

sudo apt update && sudo apt upgrade –y

sudo apt install git g++ build-essential make wget cmake

Step 4: Setup llama.cpp

The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware – locally and in the cloud.

git clone https://github.com/ggerganov/llama.cpp

cd llama.cpp

Step 5: CMake the project

cmake ..

cmake –build . –config Release -j 2

(you might encounter some issues with dependencies. You have to either disable it, install it.

eg. sudo apt install libcurl4-openssl-dev –y

Note: This is the most slow phase!

Step 6: Get the brain – Qwen

wget https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_k_m.gguf

Step 7: Action!

I have a simple prompt “Write a short story about a cat”

./build/bin/llama-cli -m qwen2.5-0.5b-instruct-q4_k_m.gguf -p “Us
er: Write a short story about a cat.\nAssistant:” -c 512 -n 128

Watch this video clip to experience the response generation speed:

You must be logged in to post a comment.

Related Posts

My journey of building an AI-Powered App for a pseudoscience project

Whitepaper: The Deterministic AI Agent: A ‘Dual-Brain’ Architecture for Zero-Error Workflows for FinServ & Healthcare

The “Stochastic Parrot” problem and why it still matters in AI system design

Leave a Reply