Hey Rama: Building a Voice-First Offline Learning Companion on Raspberry Pi 5

A complete roadmap and offline AI framework guide to build 'Hey Rama' - a private, voice-based learning system for children using Raspberry Pi 5, Ollama, Qdrant, and LangChain.

Vision

  • Hey Rama is a voice-first, completely offline learning system built on Raspberry Pi 5 (16 GB).
  • It teaches a child topics like Ramayana, Mahabharata, Maths, Logic, Nature, Space, and Storytelling using a local Large Language Model (LLM), speech recognition, and agentic reasoning.
  • The project helps a child explore knowledge safely, while giving a product manager like myself a hands-on builder experience with real-world AI orchestration.

Feature Roadmap and User Stories

Each feature is ordered for incremental build-out with clear objectives and finalized tool references.

1. System Setup and Environment

  • Objective: Prepare a stable, secure platform for offline AI workloads.
  • Technologies: Raspberry Pi OS 64-bit, Linux, Python, Docker

User Stories

  1. Prepare boot media, power supply, and cooling.
  2. Install Raspberry Pi OS (Bookworm 64-bit).
  3. Configure static IP, SSH, and firewall (UFW).
  4. Create /data directories for models, corpus, and logs.
  5. Validate system health under load.
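
As a concrete starting point for user story 5, here is a minimal sketch (assuming Raspberry Pi OS with the stock `vcgencmd` utility) that samples the CPU temperature and throttle flag while a stress tool such as stress-ng runs in another terminal:

```python
# health_check.py - a minimal sketch for user story 5.
# Assumes Raspberry Pi OS with the vcgencmd utility available;
# run it while a stress tool loads the CPU in another terminal.
import subprocess
import time

def read_temp() -> float:
    # vcgencmd prints e.g. "temp=48.3'C"
    out = subprocess.check_output(["vcgencmd", "measure_temp"], text=True)
    return float(out.strip().split("=")[1].split("'")[0])

def read_throttled() -> str:
    # "throttled=0x0" means no under-voltage or thermal throttling
    out = subprocess.check_output(["vcgencmd", "get_throttled"], text=True)
    return out.strip().split("=")[1]

if __name__ == "__main__":
    for _ in range(10):
        print(f"temp={read_temp():.1f}C throttled={read_throttled()}")
        time.sleep(30)
```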

2. LLM Runtime Setup

  • Objective: Enable local LLM inference through Ollama.
  • Technologies: Ollama, GGUF 7B-class model (Llama 3.1 8B or Mistral 7B), systemd

User Stories

  1. Install Ollama service.
  2. Pull a 7B-class instruction-tuned model (quantized q5_k_m).
  3. Add smaller fallback model (3B) for quick responses.
  4. Verify response times <8 seconds.
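
To check the latency target in user story 4, a rough timing sketch against Ollama's local REST API (default port 11434); the model tag here is an assumption, so substitute whatever you pulled:

```python
# latency_check.py - a rough sketch for user story 4.
# Assumes Ollama is running locally on its default port (11434)
# and a model tagged "llama3.1" has already been pulled.
import time
import requests

def timed_generate(prompt: str, model: str = "llama3.1") -> float:
    start = time.monotonic()
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=60,
    )
    r.raise_for_status()
    elapsed = time.monotonic() - start
    print(f"{model}: {elapsed:.1f}s -> {r.json()['response'][:80]}...")
    return elapsed

if __name__ == "__main__":
    assert timed_generate("Why is the sky blue? In one sentence.") < 8.0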

3. Knowledge Base and RAG Pipeline

  • Objective: Build an offline retriever for factual and narrative content.
  • Technologies: Qdrant, Sentence Transformers, Python

User Stories

  1. Install Qdrant vector database.
  2. Create collections for each topic: Ramayana, Mahabharata, Maths, Logic, Stories, Nature, Space.
  3. Use local embeddings (all-MiniLM-L6-v2).
  4. Ingest chunked, tagged content into Qdrant.
  5. Validate retrieval accuracy; switch to hybrid hierarchical RAG if accuracy is low.
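
A minimal ingestion-and-retrieval sketch for user stories 2 through 5, assuming Qdrant runs locally on its default port; the collection name, payload fields, and sample chunks are illustrative, and the qdrant-client API shifts between versions:

```python
# ingest.py - an illustrative ingestion sketch, not the final pipeline.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

client = QdrantClient(url="http://localhost:6333")
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim vectors

client.recreate_collection(
    collection_name="ramayana",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

chunks = [
    {"text": "Hanuman leaps across the ocean to Lanka.", "topic": "ramayana"},
    {"text": "Rama, Sita, and Lakshmana begin their exile.", "topic": "ramayana"},
]
points = [
    PointStruct(id=i, vector=embedder.encode(c["text"]).tolist(), payload=c)
    for i, c in enumerate(chunks)
]
client.upsert(collection_name="ramayana", points=points)

# Quick retrieval sanity check (user story 5)
hits = client.search(
    collection_name="ramayana",
    query_vector=embedder.encode("Who jumped to Lanka?").tolist(),
    limit=1,
)
print(hits[0].payload["text"])
```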

4. Voice Input and Output

  • Objective: Enable hands-free, real-time interaction.
  • Technologies: openWakeWord, Whisper.cpp, Piper

User Stories

  1. Calibrate microphone input.
  2. Implement Whisper.cpp for offline STT.
  3. Add wake word “Hey Rama” detection.
  4. Configure Piper for offline child-friendly TTS.
  5. Test full speech loop with <2s total latency.
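
The speech loop can be glued together with subprocess calls around the whisper.cpp and Piper CLIs. A skeleton follows; the binary paths and model files are assumptions to adjust for your build (recent whisper.cpp releases rename the `main` binary to `whisper-cli`):

```python
# voice_loop.py - a skeleton of the speech loop for user story 5.
# Paths, binary names, and model files are assumptions; adjust to
# your local whisper.cpp build and Piper voice.
import subprocess

WHISPER_BIN = "./whisper.cpp/main"          # whisper.cpp CLI binary
WHISPER_MODEL = "models/ggml-base.en.bin"
PIPER_VOICE = "voices/en_US-lessac-medium.onnx"

def transcribe(wav_path: str) -> str:
    # -nt suppresses timestamps so stdout is just the transcript
    out = subprocess.check_output(
        [WHISPER_BIN, "-m", WHISPER_MODEL, "-f", wav_path, "-nt"],
        text=True,
    )
    return out.strip()

def speak(text: str, out_path: str = "reply.wav") -> None:
    # Piper reads text on stdin and writes a wav file
    subprocess.run(
        ["piper", "--model", PIPER_VOICE, "--output_file", out_path],
        input=text, text=True, check=True,
    )

if __name__ == "__main__":
    question = transcribe("question.wav")   # captured after wake word
    print("Heard:", question)
    speak("You asked: " + question)
```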

5. Orchestration and Agent Flow

  • Objective: Build dynamic flow between input, reasoning, and output.
  • Technologies: LangChain (local), REST APIs, JSON flows

User Stories

  1. Set up LangChain for local orchestration.
  2. Create retrieval-augmented QA chain (RAG).
  3. Add context memory for smoother multi-turn answers.
  4. Map intents: storytelling, quiz, knowledge, logic.
  5. Validate full pipeline: wake → STT → retrieve → LLM → TTS.
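
One possible wiring of user story 2, pairing Ollama with the Qdrant collection built earlier. LangChain's package layout changes quickly, so treat these imports as a versioned assumption rather than the canonical API:

```python
# rag_chain.py - one possible wiring of the RAG chain; imports follow
# the langchain/langchain-community split and may need version tweaks.
from langchain.chains import RetrievalQA
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Qdrant
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

vectorstore = Qdrant(
    client=client,
    collection_name="ramayana",
    embeddings=embeddings,
    content_payload_key="text",  # matches the ingestion payload above
)

qa = RetrievalQA.from_chain_type(
    llm=Ollama(model="llama3.1"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)

print(qa.invoke({"query": "Who is Hanuman?"})["result"])
```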

6. Topic-Specific Agents

  • Objective: Make “Hey Rama” intelligent across multiple learning domains.
  • Technologies: LangChain, Qdrant, Ollama

User Stories

  1. Story Agent - narrates moral and mythological stories.
  2. Quiz Agent - asks short questions and explains answers.
  3. Knowledge Agent - answers “why” and “how” questions.
  4. Each agent retrieves from its domain-specific Qdrant collection.
  5. Test tone consistency for age 5–8.
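
To dispatch each question to the right domain-specific collection, a deliberately simple keyword router can serve as a first cut; the mapping below is hypothetical and would eventually give way to proper intent classification:

```python
# router.py - a deliberately simple keyword router; the mapping is a
# hypothetical first cut until a real intent classifier replaces it.
INTENT_COLLECTIONS = {
    "story": "stories",
    "quiz": "maths",
    "why": "nature",
    "how": "space",
}

def route(utterance: str) -> str:
    text = utterance.lower()
    for keyword, collection in INTENT_COLLECTIONS.items():
        if keyword in text:
            return collection
    return "stories"  # the Story Agent is the safest fallback

if __name__ == "__main__":
    print(route("Quiz me on addition up to ten"))  # -> maths
```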

7. Safety and Governance

  • Objective: Keep the system factual, respectful, and child-safe.
  • Technologies: Guardrails AI (offline), local logging

User Stories

  1. Apply Guardrails filters on responses for tone and safety.
  2. Add parental control (time limits and topic access).
  3. Log all interactions for transparency.
  4. Validate safe outputs.
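
Until Guardrails AI is wired in, a hand-rolled filter can stand in for user story 1. The blocklist and length cap below are illustrative placeholders, not a real safety policy:

```python
# safety_filter.py - a hand-rolled placeholder for response filtering;
# Guardrails AI replaces this once its offline validators are wired in.
# The blocklist and length cap are illustrative, not a real policy.
BLOCKLIST = {"scary", "violent"}   # topics to steer away from
MAX_WORDS = 120                    # keep answers short for ages 5-8

def is_safe(response: str) -> bool:
    words = response.lower().split()
    return BLOCKLIST.isdisjoint(words) and len(words) <= MAX_WORDS

def filter_response(response: str) -> str:
    if is_safe(response):
        return response
    return "Let's talk about something else! Want to hear a story?"

if __name__ == "__main__":
    print(filter_response("Hanuman was brave and kind."))
```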

8. Monitoring and Maintenance

  • Objective: Ensure reliability and observability.
  • Technologies: systemd, journalctl, Glances

User Stories

  1. Create systemd units for each service (Ollama, Qdrant, STT, TTS).
  2. Set up health checks and daily restarts.
  3. Log CPU/RAM usage and conversation stats.
  4. Add alerts for service failure.
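
For the health checks in user story 2, a small watchdog can poll each service's local HTTP endpoint (Ollama's `/api/tags` and Qdrant's `/collections` both respond when the services are up); the "alert" here is just a log line to swap for a real notification:

```python
# watchdog.py - a minimal health-check sketch; run it from a systemd
# timer or cron. The alert is a log line; swap in your own notifier.
import logging
import requests

logging.basicConfig(level=logging.INFO)
SERVICES = {
    "ollama": "http://localhost:11434/api/tags",
    "qdrant": "http://localhost:6333/collections",
}

def check_all() -> None:
    for name, url in SERVICES.items():
        try:
            requests.get(url, timeout=5).raise_for_status()
            logging.info("%s: healthy", name)
        except requests.RequestException as exc:
            logging.error("%s: DOWN (%s)", name, exc)

if __name__ == "__main__":
    check_all()
```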

9. Testing and Demos

  • Objective: Validate real-life usage.

Sample Scenarios

  1. “Hey Rama, tell me about Hanuman.”
  2. “Quiz me on addition up to ten.”
  3. “Read a story about animals found in Africa.”
  4. “Why is the sky blue?”

Each should complete end-to-end locally, with clear voice and no external connections.
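
The spoken scenarios above can also run as a scripted smoke test. In the sketch below, `answer()` is a hypothetical stub standing in for the real wake → STT → retrieve → LLM → TTS pipeline:

```python
# smoke_test.py - a scripted companion to the spoken demos above;
# answer() is a hypothetical stand-in for the full pipeline.
SCENARIOS = [
    "Tell me about Hanuman.",
    "Quiz me on addition up to ten.",
    "Read a story about animals found in Africa.",
    "Why is the sky blue?",
]

def run_demos(answer) -> None:
    for prompt in SCENARIOS:
        reply = answer(prompt)
        assert reply, f"empty reply for: {prompt}"
        print(f"Q: {prompt}\nA: {reply[:100]}\n")

if __name__ == "__main__":
    run_demos(lambda q: "stub reply")  # replace with the real pipeline
```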


AI & Agentic Frameworks Planned for Implementation

| Framework | Purpose | Key Learning |
| --- | --- | --- |
| LangChain (Local) | Agent orchestration between Ollama and Qdrant | Chains, retrieval, reasoning |
| RAGAS | Evaluate retrieval accuracy and relevance | RAG metrics, quality scoring |
| Whisper.cpp | Offline speech-to-text | Real-time STT optimization |
| Piper | Offline text-to-speech | Voice synthesis tuning |
| MemGPT | Add persistent memory to sessions | Conversation recall, personalization |
| Guardrails AI (Offline Mode) | Enforce tone and safety | Response filtering, schema validation |
| TruLens (Offline Eval) | Evaluate model helpfulness | Trace and assess reasoning quality |

Learning Areas

| Area | What to Learn? |
| --- | --- |
| LLM Orchestration | LangChain chains, context flow, structured prompts |
| Retrieval Evaluation | RAGAS, groundedness, recall precision |
| Conversational Memory | MemGPT local context persistence |
| Voice UX | Whisper.cpp latency tuning, Piper tone optimization |
| AI Safety | Guardrails rule definition and validation |
| Agent Evaluation | TruLens metrics and improvement loop |
| System Thinking | Offline-first architecture, reliability design |

A Product Manager’s Perspective

  • After more than six years in product management, this project reminded me of what originally drew me to technology: the joy of building something useful from first principles.
  • “Hey Rama” isn’t just an AI experiment; it’s a full product lifecycle in miniature - discovery, design, development, validation, and iteration - all compressed into a single, tangible artifact.
  • Every component decision - Ollama for edge inference, Qdrant for retrieval, LangChain for orchestration, Whisper and Piper for voice - mirrors the same trade-offs faced in enterprise-scale products: performance vs. usability, innovation vs. reliability, and ambition vs. maintainability.

From a PM’s lens, it represents five core learnings:

  1. Start with a clear outcome - “delight the user” here meant delighting my curious 5-year-old son.
  2. Design with constraints, not despite them - a 16 GB Raspberry Pi forces thoughtful scoping, and a 4 GB or 8 GB board would require mapping out the entire design from scratch.
  3. Prioritize modularity - each addition (LangChain, MemGPT, RAGAS) had to earn its place.
  4. Measure value, not volume - small, observable wins (a faster answer, a more accurate story) trump over-engineered complexity.
  5. Close the feedback loop - real-world testing with a curious child is better than any synthetic evaluation metric.

In essence, this project bridges AI system design and human-centered product thinking. It demonstrates that being “AI-ready” as a product manager isn’t about memorizing frameworks; it’s about understanding the why behind each layer of intelligence you introduce.

It’s the same muscle we use in large organizations, just exercised in a sandbox of pure creativity.

A Father’s Perspective

As a father, “Hey Rama” became something deeper - a bridge between my world of technology and my child’s world of imagination.

I am waiting for the day my 5-year-old says “Hey Rama, tell me about Hanuman” and a gentle, local voice responds - without screens, ads, or distractions. That will be deeply satisfying!

Every late-night debugging session will be worth it, because this isn’t just about code or AI - it’s about showing my child what curiosity looks like in action, and how we can build our own tools instead of consuming someone else’s.


© 2025 Ujwal Iyer - Reflections of a father and a PM who is learning AI by building, not by watching YouTube videos and reels :-).

This post is licensed under CC BY 4.0 by the author.