Unpacking a toddler persona & AI/ML solutions for an AI Voice Bot
How a Dad who's a Product Manager is learning AI hands-on by building a safe, voice-only LLM companion for his child - blending parenting, product thinking, and technology.
As a Senior Product Manager, I’ve spent years obsessing over user personas, friction points, and success metrics. But nothing prepared me for my newest and toughest customer: my five-year-old toddler.
He doesn’t just ask questions. He fires rapid, chaotic queries at bedtime.
“Appa, why Rama go to forest?” (where to start - because his father Dasharatha told him to, or because Dasharatha had once promised Kaikeyi that he would grant her boons when the time came?)
“Why did Krishna help Arjuna but not Duryodhana?” (tough to explain righteousness/dharma)
“Why gas fire blue but diya fire red?” (I still don’t know!)
This post focuses on the user research first, then suggests AI frameworks and solutions for the pain points I face.
1. Persona Deep Dive: “The Curious Toddler”
Who he is
- Age: 5 years
- Context: bilingual (English + Tamil)
- Hardware: Raspberry Pi 5, 16GB
- Interface: Voice only - no screen, no keyboard
Behavior Traits
- Speaks in short, broken English
- Has a 20-second patience window and is easily distracted - responses longer than 20 seconds are lost
- Emotionally reactive - the wrong tone or a scary word can derail the whole conversation
2. Jobs-to-be-Done + Pain/Gain Matrix
| Job | Current Workaround | Pain (1–5) | Gain if Solved (1–5) |
|---|---|---|---|
| Understand epic stories | Parents explain | 4 | 5 |
| Ask “why” questions safely | YT Kids/Khan Academy | 3 | 5 |
| Stay engaged for short bursts | Cartoons | 5 | 4 |
| Speak naturally | Speech-to-text confusion | 4 | 5 |
3. Friction Heatmap
| Stage | Friction | Severity (1–5) | Fix Priority |
|---|---|---|---|
| Voice to Text | Broken grammar | 5 | High |
| Query Understanding | Needs semantic rewriting | 5 | High |
| Answer Generation | Too long / too abstract | 4 | High |
| Response Filtering | Unsafe/adult/violent content | 5 | Critical |
| Text-to-Speech | Must sound friendly & child-like | 3 | Medium |
4. Success Metrics
- Attention Retention: responses under 20 seconds
- Understanding Score: answers correctly paraphrased by child
- Safe Response Ratio: age-safe answers
- Latency: TTS response delivered under 2 seconds
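To keep the latency number honest, I plan to time every stage of the pipeline. Here's a minimal sketch of that instrumentation; the PipelineTimer helper and the stage names are purely illustrative, not part of the final build.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Purely illustrative helper: time each pipeline stage so the 2-second
// end-to-end budget (and the 20-second spoken-answer budget) can be checked.
public static class PipelineTimer
{
    public static async Task<T> Timed<T>(string stage, Func<Task<T>> step)
    {
        var sw = Stopwatch.StartNew();
        T result = await step();
        sw.Stop();
        Console.WriteLine($"{stage}: {sw.ElapsedMilliseconds} ms");
        return result;
    }
}

// Usage, assuming the pipeline stages exist as async methods:
// string answer = await PipelineTimer.Timed("generate", () => GenerateAnswerAsync(query));
```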
5. Technical Blueprint: AI Concepts to Solve the Friction Points & Jobs-to-be-Done
Here’s the system I designed for now:
Speech-to-Text Cleanup
- Handled by: SymSpell
- Why: Fast spelling correction and word segmentation on the raw transcript
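Here's roughly what that cleanup looks like with the SymSpell C# package. LookupCompound repairs a whole misheard phrase in one pass; the dictionary file below is the standard English frequency dictionary, so treat the exact path as a placeholder.

```csharp
using System;

// Sketch of the STT cleanup step with the SymSpell C# package.
// The dictionary file name/path is a placeholder - use whichever frequency
// dictionary ships with your SymSpell install.
var symSpell = new SymSpell(82765, 2);               // initial capacity, max edit distance
symSpell.LoadDictionary("frequency_dictionary_en_82_765.txt", 0, 1);

// LookupCompound corrects a whole misheard phrase in one call.
string heard = "why gas fire blu but diya fire red";
var suggestions = symSpell.LookupCompound(heard, 2);
string cleaned = suggestions.Count > 0 ? suggestions[0].term : heard;
Console.WriteLine(cleaned);
```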
Prompt Rewriting
- Handled by: Llama 3.1 8B (quantized)
- Why: Reformulates the child's broken query so it matches the vectorized content chunks correctly
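The model call itself is shown under Answer Generation below; the interesting part here is the rewrite prompt. This is only my illustrative guess at its shape (the BuildRewritePrompt helper is hypothetical), not the final prompt.

```csharp
using System;

// Illustrative rewrite prompt. BuildRewritePrompt is a hypothetical helper;
// the template wording is a guess at the shape, not the final pipeline prompt.
static string BuildRewritePrompt(string cleanedQuestion) =>
    "Rewrite the child's question as one clear, complete English question " +
    "suitable for searching a story database. Keep names like Rama and Krishna as-is.\n" +
    $"Child's question: \"{cleanedQuestion}\"\n" +
    "Rewritten question:";

Console.WriteLine(BuildRewritePrompt("why rama go to forest"));
```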
Vectorize Content
- Handled by: ONNX MiniLM
- Why: Lightweight transformer-based sentence embeddings that run on CPU
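A rough sketch of the embedding step with Microsoft.ML.OnnxRuntime. The input/output tensor names, the 384-dimension size, and the minilm.onnx path are assumptions about a typical MiniLM export, and WordPiece tokenization is assumed to happen elsewhere.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Sketch of the embedding step. Assumptions: minilm.onnx is a typical MiniLM
// export with inputs named input_ids / attention_mask / token_type_ids, a
// 384-dim hidden state, and tokenIds produced by a matching WordPiece tokenizer.
static float[] Embed(InferenceSession session, long[] tokenIds)
{
    int seqLen = tokenIds.Length;
    var ids   = new DenseTensor<long>(tokenIds, new[] { 1, seqLen });
    var mask  = new DenseTensor<long>(Enumerable.Repeat(1L, seqLen).ToArray(), new[] { 1, seqLen });
    var types = new DenseTensor<long>(new long[seqLen], new[] { 1, seqLen });

    var inputs = new List<NamedOnnxValue>
    {
        NamedOnnxValue.CreateFromTensor("input_ids", ids),
        NamedOnnxValue.CreateFromTensor("attention_mask", mask),
        NamedOnnxValue.CreateFromTensor("token_type_ids", types),
    };

    using var results = session.Run(inputs);
    var hidden = results.First().AsTensor<float>();   // shape [1, seqLen, 384]

    // Mean-pool the token vectors into a single sentence embedding.
    int dim = hidden.Dimensions[2];
    var embedding = new float[dim];
    for (int t = 0; t < seqLen; t++)
        for (int d = 0; d < dim; d++)
            embedding[d] += hidden[0, t, d] / seqLen;
    return embedding;
}

// Usage: using var session = new InferenceSession("minilm.onnx");
//        float[] vector = Embed(session, tokenIds);
```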
Storage / Retrieval
- Handled by: Qdrant
- Why: Scalable vector DB and similarity search
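A sketch with the official Qdrant.Client package; the collection name, payload key, and placeholder embeddings are just illustrations.

```csharp
using System.Collections.Generic;
using Qdrant.Client;
using Qdrant.Client.Grpc;

// Sketch with the official Qdrant.Client package (gRPC, default port 6334).
// Collection name, payload key, and the placeholder embeddings are assumptions;
// 384 matches the MiniLM embedding size.
var qdrant = new QdrantClient("localhost", 6334);

await qdrant.CreateCollectionAsync("stories",
    new VectorParams { Size = 384, Distance = Distance.Cosine });

float[] storyEmbedding = new float[384];   // would come from the MiniLM step
float[] queryEmbedding = new float[384];   // embedding of the rewritten question

// Store one story chunk together with its embedding and original text.
await qdrant.UpsertAsync("stories", new List<PointStruct>
{
    new()
    {
        Id = 1ul,
        Vectors = storyEmbedding,
        Payload = { ["text"] = "Rama went to the forest because ..." }
    }
});

// Retrieve the closest chunks for the rewritten question.
var hits = await qdrant.SearchAsync("stories", queryEmbedding, limit: 3);
```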
Answer Generation
- Handled by: Llama 3.1 8B (again)
- Why: Generates short, story-like responses
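A sketch of the generation call with LLamaSharp. The GGUF file name is a placeholder, and a real build would apply the proper Llama 3.1 chat template instead of this plain prompt.

```csharp
using System;
using System.Collections.Generic;
using LLama;
using LLama.Common;

// Sketch with LLamaSharp. The GGUF file name is a placeholder; a real build
// would also wrap the prompt in the proper Llama 3.1 chat template.
var modelParams = new ModelParams("llama-3.1-8b-instruct-q4_k_m.gguf")
{
    ContextSize = 2048,
    GpuLayerCount = 0          // CPU-only on the Raspberry Pi
};
using var weights = LLamaWeights.LoadFromFile(modelParams);
using var context = weights.CreateContext(modelParams);
var executor = new InteractiveExecutor(context);

string prompt =
    "You are a gentle storyteller for a 5-year-old. Answer in 3 short, simple " +
    "sentences, like a bedtime story, with no scary details.\n" +
    "Context from the story database: Rama went to the forest because King " +
    "Dasharatha had promised Kaikeyi boons he was bound to honour...\n" +
    "Question: Why did Rama go to the forest?\nAnswer:";

var inferenceParams = new InferenceParams
{
    MaxTokens = 120,                               // keeps answers inside ~20 seconds of speech
    AntiPrompts = new List<string> { "Question:" } // stop before it invents a new question
};

await foreach (var token in executor.InferAsync(prompt, inferenceParams))
    Console.Write(token);
```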
Kiddy Guardrails
- Handled by: LlamaGuard
- Why: Filters complex or violent content before TTS
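LlamaGuard is itself a Llama-family model, so it loads through LLamaSharp the same way as above. The sketch below assumes a guardExecutor already wrapping a LlamaGuard GGUF, keys off the model's "safe"/"unsafe" verdict, and omits the exact prompt template LlamaGuard expects.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using LLama;
using LLama.Common;

// Sketch: `guardExecutor` is assumed to wrap a LlamaGuard GGUF loaded via
// LLamaSharp (the exact LlamaGuard prompt template is omitted here).
// LlamaGuard-style models answer starting with "safe" or "unsafe".
static async Task<string> FilterForChildAsync(InteractiveExecutor guardExecutor, string draftAnswer)
{
    var verdict = new StringBuilder();
    await foreach (var token in guardExecutor.InferAsync(
        $"Is the following bedtime answer safe for a 5-year-old?\n{draftAnswer}",
        new InferenceParams { MaxTokens = 16 }))
    {
        verdict.Append(token);
    }

    bool flagged = verdict.ToString().TrimStart()
        .StartsWith("unsafe", StringComparison.OrdinalIgnoreCase);

    // Fall back to a gentle deferral instead of speaking a flagged answer.
    return flagged
        ? "That is a big question! Let's ask Appa about that part of the story together."
        : draftAnswer;
}
```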
Text-to-Speech (TTS)
- Handled by: Coqui TTS or System.Speech
- Why: Produces warm, human-like voice output
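Here's a sketch of the System.Speech path (which only works on Windows); on the Linux-based Pi itself, the Coqui route would be used instead, for example driven as an external process.

```csharp
using System.Speech.Synthesis;

// Sketch of the System.Speech path (Windows only). On the Linux-based Pi the
// Coqui TTS route would be used instead, e.g. driven as an external process.
using var synth = new SpeechSynthesizer();
synth.SetOutputToDefaultAudioDevice();
synth.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Child); // friendlier voice, if one is installed
synth.Rate = -1;                                              // a touch slower for a 5-year-old
synth.Speak("Rama went to the forest to keep his father's promise, and Sita and Lakshmana went with him.");
```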
Each component runs locally on the Raspberry Pi, respecting privacy and keeping everything offline.
No cloud calls, no hidden data collection - just pure, safe, curious AI for a curious child.
6. Why .NET Makes It Possible
I chose .NET not just out of nostalgia, but also because the AI ecosystem quietly supports all of this:
- STT + TTS → System.Speech namespace or Coqui integration via C#
- Qdrant Vector DB → Official .NET client (gRPC-based)
- ONNX Runtime → Runs MiniLM embeddings natively on CPU
- Llama 3.1 8B (GGUF) → Compatible via llama.cpp bindings for .NET
- SymSpell.NET → Lightweight typo fixer and word-segmentation helper
- LLamaSharp → C# bindings for llama.cpp, which can also load the LlamaGuard model
Watch this space
I’ll be sharing the step-by-step build of this AI Toddler Explorer soon - from RAG pipelines to voice tuning - all on a humble Raspberry Pi 5.
Follow my journey at ujwaliyer.com - where bedtime questions meet machine learning.