Unpacking toddler persona & AI-ML solutions for an AI Voice Bot

How a dad who is also a Product Manager is learning AI hands-on by building a safe, voice-only LLM companion for his child - blending parenting, product thinking, and technology.

As a Senior Product Manager, I’ve spent years obsessing over user personas, friction points, and success metrics. But nothing prepared me for my newest and toughest customer: my five-year-old toddler.

He doesn’t just ask questions. He fires rapid, chaotic queries at bedtime.
“Appa, why Rama go to forest?” (Where do I start: because his father Dasharatha told him to, or because Dasharatha had promised Kaikeyi that he would grant her a boon when the time came?)
“Why did Krishna help Arjuna but not Duryodhana?” (tough to explain righteousness/dharma)
“Why gas fire blue but diya fire red?” (I still don’t know!)

This post focuses on user research first, then maps the pain points I face to AI frameworks and candidate solutions.


1. Persona Deep Dive: “The Curious Toddler”

Who he is

  • Age: 5 years
  • Context: bilingual (English + Tamil)
  • Hardware: Raspberry Pi 5, 16GB
  • Interface: Voice only - no screen, no keyboard

Behavior Traits

  • Speaks in short, broken English
  • Has a 20-second patience window and is easily distracted - anything past 20 seconds of a response is lost
  • Emotionally reactive - the bot cannot use the wrong tone or scary words

2. Jobs-to-be-Done + Pain/Gain Matrix

| Job | Current Workaround | Pain (1–5) | Gain if Solved (1–5) |
|---|---|---|---|
| Understand epic stories | Parents explain | 4 | 5 |
| Ask “why” questions safely | YT Kids/Khan Academy | 3 | 5 |
| Stay engaged for short bursts | Cartoons | 5 | 4 |
| Speak naturally | Speech-to-text confusion | 4 | 5 |

3. Friction Heatmap

| Stage | Friction | Severity (1–5) | Fix Priority |
|---|---|---|---|
| Voice to Text | Broken grammar | 5 | High |
| Query Understanding | Needs semantic rewriting | 5 | High |
| Answer Generation | Too long / too abstract | 4 | High |
| Response Filtering | Unsafe/adult/violent content | 5 | Critical |
| Text-to-Speech | Must sound friendly & child-like | 3 | Medium |

4. Success Metrics

  • Attention Retention: responses under 20 seconds
  • Understanding Score: answers correctly paraphrased by child
  • Safe Response Ratio: age-safe answers
  • Latency: TTS response delivered under 2 seconds
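
To make these four metrics concrete, here is a minimal sketch of how they could be computed from a log of interactions. It is written in Python only for brevity, and the `Interaction` record and its field names are hypothetical, not part of the actual build. The 20-second and 2-second thresholds are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One question/answer exchange (hypothetical log record)."""
    spoken_seconds: float    # how long the spoken answer lasted
    latency_seconds: float   # time until the spoken response was delivered
    flagged_unsafe: bool     # True if the answer was judged not age-safe
    child_paraphrased: bool  # True if the child restated the answer correctly


def summarize(log):
    """Return each success metric as a fraction of all interactions."""
    n = len(log)
    if n == 0:
        return {}
    return {
        "attention_retention": sum(i.spoken_seconds < 20 for i in log) / n,
        "understanding_score": sum(i.child_paraphrased for i in log) / n,
        "safe_response_ratio": sum(not i.flagged_unsafe for i in log) / n,
        "latency_under_2s": sum(i.latency_seconds < 2 for i in log) / n,
    }
```

Tracking each metric as a ratio keeps them comparable across sessions of different lengths.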

5. Technical Blueprint: AI Concepts to solve Friction & Jobs to be done

Here’s the system I designed for now:

Speech-to-Text Cleanup

  • Handled by: SymSpell
  • Why: Fast spelling correction and word segmentation (it fixes misspelled words, not grammar)
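
SymSpell gets its speed from a "symmetric delete" index: it precomputes character deletions of every dictionary word, so a lookup only has to generate deletions of the input. The sketch below is a tiny Python illustration of that idea, not the SymSpell library itself. The function names and the toy vocabulary are assumptions made for the example.

```python
def deletes(word, max_distance=1):
    """All strings reachable by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def build_index(vocabulary, max_distance=1):
    """Map every delete-variant to the dictionary words that produce it."""
    index = {}
    for word in vocabulary:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def edit_distance(a, b):
    """Plain Levenshtein distance, used only to rank candidates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def correct(word, index, vocabulary, max_distance=1):
    """Return the closest dictionary word, or the input if none is found."""
    if word in vocabulary:
        return word
    candidates = set()
    for variant in deletes(word, max_distance):
        candidates |= index.get(variant, set())
    if not candidates:
        return word
    # Closest match first; ties are broken alphabetically.
    return min(candidates, key=lambda c: (edit_distance(word, c), c))
```

With a toy vocabulary such as `{"forest", "fire", "krishna"}`, a transcript token like "forst" shares a delete-variant with "forest" and is corrected to it, while an unknown token is passed through unchanged.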

Prompt Rewriting

  • Handled by: Llama 3.1 8B (Quantized)
  • Why: Rewrites the child's question so that it matches the right chunks in the vector store

Vectorize Content

  • Handled by: ONNX MiniLM
  • Why: Transformer-based embeddings

Storage / Retrieval

  • Handled by: Qdrant
  • Why: Scalable vector DB and similarity search
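
The embed-then-search step can be illustrated without any model or database. In this minimal Python sketch, a bag-of-words counter stands in for the MiniLM embedding and a plain list stands in for Qdrant. Both are deliberate simplifications, and the function names and sample chunks are hypothetical. What it does share with the real pipeline is the ranking rule: cosine similarity between the query vector and each stored chunk vector.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a transformer embedding: sparse word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A real embedding model would also match paraphrases that share no words with the query, which is exactly why the prompt-rewriting step matters less for a transformer embedding than for this word-count toy.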

Answer Generation

  • Handled by: Llama 3.1 8B (again)
  • Why: Generates short, story-like responses
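
Since anything past the 20-second patience window is lost, one option is to trim a generated answer before it reaches TTS. The Python sketch below estimates speaking time from word count and keeps only whole sentences that fit the budget. The speaking rate of 130 words per minute is an assumed default, not a measured value, and the function names are hypothetical.

```python
import re


def estimated_seconds(text, words_per_minute=130):
    """Rough speaking time for text at an assumed speaking rate."""
    return len(text.split()) * 60 / words_per_minute


def fit_to_budget(text, max_seconds=20, words_per_minute=130):
    """Keep leading whole sentences whose total speaking time fits the budget.

    Returns an empty string if even the first sentence is too long.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    for sentence in sentences:
        candidate = " ".join(kept + [sentence])
        if estimated_seconds(candidate, words_per_minute) > max_seconds:
            break
        kept.append(sentence)
    return " ".join(kept)
```

Cutting at sentence boundaries avoids a reply that stops mid-thought. Asking the model for a short answer in the prompt is still the first line of defence, with this trim as a backstop.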

Kiddy Guardrails

  • Handled by: LlamaGuard
  • Why: Filters complex or violent content before TTS
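
The guardrail sits as a gate between generation and speech: if the classifier rejects an answer, the child hears a gentle fallback instead. Here is a minimal Python sketch of that gate. The keyword check is only a placeholder for a real safety model such as LlamaGuard, and the blocked words, fallback sentence, and function names are all hypothetical.

```python
import re

FALLBACK = "That is a big question! Let's ask Appa together tomorrow."


def keyword_classifier(text, blocked=("kill", "blood", "weapon")):
    """Placeholder safety check: True when no blocked word appears."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return not words.intersection(blocked)


def guard(answer, is_safe=keyword_classifier, fallback=FALLBACK):
    """Pass the answer through only if the classifier accepts it."""
    return answer if is_safe(answer) else fallback
```

Because `guard` takes the classifier as a parameter, the keyword placeholder can later be swapped for a model-backed check without touching the rest of the pipeline.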

Text-to-Speech (TTS)

  • Handled by: Coqui TTS or System.Speech
  • Why: Produces warm, human-like voice output

Each component runs locally on the Raspberry Pi, respecting privacy and keeping everything offline.
No cloud calls, no hidden data drift - just pure, safe, curious AI for a curious child.
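
Put together, the blueprint is a straight chain of stages. The Python sketch below shows one way to wire them, with every stage passed in as a plain function so that each can be replaced independently. The class, field names, and fallback sentence are hypothetical, and Python is used only for brevity; the actual build targets .NET.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VoicePipeline:
    """Chains the stages of the blueprint; each stage is a plain callable."""
    clean: Callable[[str], str]                # speech-to-text cleanup
    rewrite: Callable[[str], str]              # prompt rewriting
    retrieve: Callable[[str], List[str]]       # vector search over content
    generate: Callable[[str, List[str]], str]  # answer generation
    is_safe: Callable[[str], bool]             # guardrail check
    speak: Callable[[str], None]               # text-to-speech
    fallback: str = "Let's find out together tomorrow."

    def answer(self, transcript: str) -> str:
        """Run one transcript through every stage and speak the result."""
        question = self.rewrite(self.clean(transcript))
        context = self.retrieve(question)
        reply = self.generate(question, context)
        if not self.is_safe(reply):
            reply = self.fallback
        self.speak(reply)
        return reply
```

Injecting the stages this way also makes the chain easy to test with stubs before any model is loaded on the Pi.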


6. Why .NET Makes It Possible

I chose .NET not just out of nostalgia, but also because its AI ecosystem quietly supports all of this:

  • STT + TTS → System.Speech namespace (Windows-only, so not an option on the Pi itself) or Coqui integration via C#
  • Qdrant Vector DB → Official .NET client (gRPC-based), with a REST API also available
  • ONNX Runtime → Runs MiniLM embeddings natively on CPU
  • Llama 3.1 8B (GGUF) → Compatible via llama.cpp bindings for .NET
  • SymSpell.NET → Lightweight spelling corrector
  • LLamaSharp → C# binding for llama.cpp, so it can also load the LlamaGuard model

Watch this space

I’ll be sharing the step-by-step build of this AI Toddler Explorer soon - from RAG pipelines to voice tuning - all on a humble Raspberry Pi 5.
Follow my journey at ujwaliyer.com - where bedtime questions meet machine learning.

This post is licensed under CC BY 4.0 by the author.