Phone Bidet — contest state heading into Thursday recording

Built for the bus. Read this, then record.

2026-05-12 ET · Kaggle Gemma 4 Good Hackathon · deadline 2026-05-18 23:59 UTC · video target 2026-05-14 (Thursday after school)

1 · Elevator pitch (first 15 seconds of the video)

Phone Bidet is a 100% on-device Android brain-dump cleaner. You hit Record. Moonshine-Tiny captures speech on the Tensor G3 CPU. Gemma 4 E2B cleans it — fixes the half-formed sentences, the repeats, the proper nouns the STT mangled — without ever leaving the phone. A deterministic sanitizer + regex canonicalizer pre-corrects known mishears before Gemma sees them, so a single bad token in the audio model can't poison the output. Airplane mode is fine. There is no cloud. The model lives in your pocket.

Routing: mic → Moonshine-Tiny (sherpa-onnx) → TranscriptSanitizer + RegexCanonicalizer → Gemma 4 E2B (LiteRT-LM, Backend.CPU) → on-screen Clean tab.
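The routing above can be sketched as a single chain. This is an illustrative sketch only — the class and parameter names are assumptions, not the app's real API:

```kotlin
// Illustrative sketch of the on-device chain; names are assumptions,
// not the app's actual classes.
class BrainDumpPipeline(
    private val stt: (FloatArray) -> String,            // Moonshine-Tiny via sherpa-onnx
    private val sanitize: (String) -> String,           // TranscriptSanitizer pass
    private val canonicalize: (String) -> String,       // RegexCanonicalizer pass
    private val gemmaClean: suspend (String) -> String  // Gemma 4 E2B via LiteRT-LM, Backend.CPU
) {
    suspend fun process(audioSamples: FloatArray): String {
        val raw = stt(audioSamples)
        // Deterministic fixes run BEFORE Gemma, so one bad STT token
        // can't poison the cleaned output.
        val preCorrected = canonicalize(sanitize(raw))
        return gemmaClean(preCorrected)
    }
}
```

Every stage is a plain function of text, which is what makes the "can't cheat to the cloud" claim enforceable: there is no network call anywhere in the chain.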

2 · What we use

Pixel 8 Pro · Tensor G3: Mark's daily-driver phone. CPU-only because Tensor G3 has no OpenCL ICD — Backend.GPU() hangs forever inside Engine.initialize(). Hardware is what it is; we ship around it.
Gemma 4 E2B int4 (2.59 GB) via LiteRT-LM 0.11.0: E4B was borderline-OOM on Pixel 8 and roughly 10× slower than real-time on CPU. E2B fits cleanly, cold-starts in under 10 s, and was verified producing verbatim output during the 2026-05-09 test.
sherpa-onnx Moonshine-Tiny STT: 26 MB, 27M params, 4.52% LibriSpeech-clean WER (beats Whisper-tiny on every axis). The sherpa-onnx engine is ~51× faster than whisper.cpp on Android. The quantized encoder caps input at ~9 s per chunk — fix shipped in PR #37.
TranscriptSanitizer (v18.6): Deterministic post-Moonshine pass that strips music notes, CJK noise, repeat-token loops, "I'm hungry × 6" phrase repeats, and bathroom-ghost artifacts. Runs before Gemma — Gemma never has to fight the obvious garbage.
RegexCanonicalizer (v18.9): 22 hard substitution rules + tagline auto-fix. Catches "day AI" → "Bidet AI", "Pixelet/Pixelate" → "Pixel", etc. Gemma-independent — proper nouns come out right even if Gemma's glossary attention fails (which v18.8 confirmed it does on E4B).
inferenceMutex (kotlinx Mutex, v18.7): Serializes concurrent LiteRT-LM inference calls. Fixes the liblitertlm SIGSEGV at 0x4c9060 that used to nuke the app whenever Record + Clean raced. Verified by Mark with a 16-min stress test — zero native crashes.
Fresh Conversation per transcribe call (PR #30): LiteRT-LM Conversations accumulate history. A sticky Conversation reused across chunks caused chunks 1+ to come out empty — the model saw "transcript already exists" and emitted stop tokens. Each transcribe() now creates and closes its own Conversation; the Engine stays warm.
Project-noun glossary (v18.8): Bidet, TP3, Legacy Soil, etc., prepended into the cleaning prompt at all three resolution sites. E4B int4 ignores it in practice, which is why the regex canonicalizer in front of Gemma is the actual fix. The glossary is the "soft attempt"; regex is the "hard enforcement".
Audio export from SessionDetailScreen (PR #39): Share Sheet button. Lets a session (RAW + cleanings + audio WAV) leave the phone via Drive / Gmail. Important for judges who want to verify a sample, and for future Whisper-mark fine-tune corpus farming.
Stable debug keystore (PR #41) + skip-if-present model gate (PR #42): CI now signs with a pinned debug key, so adb install -r preserves the 2.59 GB Gemma cache between rebuilds. No more uninstall + 2-minute USB re-push every cycle. Critical for the dozen rebuild cycles a contest week eats.
Clean for judges tab (PR #45, v20): A pinned cleaning-prompt variant tuned for the demo: short, lecture-shaped output with the project nouns intact. Reachable in two taps from the recording screen — the gold-shot button for the video.
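The "hard enforcement" half of the noun fix is just ordered regex substitution. A minimal sketch, assuming a rule table shaped like the two examples above (the shipped RegexCanonicalizer has 22 rules plus the tagline auto-fix; these names are illustrative):

```kotlin
// Illustrative subset of the hard substitution table. Rule contents and
// object name are assumptions; only the two example mishears come from
// the notes above.
object RegexCanonicalizerSketch {
    private val rules: List<Pair<Regex, String>> = listOf(
        Regex("""\bday\s+AI\b""", RegexOption.IGNORE_CASE) to "Bidet AI",
        Regex("""\bPixel(?:et|ate)\b""", RegexOption.IGNORE_CASE) to "Pixel",
    )

    // Apply every rule in order; deterministic, no model in the loop.
    fun canonicalize(text: String): String =
        rules.fold(text) { acc, (pattern, replacement) ->
            pattern.replace(acc, replacement)
        }
}
```

Because this runs before the prompt is built, the cleaned output gets the proper nouns right even on a run where the model ignores the glossary entirely.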

3 · What works well (concrete numbers)

4 · What's left before Thursday's recording

The 8-PR stack — needs merge in order (do not parallelize)

PR | What | Status
#38 | chunked cleaning for RAW > 2048-token context (v18) | base — merge first
#40 | v18.8 glossary + tagline pin | retarget → main, merge
#41 | stable debug.keystore | retarget → main, merge
#42 | skip Gemma download when model already at expected size | retarget → main, merge
#43 | v18.9 regex canonicalize project nouns | retarget → main, merge
#44 | FakeGemmaEngine + InferenceMutex concurrency tests | JVM CI, merge after parents
#45 | Clean for judges tab (v20) — demo gold-shot | needs CI re-trigger after #43 lands
#39 | Session export (Share Sheet) — separately stacked off #38 | retarget → main, merge
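The #38 chunking can be sketched roughly as follows. Everything here is an assumption for illustration — the real PR's splitting heuristic, token budget, and names may differ; the ~4-chars-per-token estimate is a rule of thumb, not measured:

```kotlin
// Rough sketch of chunked cleaning for RAW transcripts that exceed the
// 2048-token context: split on sentence boundaries under a character
// budget, clean each chunk, rejoin. All names and the chars-per-token
// estimate are illustrative assumptions.
suspend fun cleanChunked(
    raw: String,
    cleanOne: suspend (String) -> String,  // one fresh-Conversation Gemma call
    maxTokens: Int = 2048,
): String {
    val maxChars = maxTokens * 4 / 2       // ~4 chars/token, half kept as headroom
    val chunks = mutableListOf<String>()
    val sb = StringBuilder()
    for (sentence in raw.split(Regex("(?<=[.!?])\\s+"))) {
        if (sb.length + sentence.length > maxChars && sb.isNotEmpty()) {
            chunks += sb.toString()
            sb.clear()
        }
        sb.append(sentence).append(' ')
    }
    if (sb.isNotEmpty()) chunks += sb.toString()

    val cleaned = mutableListOf<String>()
    for (chunk in chunks) cleaned += cleanOne(chunk.trim())
    return cleaned.joinToString("\n\n")
}
```

Each `cleanOne` call gets its own Conversation (per PR #30), so chunk N never sees chunk N-1's output as "history" and emits stop tokens.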

Mechanics: each merge uses --delete-branch=false until the stack drains, then retarget the next PR via gh api PATCH. Lesson from feedback_stacked_pr_retarget_2026-05-08.md — GitHub auto-closes a PR if its base branch is deleted first. Burned us twice already.
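The drain sequence looks roughly like this (OWNER/REPO and PR numbers are placeholders for the real repo):

```shell
# Illustrative drain sequence for the stack above.
# 1. Merge the base PR, but KEEP its branch so dependent PRs stay open.
gh pr merge 38 --squash --delete-branch=false
# 2. Retarget the next PR onto main BEFORE any branch deletion:
#    GitHub auto-closes a PR whose base branch disappears.
gh api --method PATCH repos/OWNER/REPO/pulls/40 -f base=main
gh pr merge 40 --squash --delete-branch=false
# ...repeat down the stack; delete the branches only after it fully drains.
```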

Known sharp edges (acceptable for video, fix post-contest)

Locked out of scope through 2026-05-18 (do not pursue this week)

Per Mark's 2026-05-12 ruling: phone Bidet improvements are contest-only. Web Bidet stays his daily driver post-contest.

5 · Video skeleton (3-minute target)

Recording target: Thursday 2026-05-14 after school. Record 5–10 short rehearsed takes, never one live run-through (per feedback_phone_bidet_contest_only_2026-05-12.md). Side-by-side comparison sequence per Mark's 2026-05-09 brain dump.

0:00 – 0:15 · The hook + problem

Cartoon brain-into-bidet intro (8 s, generated via Gemini + Veo per Mark's spec). Tagline on screen: "Take a brain dump. Bidet AI cleans up your mess." Voice-over lands the elevator pitch from section 1.

0:15 – 0:45 · Personal story

Mark on camera or voiceover, in his own words from the 2026-05-09 brain dump: "I overthink, I can't write, I overanalyze. The brain dump has been life changing. The brain dump plus AI — that's what I've combined here." Close with: "This is something I never could have done. This is something that has just opened up my world."

0:45 – 1:30 · Phone demo, sped up, side-by-side

Pixel 8 Pro in hand. Tap Record. Live waveform + timer. Talk for ~30 s of representative brain dump (rambling on purpose). Tap Stop. Open Session. Show RAW tab — the unformatted dump. Tap Clean for judges tab — formatted output renders. Show airplane-mode icon in status bar throughout. This is the proof shot. No cuts to a desktop or cloud screen.

1:30 – 2:15 · The stack (why it can't cheat)

Brief overlay diagram: mic → Moonshine-Tiny → Sanitizer + Regex → Gemma 4 E2B → screen. Voice-over: Moonshine-Tiny does speech on the Tensor G3 CPU. Gemma 4 E2B does cleaning on the Tensor G3 CPU. There is no second device. Hit the key line: "The routing isn't a tech demo — it's enforced by the device. We can't cheat to the cloud because we don't reach for the cloud." Land it in 6 seconds.

2:15 – 3:00 · Who it helps + what's next

The application sweep from Mark's brain dump: people whose voice outruns their fingers — adult ADD thinkers, ELL adults, late-deafened, people with handwriting/typing accommodations, anyone in a noisy capture context who wants their own words back. Mention lecture recording for relatives in college as the original sharing intent. Close with the soft version of "It helps you be understood." Then a one-line teaser: voice fine-tune (Whisper-mark) + per-user voice-correction patterns are next. Cut to logo + tagline.

Is 3 minutes realistic given the current build? Yes — every shot above is supported by code that's either on main or in a PR that will be on main before Thursday morning. The phone demo (0:45–1:30) is the only segment that requires the live device; everything else can be pre-recorded voice + screen overlay. If a take of the demo fumbles, re-shoot the demo only.

6 · Out-of-bounds reminders for the video

College lectures, NOT K-12. The public narrative is "built for relatives in college recording lectures." Adult students; lecture recording is widely accepted and FERPA-clean. Per feedback_bidet_phone_narrative_for_college_lectures_2026-05-07.md.

7 · The Cactus track win condition

Verified directly from Mark's logged-in Kaggle session 2026-05-09:

Cactus Special Technology Prize — "best local-first mobile or wearable application that intelligently routes tasks between models." $10K.

This is Phone Bidet literally — not metaphorically. The routing slot in the rubric is named, and the architecture matches it.

Prize math: Cactus is stackable with Main Track + an Impact Track. One submission can theoretically win up to three buckets (max one per bucket). Theoretical ceiling per submission: $70K (1st Main + Future of Education + Cactus).

Rubric weight (verbatim): Impact & Vision 40 · Video Pitch & Storytelling 30 · Technical Depth & Execution 30. 70% of the score is the video pitch + the vision narrative. The video is the star. The code repo is the proof.

Source: reference_kaggle_gemma4_prize_tree_2026-05-09.md