Phone Bidet — contest state heading into Thursday recording

Built for the bus. Read this, then record.

2026-05-12 ET · Kaggle Gemma 4 Good Hackathon · deadline 2026-05-18 23:59 UTC · video target 2026-05-14 (Thursday after school)

1 · Elevator pitch (first 15 seconds of the video)

Phone Bidet is a 100% on-device Android brain-dump cleaner. You hit Record. Moonshine-Tiny captures speech on the Tensor G3 CPU. Gemma 4 E2B cleans it — fixes the half-formed sentences, the repeats, the proper nouns the STT mangled — without ever leaving the phone. A deterministic sanitizer + regex canonicalizer pre-corrects known mishears before Gemma sees them, so a single bad token in the audio model can't poison the output. Airplane mode is fine. There is no cloud. The model lives in your pocket.

Routing: mic → Moonshine-Tiny (sherpa-onnx) → TranscriptSanitizer + RegexCanonicalizer → Gemma 4 E2B (LiteRT-LM, Backend.CPU) → on-screen Clean tab.
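The routing above can be sketched as a single chain. This is an illustrative sketch only — the class and parameter names are assumptions, not the app's real API:

```kotlin
// Illustrative sketch of the on-device chain; names are assumptions,
// not the app's actual classes.
class BrainDumpPipeline(
    private val stt: (FloatArray) -> String,            // Moonshine-Tiny via sherpa-onnx
    private val sanitize: (String) -> String,           // TranscriptSanitizer pass
    private val canonicalize: (String) -> String,       // RegexCanonicalizer pass
    private val gemmaClean: suspend (String) -> String  // Gemma 4 E2B via LiteRT-LM, Backend.CPU
) {
    suspend fun process(audioSamples: FloatArray): String {
        val raw = stt(audioSamples)
        // Deterministic fixes run BEFORE Gemma, so one bad STT token
        // can't poison the cleaned output.
        val preCorrected = canonicalize(sanitize(raw))
        return gemmaClean(preCorrected)
    }
}
```

Every stage is a plain function of text, which is what makes the "can't cheat to the cloud" claim enforceable: there is no network call anywhere in the chain.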

2 · What we use

Pixel 8 Pro · Tensor G3: Mark's daily-driver phone. CPU-only because Tensor G3 has no OpenCL ICD — Backend.GPU() hangs forever inside Engine.initialize(). Hardware is what it is; we ship around it.
Gemma 4 E2B int4 (2.59 GB) via LiteRT-LM 0.11.0: E4B was borderline-OOM on Pixel 8 and roughly 10× slower than real-time on CPU. E2B fits cleanly, cold-starts in under 10 s, and was verified producing verbatim output during the 2026-05-09 test.
sherpa-onnx Moonshine-Tiny STT: 26 MB, 27M params, 4.52% LibriSpeech-clean WER (beats Whisper-tiny on every axis). The sherpa-onnx engine is ~51× faster than whisper.cpp on Android. The quantized encoder caps input at ~9 s per chunk — fix shipped in PR #37.
TranscriptSanitizer (v18.6): Deterministic post-Moonshine pass that strips music notes, CJK noise, repeat-token loops, "I'm hungry × 6" phrase repeats, and bathroom-ghost artifacts. Runs before Gemma — Gemma never has to fight the obvious garbage.
RegexCanonicalizer (v18.9): 22 hard substitution rules + tagline auto-fix. Catches "day AI" → "Bidet AI", "Pixelet/Pixelate" → "Pixel", etc. Gemma-independent — proper nouns come out right even if Gemma's glossary attention fails (which v18.8 confirmed it does on E4B).
inferenceMutex (kotlinx Mutex, v18.7): Serializes concurrent LiteRT-LM inference calls. Fixes the liblitertlm SIGSEGV at 0x4c9060 that used to nuke the app whenever Record + Clean raced. Verified by Mark with a 16-min stress test — zero native crashes.
Fresh Conversation per transcribe call (PR #30): LiteRT-LM Conversations accumulate history. A sticky Conversation reused across chunks caused chunks 1+ to come out empty — the model saw "transcript already exists" and emitted stop tokens. Each transcribe() now creates and closes its own Conversation; the Engine stays warm.
Project-noun glossary (v18.8): Bidet, TP3, Legacy Soil, etc., prepended into the cleaning prompt at all three resolution sites. E4B int4 ignores it in practice, which is why the regex canonicalizer in front of Gemma is the actual fix. The glossary is the "soft attempt"; regex is the "hard enforcement".
Audio export from SessionDetailScreen (PR #39): Share Sheet button. Lets a session (RAW + cleanings + audio WAV) leave the phone via Drive / Gmail. Important for judges who want to verify a sample, and for future Whisper-mark fine-tune corpus farming.
Stable debug keystore (PR #41) + skip-if-present model gate (PR #42): CI now signs with a pinned debug key, so adb install -r preserves the 2.59 GB Gemma cache between rebuilds. No more uninstall + 2-minute USB re-push every cycle. Critical for the dozen rebuild cycles a contest week eats.
Clean for judges tab (PR #45, v20): A pinned cleaning-prompt variant tuned for the demo: short, lecture-shaped output with the project nouns intact. Reachable in two taps from the recording screen — the gold-shot button for the video.
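The "hard enforcement" half of the noun fix is just ordered regex substitution. A minimal sketch, assuming a rule table shaped like the two examples above (the shipped RegexCanonicalizer has 22 rules plus the tagline auto-fix; these names are illustrative):

```kotlin
// Illustrative subset of the hard substitution table. Rule contents and
// object name are assumptions; only the two example mishears come from
// the notes above.
object RegexCanonicalizerSketch {
    private val rules: List<Pair<Regex, String>> = listOf(
        Regex("""\bday\s+AI\b""", RegexOption.IGNORE_CASE) to "Bidet AI",
        Regex("""\bPixel(?:et|ate)\b""", RegexOption.IGNORE_CASE) to "Pixel",
    )

    // Apply every rule in order; deterministic, no model in the loop.
    fun canonicalize(text: String): String =
        rules.fold(text) { acc, (pattern, replacement) ->
            pattern.replace(acc, replacement)
        }
}
```

Because this runs before the prompt is built, the cleaned output gets the proper nouns right even on a run where the model ignores the glossary entirely.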

3 · What works well (concrete numbers)

4 · What's left before Thursday's recording

The 8-PR stack — needs merge in order (do not parallelize)

PR | What | Status
#38 | chunked cleaning for RAW > 2048-token context (v18) | base — merge first
#40 | v18.8 glossary + tagline pin | retarget → main, merge
#41 | stable debug.keystore | retarget → main, merge
#42 | skip Gemma download when model already at expected size | retarget → main, merge
#43 | v18.9 regex canonicalize project nouns | retarget → main, merge
#44 | FakeGemmaEngine + InferenceMutex concurrency tests | JVM CI, merge after parents
#45 | Clean for judges tab (v20) — demo gold-shot | needs CI re-trigger after #43 lands
#39 | Session export (Share Sheet) — separately stacked off #38 | retarget → main, merge
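The #38 chunking can be sketched roughly as follows. Everything here is an assumption for illustration — the real PR's splitting heuristic, token budget, and names may differ; the ~4-chars-per-token estimate is a rule of thumb, not measured:

```kotlin
// Rough sketch of chunked cleaning for RAW transcripts that exceed the
// 2048-token context: split on sentence boundaries under a character
// budget, clean each chunk, rejoin. All names and the chars-per-token
// estimate are illustrative assumptions.
suspend fun cleanChunked(
    raw: String,
    cleanOne: suspend (String) -> String,  // one fresh-Conversation Gemma call
    maxTokens: Int = 2048,
): String {
    val maxChars = maxTokens * 4 / 2       // ~4 chars/token, half kept as headroom
    val chunks = mutableListOf<String>()
    val sb = StringBuilder()
    for (sentence in raw.split(Regex("(?<=[.!?])\\s+"))) {
        if (sb.length + sentence.length > maxChars && sb.isNotEmpty()) {
            chunks += sb.toString()
            sb.clear()
        }
        sb.append(sentence).append(' ')
    }
    if (sb.isNotEmpty()) chunks += sb.toString()

    val cleaned = mutableListOf<String>()
    for (chunk in chunks) cleaned += cleanOne(chunk.trim())
    return cleaned.joinToString("\n\n")
}
```

Each `cleanOne` call gets its own Conversation (per PR #30), so chunk N never sees chunk N-1's output as "history" and emits stop tokens.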

Mechanics: each merge uses --delete-branch=false until the stack drains, then retarget the next PR via gh api PATCH. Lesson from feedback_stacked_pr_retarget_2026-05-08.md — GitHub auto-closes a PR if its base branch is deleted first. Burned us twice already.
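The drain sequence looks roughly like this (OWNER/REPO and PR numbers are placeholders for the real repo):

```shell
# Illustrative drain sequence for the stack above.
# 1. Merge the base PR, but KEEP its branch so dependent PRs stay open.
gh pr merge 38 --squash --delete-branch=false
# 2. Retarget the next PR onto main BEFORE any branch deletion:
#    GitHub auto-closes a PR whose base branch disappears.
gh api --method PATCH repos/OWNER/REPO/pulls/40 -f base=main
gh pr merge 40 --squash --delete-branch=false
# ...repeat down the stack; delete the branches only after it fully drains.
```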

Known sharp edges (acceptable for video, fix post-contest)

Locked out of scope through 2026-05-18 (do not pursue this week)

Per Mark's 2026-05-12 ruling: phone Bidet improvements are contest-only. Web Bidet stays his daily driver post-contest.

5 · Video skeleton (3-minute target)

Recording target: Thursday 2026-05-14 after school. Record 5–10 short rehearsed takes, never one live run-through (per feedback_phone_bidet_contest_only_2026-05-12.md). Side-by-side comparison sequence per Mark's 2026-05-09 brain dump.

0:00 – 0:15 · The hook + problem

Cartoon brain-into-bidet intro (8 s, generated via Gemini + Veo per Mark's spec). Tagline on screen: "Take a brain dump. Bidet AI cleans up your mess." Voice-over lands the elevator pitch from section 1.

0:15 – 0:45 · Personal story

Mark on camera or voiceover, in his own words from the 2026-05-09 brain dump: "I overthink, I can't write, I overanalyze. The brain dump has been life changing. The brain dump plus AI — that's what I've combined here." Close with: "This is something I never could have done. This is something that has just opened up my world."

0:45 – 1:30 · Phone demo, sped up, side-by-side

Pixel 8 Pro in hand. Tap Record. Live waveform + timer. Talk for ~30 s of representative brain dump (rambling on purpose). Tap Stop. Open Session. Show RAW tab — the unformatted dump. Tap Clean for judges tab — formatted output renders. Show airplane-mode icon in status bar throughout. This is the proof shot. No cuts to a desktop or cloud screen.

1:30 – 2:15 · The stack (why it can't cheat)

Brief overlay diagram: mic → Moonshine-Tiny → Sanitizer + Regex → Gemma 4 E2B → screen. Voice-over: Moonshine-Tiny does speech on the Tensor G3 CPU. Gemma 4 E2B does cleaning on the Tensor G3 CPU. There is no second device. Hit the key line: "The routing isn't a tech demo — it's enforced by the device. We can't cheat to the cloud because we don't reach for the cloud." Land it in 6 seconds.

2:15 – 3:00 · Who it helps + what's next

The application sweep from Mark's brain dump: people whose voice outruns their fingers — adult ADD thinkers, ELL adults, late-deafened, people with handwriting/typing accommodations, anyone in a noisy capture context who wants their own words back. Mention lecture recording for relatives in college as the original sharing intent. Close with the soft version of "It helps you be understood." Then a one-line teaser: voice fine-tune (Whisper-mark) + per-user voice-correction patterns are next. Cut to logo + tagline.

Is 3 minutes realistic given the current build? Yes — every shot above is supported by code that's either on main or in a PR that will be on main before Thursday morning. The phone demo (0:45–1:30) is the only segment that requires the live device; everything else can be pre-recorded voice + screen overlay. If a take of the demo fumbles, re-shoot the demo only.

6 · Out-of-bounds reminders for the video

College lectures, NOT K-12. The public narrative is "built for relatives in college recording lectures." Adult students; lecture recording is widely accepted and FERPA-clean. Per feedback_bidet_phone_narrative_for_college_lectures_2026-05-07.md.

7 · The Cactus track win condition

Verified directly from Mark's logged-in Kaggle session 2026-05-09:

Cactus Special Technology Prize — "best local-first mobile or wearable application that intelligently routes tasks between models." $10K.

This is Phone Bidet literally — not metaphorically. The routing slot in the rubric is named, and the architecture matches it.

Prize math: Cactus is stackable with Main Track + an Impact Track. One submission can theoretically win up to three buckets (max one per bucket). Theoretical ceiling per submission: $70K (1st Main + Future of Education + Cactus).

Rubric weight (verbatim): Impact & Vision 40 · Video Pitch & Storytelling 30 · Technical Depth & Execution 30. 70% of the score is the video pitch + the vision narrative. The video is the star. The code repo is the proof.

Source: reference_kaggle_gemma4_prize_tree_2026-05-09.md