Phone Bidet — contest state heading into Thursday recording
Built for the bus. Read this, then record.
1 · Elevator pitch (first 15 seconds of the video)
Phone Bidet is a 100% on-device Android brain-dump cleaner. You hit Record. Moonshine-Tiny captures speech on the Tensor G3 CPU. Gemma 4 E2B cleans it — fixes the half-formed sentences, the repeats, the proper nouns the STT mangled — without ever leaving the phone. A deterministic sanitizer + regex canonicalizer pre-corrects known mishears before Gemma sees them, so a single bad token in the audio model can't poison the output. Airplane mode is fine. There is no cloud. The model lives in your pocket.
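The pre-correction step can be sketched as an ordered table of compiled regexes that runs over each transcript chunk before it reaches Gemma. This is a minimal Python sketch, not the app's code. The mishear patterns and the `canonicalize` name are hypothetical examples, and the real v18.9 rule set is not reproduced here.

```python
import re

# Hypothetical mishear table: each regex matches one way the STT model might
# mangle a project noun; the value is the canonical spelling. Illustrative only.
CANONICAL_NOUNS: dict[str, str] = {
    r"\b(?:b\s*day|be\s+day|bidet)\s+(?:a\.?\s*i\.?|eye)\b": "Bidet AI",
    r"\bmoon\s+shine\b": "Moonshine",
    r"\bgemma\s+four\b": "Gemma 4",
}

# Compile once, case-insensitively, so every transcript chunk reuses the rules.
_RULES = [(re.compile(pattern, re.IGNORECASE), canonical)
          for pattern, canonical in CANONICAL_NOUNS.items()]


def canonicalize(text: str) -> str:
    """Deterministically rewrite known mishears before the LLM sees the text."""
    for pattern, canonical in _RULES:
        text = pattern.sub(canonical, text)
    return text
```

Because the rewrite is deterministic, a given mangled token is fixed the same way every time, independent of how an int4 model interprets its prompt. Patterns need to stay narrow: a loose rule (for example one matching "gemma for") would also rewrite ordinary speech.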
2 · What we use
- Backend.GPU() hangs forever inside Engine.initialize(). Hardware is what it is; we ship around it.
- 0x4c9060 that used to nuke the app whenever Record + Clean raced. Verified by Mark with a 16-min stress test — zero native crash.
- transcribe() now creates + closes a Conversation; the Engine stays warm.
- adb install -r preserves the 2.59 GB Gemma cache between rebuilds. No more uninstall + 2-minute USB re-push every cycle. Critical for the dozen rebuild cycles a contest week eats.

3 · What works well (concrete numbers)
- VERIFIED 75-second test = 702-char clean transcript across 3 chunks, word-perfect with punctuation (session b59e32a6, 2026-05-09 morning). This was the first end-to-end working session and is the demo path the judges will see.
- VERIFIED 16-min stress test, zero native crash after the inferenceMutex fix (v18.7). The app survived what previously SIGSEGV'd on every run — Mark dictated continuously while triggering Clean races.
- VERIFIED Gemma 4 E2B cold-start < 10 s on Tensor G3 CPU, real-time during transcription. E4B was 60+ s cold-start and ~10× slower than real-time. E2B is the practical pick; E4B is the spec-sheet pick that loses.
- VERIFIED Offline / airplane-mode operation. The 31-min brain dump on 2026-05-09 was recovered from the phone DB after the on-device Generate hung — audio capture and Moonshine STT kept running with no network access.
- VERIFIED "Bidet AI" comes out correctly in v18.9 regex output where v18.8 glossary-only failed. This was the proof that the deterministic layer matters more than prompt engineering for proper nouns on int4 models.
- VERIFIED 30-min brain dumps actually finish after PR #24 (bounded output + streaming UI + foreground gen service). Generate used to crash mid-cleanup; now it runs in a foreground service the OS won't kill mid-task.
- VERIFIED 45-min hard cap with 5-min visual countdown + 10-sec audible beep (PR #26). Keeps a stuck-on recording from eating battery overnight; also matches the longest single college-lecture session a judge would test.
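The cap-and-countdown behavior can be sketched as a pure function of elapsed recording time, which keeps it easy to unit-test off-device. Assumptions: the 5-minute countdown starts at 40:00, the audible warning covers the final 10 seconds before the 45:00 stop, and `RecordingCapState` and `cap_state` are hypothetical names, not PR #26's code.

```python
from dataclasses import dataclass

HARD_CAP_S = 45 * 60   # recording stops here
COUNTDOWN_S = 5 * 60   # visual countdown window before the cap
BEEP_S = 10            # audible warning window before the cap (assumed placement)


@dataclass(frozen=True)
class RecordingCapState:
    remaining_s: int       # seconds left before the hard stop
    show_countdown: bool   # render the on-screen countdown
    beep: bool             # play the audible warning
    stop: bool             # end the recording now


def cap_state(elapsed_s: int) -> RecordingCapState:
    """Map elapsed recording seconds to what the UI should be doing."""
    remaining = max(0, HARD_CAP_S - elapsed_s)
    return RecordingCapState(
        remaining_s=remaining,
        show_countdown=0 < remaining <= COUNTDOWN_S,
        beep=0 < remaining <= BEEP_S,
        stop=remaining == 0,
    )
```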
4 · What's left before Thursday's recording
The 8-PR stack — must be merged in order (do not parallelize)
| PR | What | Status |
|---|---|---|
| #38 | chunked cleaning for RAW > 2048-token context (v18) | base — merge first |
| #40 | v18.8 glossary + tagline pin | retarget → main, merge |
| #41 | stable debug.keystore | retarget → main, merge |
| #42 | skip Gemma download when model already at expected size | retarget → main, merge |
| #43 | v18.9 regex canonicalize project nouns | retarget → main, merge |
| #44 | FakeGemmaEngine + InferenceMutex concurrency tests | JVM CI, merge after parents |
| #45 | Clean for judges tab (v20) — demo gold-shot | needs CI re-trigger after #43 lands |
| #39 | Session export (Share Sheet) — separately stacked off #38 | retarget → main, merge |
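The idea behind #38, splitting a RAW transcript that exceeds the 2048-token context into pieces that each fit, can be sketched as greedy sentence packing. Assumptions: tokens are approximated by whitespace-separated words (a real implementation would count with Gemma's tokenizer), and `RESERVED_TOKENS`, `approx_tokens`, and `chunk_transcript` are hypothetical names and numbers, with the reserve standing in for prompt plus bounded output.

```python
import re

CONTEXT_TOKENS = 2048   # context limit named in PR #38
RESERVED_TOKENS = 1024  # hypothetical headroom for prompt + bounded output


def approx_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace word."""
    return len(text.split())


def chunk_transcript(raw: str, budget: int = CONTEXT_TOKENS - RESERVED_TOKENS) -> list[str]:
    """Greedily pack whole sentences into chunks that stay within `budget`."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", raw.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for sentence in sentences:
        n = approx_tokens(sentence)
        # Flush the open chunk if this sentence would push it over budget.
        if current and current_tokens + n > budget:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        # A single sentence longer than the budget still becomes its own
        # (oversized) chunk; a production version would need a hard split.
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries keeps each chunk readable on its own, which matters because each one is cleaned independently and then stitched back together.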
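The #42 check can be sketched as a size comparison made before any network work. Assumptions: the exact expected byte count is known ahead of time (the 2.59 GB figure elsewhere in these notes is rounded, so it cannot serve as the comparison value), and `should_download` is a hypothetical helper. A size match is a cheap heuristic, not an integrity check.

```python
import os


def should_download(model_path: str, expected_bytes: int) -> bool:
    """Return True only if the model file is missing or not the expected size."""
    try:
        return os.path.getsize(model_path) != expected_bytes
    except FileNotFoundError:
        return True  # nothing on disk yet, so a download is required
```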
Known sharp edges (acceptable for video, fix post-contest)
- UX "Starting recording…" subtitle sticks on screen after tapping Record instead of transitioning to the BidetTabsScreen. Root cause: BidetTabsViewModel.hasAggregator is a non-observable var. Recording itself works; the screen just lies about state for a beat. Don't linger on it in the video.
- UX No drain-progress indicator when Generate races the pre-cleaner sanitizer. The mutex makes this safe (no crash), but the user sees a frozen button for a few seconds. Out of scope per feedback_phone_bidet_contest_only_2026-05-12.md — record around it.
- infra No /ingest/bidet endpoint on Apex yet — cleanings still flow back through the captain's-log email hop instead of a direct push. Out of scope for contest; the phone-side video doesn't show this hop anyway.
- device USB debugging must be re-enabled after every phone reboot. If Mark reboots the Pixel before recording, allow 60 seconds to re-enable it. Per feedback_phone_trigger_brittle_2026-05-08.md.
- device 16 KB page-alignment dialog appears on fresh install (Android 15, debug APK). Tap "Don't Show Again" once. Real fix is the NDK link flag in Phase 4B — post-contest.
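The mutex pattern these notes rely on (Engine loaded once and kept warm, a fresh Conversation per call, Record-side and Clean-side callers serialized) can be sketched with a plain lock. Assumptions: `WarmEngine`, `Conversation`, and `generate` are hypothetical stand-ins, not the real engine API or the app's inferenceMutex code. The sketch only shows that one inference touches the engine at a time, so a racing Clean waits instead of crashing.

```python
import threading
import time
from contextlib import contextmanager
from typing import Iterator


class Conversation:
    """Hypothetical per-request session; must be closed after every use."""

    def __init__(self) -> None:
        self.closed = False

    def generate(self, prompt: str) -> str:
        time.sleep(0.01)        # simulate native work that must not overlap
        return prompt.strip()   # placeholder for the model's cleaned output

    def close(self) -> None:
        self.closed = True


class WarmEngine:
    """Loaded once and kept warm; each request gets a fresh Conversation."""

    def __init__(self) -> None:
        self._inference_mutex = threading.Lock()
        self.active = 0         # Conversations currently open
        self.max_active = 0     # high-water mark, lets a test check the invariant

    @contextmanager
    def conversation(self) -> Iterator[Conversation]:
        # Record-side and Clean-side callers queue here instead of racing.
        with self._inference_mutex:
            convo = Conversation()
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            try:
                yield convo
            finally:
                convo.close()   # always release per-request state
                self.active -= 1

    def run(self, prompt: str) -> str:
        with self.conversation() as convo:
            return convo.generate(prompt)
```

The lock makes contention visible as waiting rather than as a native crash, which matches the frozen-button symptom described in the drain-progress item.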
Locked out of scope through 2026-05-18 (do not pursue this week)
Per Mark's 2026-05-12 ruling: phone Bidet improvements are contest-only. Web Bidet stays his daily driver post-contest.
- Whisper LoRA training — phone uses Moonshine, won't help phone
- Gemma 4 LoRA fine-tune
- Drain-progress UX (above)
- /ingest/bidet endpoint (above)
- Anthropic "Dreaming" memory pattern, Gemini webhooks, RRF, Captain's-log routing — unrelated to phone
5 · Video skeleton (3-minute target)
Recording target: Thursday 2026-05-14 after school. Record 5–10 short rehearsed takes, never one live run-through (per feedback_phone_bidet_contest_only_2026-05-12.md). Side-by-side comparison sequence per Mark's 2026-05-09 brain dump.
Cartoon brain-into-bidet intro (8 s, generated via Gemini + Veo per Mark's spec). Tagline on screen: "Take a brain dump. Bidet AI cleans up your mess." Voice-over lands the elevator pitch from section 1.
Mark on camera or voiceover, in his own words from the 2026-05-09 brain dump: "I overthink, I can't write, I overanalyze. The brain dump has been life changing. The brain dump plus AI — that's what I've combined here." Close with: "This is something I never could have done. This is something that has just opened up my world."
Pixel 8 Pro in hand. Tap Record. Live waveform + timer. Talk for ~30 s of representative brain dump (rambling on purpose). Tap Stop. Open Session. Show RAW tab — the unformatted dump. Tap Clean for judges tab — formatted output renders. Show airplane-mode icon in status bar throughout. This is the proof shot. No cuts to a desktop or cloud screen.
Brief overlay diagram: mic → Moonshine-Tiny → Sanitizer + Regex → Gemma 4 E2B → screen. Voice-over: Moonshine-Tiny does speech on the Tensor G3 CPU. Gemma 4 E2B does cleaning on the Tensor G3 CPU. There is no second device. Hit the key line: "The routing isn't a tech demo — it's enforced by the device. We can't cheat to the cloud because we don't reach for the cloud." Land it in 6 seconds.
The application sweep from Mark's brain dump: people whose voice outruns their fingers — adult ADD thinkers, ELL adults, late-deafened, people with handwriting/typing accommodations, anyone in a noisy capture context who wants their own words back. Mention college-lecture recording for relatives in college as the original sharing intent. Close with the soft version of "It helps you be understood." Then a one-line teaser: voice fine-tune (Whisper-mark) + per-user voice-correction patterns are next. Cut to logo + tagline.
6 · Out-of-bounds reminders for the video
College lectures, NOT K-12. The public narrative is "built for relatives in college recording lectures." Adult students, lecture-recording is widely accepted, FERPA-clean. Per feedback_bidet_phone_narrative_for_college_lectures_2026-05-07.md.
- Never name nephews (Noel, Teo, Zach stay private). Say "relatives in college" or "family in college".
- Never name St. Francis or sfschools.net. Mark's identity as a teacher is fine in bio if he wants; the app is for college lectures.
- No claims about "deployed in classrooms" or "tested on students." The personal-narrative half can mention Mark's own teaching context as origin story; the use-case half stays on adult / professional / accessibility framing.
- Don't say "Whisper-tiny" in the technical-stack section. Phone Bidet uses Moonshine-Tiny + sherpa-onnx. (Per reference_moonshine_replaces_whisper_2026-05-10.md — "you keep saying Whisper-tiny but we've gone past that.")
- Don't apologize or hedge. No "this is just a prototype" framing. Phone Bidet is a working app on Mark's daily-driver phone.
- Don't sell. Lead with evidence — the demo IS the argument. (Per Mark's standing rule on sales-pitch language.)
7 · The Cactus track win condition
Verified directly from Mark's logged-in Kaggle session 2026-05-09:
Cactus Special Technology Prize — "best local-first mobile or wearable application that intelligently routes tasks between models." $10K.
This is Phone Bidet literally — not metaphorically. The routing slot in the rubric is named and the architecture matches:
- Local-first mobile — Tensor G3 CPU, on-device. Airplane mode is fine. No cloud call in the demo path.
- Routes tasks between models — Moonshine-Tiny handles speech-to-text. Gemma 4 E2B handles cleaning + structure. A deterministic sanitizer + regex layer sits between them as a hard correction stage. Three distinct stages, three different cost/latency profiles, all chosen because they fit the device.
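The routing order can be sketched as a fixed pipeline in which each post-STT stage is a separate callable. Assumptions: the stage functions passed in are hypothetical placeholders (in the app, the STT text comes from Moonshine-Tiny via sherpa-onnx and the `clean` stage is Gemma 4 E2B); the sketch only shows that both deterministic stages run before the LLM ever sees the text.

```python
from dataclasses import dataclass
from typing import Callable

Stage = Callable[[str], str]


@dataclass(frozen=True)
class Pipeline:
    """Fixed on-device routing: STT text -> deterministic fixes -> LLM cleanup."""
    sanitize: Stage      # deterministic: strip junk tokens, normalize spacing
    canonicalize: Stage  # deterministic: regex fixes for known mishears
    clean: Stage         # LLM stage; only ever sees pre-corrected text

    def run(self, stt_text: str) -> str:
        text = self.sanitize(stt_text)
        text = self.canonicalize(text)
        return self.clean(text)
```

Keeping the stages as plain callables also makes it easy to swap in fakes for JVM-style tests, in the spirit of the FakeGemmaEngine PR.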
Prize math: Cactus is stackable with Main Track + an Impact Track. One submission can theoretically win up to three buckets (max one per bucket). Theoretical ceiling per submission: $70K (1st Main + Future of Education + Cactus).
Rubric weight (verbatim): Impact & Vision 40 · Video Pitch & Storytelling 30 · Technical Depth & Execution 30. 70% of the score is the video pitch + the vision narrative. The video is the star. The code repo is the proof.
Source: reference_kaggle_gemma4_prize_tree_2026-05-09.md