Bidet AI — the Kaggle submission, as the judges see it

Gemma 4 Good Hackathon · 2026-05-17 · what's submitted + how it works + an independent 4-judge review

SUBMITTED & verified live. Kaggle shows "Submitted!" — editable/re-submittable freely until Mon 2026-05-18 7:59 PM EDT. Final video youtu.be/EAJe4rpJAF0 (v5.8, 2:43, Unlisted, plays with no login) and the clean 717-word writeup are both live. The old floor video was deleted; only the final video remains on the channel.

1 · Exactly what a judge opens

A judge clicks one Kaggle writeup page and sees, in order: a title + subtitle, a 2:43 video, a written story, and links to the code and a live demo. That's the whole surface. Here it is verbatim.

Title

Bidet AI — on-device Gemma 4 turns a messy brain-dump into clean writing

Subtitle

A teacher who isn't a coder built it so the most personal writing he does never has to leave his phone.

Tracks selected (stackable — one prize per bucket)

Impact → Digital Equity & Inclusivity · Main Track · Special Technology Track

Video (the "most important part" — ~70% of score is video + vision)

https://youtu.be/EAJe4rpJAF0 — 2:43, Unlisted on the Bidet AI / @bidetai channel, not-for-kids. First-person story in Mark's own voice: the 2 AM report-card-comment wall → he builds the tool → a real on-device demo with airplane mode visibly ON → time saved → personal fine-tune → "I'm not rare" → soft close.

Public code repo

github.com/MrB-Ed/bidet-ai — Apache 2.0, no login wall. Forks Google's AI Edge Gallery; the original engineering (the capture-and-restructure pipeline, the shared on-device LiteRT-LM engine, the Moonshine transcription path) sits on top.

Live demo + cover

bidetai.app · cover image = the Bidet AI brand card.

2 · The written story (the 717-word body, verbatim)

The problem is mine

My name is Mark. I'm a teacher, and I'm not a coder. Every few months there's a piece of writing that wrecks me: honest comments about real students — the most personal, highest-stakes writing I do. It always came out the same way — two in the morning, blank page, everything in my head refusing to line up. My brain runs a mile a minute and goes everywhere, faster than I can type, faster than I can talk. I have ADD. Getting what's actually in my head onto the page has always been the hard part.

I don't think I'm rare. Anyone whose thinking outruns their hands knows this gap. For a lot of people the keyboard is the bottleneck, not the thinking.

Demo video: https://youtu.be/EAJe4rpJAF0

What it does

You hit record and you talk — ramble, stutter, repeat, go off on tangents. It transcribes you as you go, then reshapes the mess into clear writing. It doesn't summarize me. It organizes what I actually said and provides the context other people need, so it sounds like me on a good day, finally saying it the way I intended. There's a version cleaned for me, and a version cleaned for other people to read.

From my own use: a six-hour, up-till-2 a.m. spiral became about two and a half hours of tweaking and proofing — and for the first time it was actually enjoyable. The output was the most genuine writing I've ever produced. It was mine. It was me.

Why it has to be on-device

Bidet AI runs 100% on the phone, with no internet, on a three-year-old phone — and that's what lets it reach the people the cloud leaves behind.

That matters most for exactly the writing that drove me to build it. The comments I write are about real students — specific, candid, sometimes hard. There is no version of me that uploads that to someone's server to get cleaned up. With Bidet AI nothing is sent anywhere on its own; the only way anything leaves the phone is if I choose to share it. Private here isn't a policy I'm trusting — it's where the computer is.

And the floor is a phone someone already owns, not a subscription and a credit card on file. A voice-to-clear-writing tool that lives in the cloud serves people who can afford the cloud. One that runs on an old phone serves everyone else — and the people who most need to get a tangled thought out are often the ones for whom writing, not thinking, is the friction.

How it works

The repository forks Google's AI Edge Gallery (Apache 2.0); the engineering is the capture-and-restructure pipeline on top, and every part runs on the device:

Continuous capture — a foreground service records 16 kHz audio in overlapping windows so a brain-dump can run as long as it needs to.
On-device transcription — a bundled ~27M-parameter Moonshine-Tiny v2 model via the sherpa-onnx runtime, stitched with fuzzy de-duplication. It replaced an earlier whisper.cpp prototype — smaller, faster, more accurate.
On-device language — cleanup, analysis, and AI-ready output by Gemma 4 E2B via LiteRT-LM, on the phone, no cloud, no fallback. The small model listens; Gemma does the language. E2B is small enough to run well on a three-year-old phone.
Privacy by construction — a first-run consent screen enforces the Gemma Terms of Use; the only network the app ever touches is a one-time, optional model download. No telemetry, no phone-home.

Personalization. A small LoRA fine-tune, built with Unsloth on ~1,300 paired examples from my own brain-dumps, trained and validated — on held-out samples it strips my disfluency and keeps my voice. The shipping build defaults to the base model; the personal adapter rides on top.

The full source is public at github.com/MrB-Ed/bidet-ai — the fork lineage, the pipeline, the on-device wiring, and the consent flow are all readable and buildable.

Close

This is a tool I needed, that I could build, on a phone I already had. It helps me communicate. The words are always in my head; on-device Gemma 4 is what finally let them out — and it can do the same for anyone whose brain moves faster than their hands.

Take a brain dump. Bidet AI cleans up your mess.

3 · How it actually works (the pipeline)

One phone, no internet, nothing leaves the device unless Mark chooses to share it:

🎙️ Speak (foreground capture, 16 kHz, overlapping windows — talk as long as you want)
    ↓
📝 Moonshine-Tiny v2 (~27M params) via sherpa-onnx — transcribes on-device, fuzzy-dedup stitches chunks
    ↓
🧠 Gemma 4 E2B via LiteRT-LM (NPU→CPU hardware fallback, never a cloud fallback): cleans, organizes, adds context, all on-device
    ↓
📄 Two outputs: a version cleaned for you, and a version cleaned for others to read
    · optional: a personal Unsloth LoRA adapter rides on top of the base model

The small speech model listens; Gemma does the language — two engines, each doing the job it's good at, both on a three-year-old phone in airplane mode.
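
The capture and stitching steps above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the function names, the window and overlap sizes, the 12-word overlap limit, and the 0.8 similarity threshold are all assumptions chosen for the example, and the real app is an Android fork rather than Python. The sketch cuts a recording into overlapping sample ranges, then joins two chunk transcripts by dropping the words at the start of the second chunk that fuzzily repeat the end of the first.

```python
from difflib import SequenceMatcher


def overlapping_windows(n_samples, window, overlap):
    """Yield (start, end) sample ranges that cover n_samples with a fixed overlap."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    step = window - overlap
    start = 0
    while start < n_samples:
        yield start, min(start + window, n_samples)
        if start + window >= n_samples:
            break  # this window already reached the end of the recording
        start += step


def stitch(prev_text, next_text, max_overlap_words=12, threshold=0.8):
    """Join two chunk transcripts, dropping a fuzzily repeated overlap.

    Hypothetical helper: compares the last k words of prev_text with the
    first k words of next_text and keeps the largest k that is similar enough.
    """
    prev_words = prev_text.split()
    next_words = next_text.split()
    best = 0
    limit = min(max_overlap_words, len(prev_words), len(next_words))
    for k in range(1, limit + 1):
        tail = " ".join(prev_words[-k:]).lower()
        head = " ".join(next_words[:k]).lower()
        if SequenceMatcher(None, tail, head).ratio() >= threshold:
            best = k
    return " ".join(prev_words + next_words[best:])
```

For example, `stitch("the students worked hard on the project", "on the projct and it showed")` treats the misspelled "on the projct" as a repeat of "on the project" and keeps only "and it showed" from the second chunk. A fuzzy match is used instead of an exact one because two transcriptions of the same overlapping audio rarely agree word for word.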

4 · The official rubric (what they grade)

Criterion	Points	Notes
Impact & Vision	40	Heaviest single criterion. Real-world impact + vision/scale.
Video Pitch & Storytelling	30	Organizers: "the most important part of your submission."
Technical Depth & Execution	30	Must clearly show Gemma 4 implementation.

70 of the 100 points (Impact & Vision 40 + Video Pitch 30) ride on the video and the vision narrative. Stacking: Main + Impact + Special are three separate buckets, one prize per bucket.

5 · Independent 4-judge review (run 2026-05-17, blind to each other)

Four independent judges, each grounded in the rubric above, scored separately and were told to find what a skeptical judge docks — not to cheerlead. They converged.

Lens	Score	One-line read
Impact & Vision (/40)	27–31	Equity thesis is structural & real; capped by n=1 / asserted-not-demonstrated impact.
Video Pitch (/30)	24–25	Authentic first-person voice + real airplane-mode demo beats the wrapper-demo field; back third stacks 3 ideas over the climax.
Technical Depth (/30)	21–23	Real on-device Gemma/LiteRT stack; docked for the LoRA contradiction, zero benchmarks, forked-app optics.
Predicted total	~73–79 / 100	Red-team midpoint 78.
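
As a cross-check on the table, the per-lens ranges can be added up directly. This is a small sketch with hypothetical names; the numbers are copied from the rubric and the review table. A plain sum gives 72 to 79, so the "~73–79" total and the red-team figure of 78 should be read as the reviewers' own estimates rather than a strict sum or midpoint.

```python
# Rubric maxima and reviewer ranges, copied from the tables in this document.
RUBRIC_MAX = {"Impact & Vision": 40, "Video Pitch": 30, "Technical Depth": 30}
REVIEW_RANGES = {
    "Impact & Vision": (27, 31),
    "Video Pitch": (24, 25),
    "Technical Depth": (21, 23),
}


def total_range(ranges):
    """Sum the low ends and the high ends of the per-lens score ranges."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high


def video_and_vision_share(rubric):
    """Fraction of all points carried by Impact & Vision plus Video Pitch."""
    return (rubric["Impact & Vision"] + rubric["Video Pitch"]) / sum(rubric.values())


low, high = total_range(REVIEW_RANGES)
print(f"summed range: {low} to {high} out of {sum(RUBRIC_MAX.values())}")
print(f"video + vision share: {video_and_vision_share(RUBRIC_MAX):.0%}")
```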

Realistic outcome per track

Digital Equity & Inclusivity ($10K) — dark horse, lean favorable. "Honest comments about real students can't go to a cloud" is a sharper equity story than the median offline entry's generic "no internet needed." This is the real shot.
LiteRT ($10K) — dark horse. The public code genuinely proves on-device Gemma 4 via LiteRT-LM (real engine provider, NPU→CPU fallback, no network in the inference path).
Unsloth ($10K) — unlikely. Don't center it (see the #1 issue below).
Main Track cash — unlikely; honorable-mention tier at best against thousands of entries.

#1 issue — all four judges flagged it independently, and it's a ~20-minute writeup-only fix. The writeup says the Unsloth fine-tune was "trained and validated." The public repo's own README explicitly calls it "experimental, in-progress… not a finished result… no fine-tuned model is claimed as working or included." A judge who clicks the repo (the writeup invites them to) sees a direct contradiction — and then distrusts every other number. This is the single biggest score-killer and it costs nothing to fix.

Consensus findings

Impact is n=1 — only Mark's own anecdote. Reframe as an honest founder-problem; add one concrete data point if a true one exists.
Cactus claim is a stretch — a fixed ASR→LLM pipeline is not "intelligent routing between models." Drop/soften it. LiteRT stays; it's real.
"Forked AI Edge Gallery" optics — add one sentence naming what is genuinely original (the Moonshine path, chunking/dedup, the single shared-Engine memory architecture for a 3-yr-old phone) so a skim-reading judge can't file it as "just a wrapper."
No benchmarks — add 3–5 rough, device-labeled numbers (Pixel time-to-first-token, tokens/sec, peak RAM, asset sizes).
Do NOT re-cut the video. Unanimous. The 2:43 cut with the real airplane-mode demo is the strongest single asset; a re-cut ~24 hrs out risks a broken upload for marginal gain. Trivial fixes only: thumbnail, YouTube description as a judge cheat-sheet, writeup.
The toilet/bidet metaphor — accepted variance (memorable vs. unserious depends on the judge). Don't touch it this late; just keep substance hitting first in the opening seconds and the first paragraph.
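
For the missing benchmarks, two of the suggested numbers (time-to-first-token and tokens/sec) can be taken from any streaming generation loop. The sketch below is a language-agnostic illustration written in Python, not the app's code: `measure_stream` and `DecodeStats` are hypothetical names, and it assumes only that the engine exposes generated tokens as an iterable. Peak RAM and asset sizes would have to come from the device's own tooling and are not covered here.

```python
import time
from dataclasses import dataclass


@dataclass
class DecodeStats:
    tokens: int
    time_to_first_token_s: float
    tokens_per_second: float


def measure_stream(token_stream, clock=time.perf_counter):
    """Time an iterable of generated tokens.

    Hypothetical harness: token_stream is whatever iterable the engine yields
    tokens from; clock can be swapped out for deterministic testing.
    """
    start = clock()
    first = None
    last = start
    count = 0
    for _ in token_stream:
        now = clock()
        if first is None:
            first = now  # arrival of the first token
        last = now
        count += 1
    if count == 0:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    # Decode rate: tokens after the first, divided by the time they took.
    rate = (count - 1) / decode_time if decode_time > 0 else 0.0
    return DecodeStats(count, first - start, rate)
```

Reporting the decode rate separately from time-to-first-token keeps the prompt-processing delay from being averaged into tokens/sec, which makes numbers from different devices easier to compare. Each reported figure should carry the device name and build it was measured on.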

Recommended before Monday 7:59 PM EDT (all writeup-only, cheap, low-risk)

Reconcile the LoRA claim with the repo. Highest priority. Reframe "trained and validated" as an explicit in-progress experiment that isn't in the shipped build. Removes the #1 dismissal for free.
Reframe the impact line from proof to honest founder-problem; add one real concrete data point if available.
One originality sentence — what's yours vs. the fork.
Soften / drop the Cactus angle; keep LiteRT.
Add 3–5 rough benchmark numbers (writeup; optional one-screen BENCHMARKS.md in the repo).
Freeze the video. Optionally improve only the YouTube thumbnail + description (3-line judge cheat-sheet).

Submission stays freely editable until the deadline. None of the above is required to remain submitted; it is the prioritized list of what would move the score, ranked by ROI and risk. Say the word and I'll make the writeup edits directly on the live submission.