Mark's 2026-05-20 vision · on the shelf till Saturday's hardware lands · ~6-10h focused build
Tonight's tangent that earned its own pin: Bidet v2 is live-everything. The current Bidet is batch — you record, you wait, you get text. The vision is two parallel streams: a live transcript appearing word-by-word as you talk, AND an analysis layer appearing 5-10 seconds behind, rendered in a wavy blur that sharpens into clear text as the model gets more context to chew on. Press a button at the end for the deeper full-quality clean pass when you want it.
This is not retire-money. It is a real product differentiator that nobody is doing yet. Whisper Flow does live transcript. OMI does live transcript. Windows Win+H does live transcript. Nobody shows the analysis re-thinking itself in real time, with a visual that makes the cognition visible. That's the unique part.
The vision, in Mark's words
"Visualize the lagging analysis in a wavy blur as it develops and becomes clear and crisp. Hot damn, we're setting that up. If we can do it instant... Already it instantly types as it speaks, doesn't it? So the instant analysis or the as-close-to-instant analysis cleanup is just a broadened extension of instant grammar correct, autocorrect, spell-check. It fixes it as it goes. The analysis part is the same thing, just chunked differently. Broadened out to encompass a little bit more. Overlapping re-analysis. And then you can press a button for a clean analysis where it has a more thorough review. So you have an instant analysis AND a clean analysis that is generated from a button."
— Mark Barnett, 2026-05-20 ~21:00 ET
My understanding of what you're describing
The mental model is autocorrect, but at the paragraph level. Autocorrect already does this for spelling — you type "teh", a red underline appears, then it settles into "the". The brain is happy with that pattern; it doesn't feel like AI doing magic, it feels like the tool being attentive.
You want the same pattern, but where the unit of correction is the entire interpretation of what you just said. As you talk, Bidet's first guess at "what Mark means by this paragraph" appears in soft, blurred text. Five seconds later, with three more sentences of context, the model has a clearer interpretation — the blur snaps into clean prose. The reader can SEE the cognition stabilizing. That's the moment of magic.
The user gets two outputs simultaneously:
The live transcript — words appearing as they're spoken (standard live ASR, ~1-3 second latency, already a solved problem with faster-whisper streaming chunks).
The live analysis — the interpretation/cleanup/whatever-clean-tab-is-selected, rendered ~5-10s behind, blurred until confident, sharpening as context arrives.
Then a "Clean Analysis" button runs the full-quality batch pass at the end of the session. It is the same pipeline today's Bidet uses, run once, and its output replaces the live-stream output with the polished version.
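A minimal sketch of how the live-transcript stream could cut audio into overlapping chunks before handing each one to the ASR model. Everything here is an assumption for illustration: the function name `sliding_windows`, the 16 kHz sample rate, and the 4-second window with a 1-second hop. It works on an in-memory list for clarity; the real version would read from a live audio buffer, and the ASR call itself is left out.

```python
from typing import Iterator, List, Tuple

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio


def sliding_windows(
    samples: List[float],
    window_s: float = 4.0,   # assumption: window length in seconds
    hop_s: float = 1.0,      # assumption: how far the window advances each step
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[Tuple[float, List[float]]]:
    """Yield (start_time_seconds, window_samples) for overlapping windows."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if window <= 0 or hop <= 0:
        raise ValueError("window_s and hop_s must be positive")

    start = 0
    while start < len(samples):
        yield start / sample_rate, samples[start:start + window]
        # Stop once a window has reached the end of the available audio.
        if start + window >= len(samples):
            break
        start += hop
```

Because consecutive windows overlap, the same words get transcribed more than once. Some merge step has to reconcile the overlaps into one running transcript, and that step is not shown here.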
A static preview of the wavy-blur effect
Click "play" to see the visual idea (single static demo, not the real pipeline):
Bidet v2 mock
Live transcript
All right so I'm thinking about how we could ship the Charleston prep work over the next month without overwhelming William. He's gonna need...
Live analysis
Mark is planning a phased rollout of the Charleston work, pacing the prep to William's bandwidth rather than his own.
This demo cheats by just unblurring left-to-right. The real version unblurs sentence-by-sentence as the model's confidence rises, which can mean rewriting parts that already looked clear when more context comes in. That re-clarification is the part that makes it feel intelligent rather than just animated.
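One way to sketch that sentence-by-sentence behavior: each analysis sentence carries a confidence, the confidence maps to a CSS blur radius, and a sentence that gets rewritten takes the new (possibly lower) confidence and can re-blur. The names `SentenceView`, `blur_px`, and `apply_update`, the 6px maximum blur, and the 0.6s transition are all assumptions. So is the idea that the backend can produce a 0-to-1 confidence per sentence; nothing in the vision fixes how that number is derived.

```python
from dataclasses import dataclass
from typing import List

MAX_BLUR_PX = 6.0  # assumption: blur radius for a zero-confidence sentence


def blur_px(confidence: float, max_blur: float = MAX_BLUR_PX) -> float:
    """Map confidence in [0, 1] to a blur radius; 1.0 means fully sharp."""
    clamped = min(1.0, max(0.0, confidence))
    return round(max_blur * (1.0 - clamped), 2)


@dataclass
class SentenceView:
    text: str
    confidence: float  # assumption: supplied by the backend, 0.0 to 1.0

    def style(self) -> str:
        """Inline CSS for this sentence's span."""
        return f"filter: blur({blur_px(self.confidence)}px); transition: filter 0.6s ease;"


def apply_update(
    current: List[SentenceView], incoming: List[SentenceView]
) -> List[SentenceView]:
    """Merge a new analysis pass into what is on screen.

    Unchanged sentences keep the higher of the two confidences, so they do
    not re-blur for no visible reason. Rewritten or new sentences take the
    incoming confidence, which may blur text that previously looked clear.
    """
    merged: List[SentenceView] = []
    for i, new in enumerate(incoming):
        old = current[i] if i < len(current) else None
        if old is not None and old.text == new.text:
            merged.append(SentenceView(new.text, max(old.confidence, new.confidence)))
        else:
            merged.append(new)
    return merged
```

Matching sentences by position is the simplest thing that could work. If the model inserts a sentence in the middle, everything after it will look rewritten, so a real build would probably want a smarter alignment.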
What it'll take to build
Streaming transcript: faster-whisper with VAD chunking. Sliding 3-5 second window, ~1-2s latency. Already proven by Whisper Flow / Win+H. ~2 hr of backend work for the WebSocket endpoint + chunk handler.
Streaming analysis: Same WebSocket. Backend runs the Clean-* prompt against the cumulative transcript every N seconds (probably every 5-10s) with a low max_tokens cap. Pushes diffs to frontend. ~2 hr.
Wavy-blur UI: CSS filter:blur(Npx) per word/sentence span, transitions to blur(0) when model confidence flips. ~1-2 hr of CSS + JS choreography. The mock above is a 30-line preview of the technique.
Clean Analysis button: Already exists as today's batch pipeline. Just exposed as a one-tap at session end. ~30 min to wire to the new UI.
Total honest estimate: 6-10 focused hours. A full weekend day, or two evenings. Done.
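The streaming-analysis row says the backend pushes diffs to the frontend rather than the whole analysis each time. A minimal sketch of one way to compute those diffs at sentence granularity, using the standard library's `difflib.SequenceMatcher`. The function names, the regex sentence splitter, and the shape of the diff messages are assumptions, not a decided wire format.

```python
import difflib
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def analysis_diff(previous: str, latest: str) -> List[Dict[str, object]]:
    """Return the sentence-level edits that turn `previous` into `latest`.

    Each edit says which range of old sentences (start inclusive, end
    exclusive) to replace, and with which new sentences. Unchanged runs
    are skipped so only the changed parts go over the WebSocket.
    """
    old = split_sentences(previous)
    new = split_sentences(latest)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)

    edits: List[Dict[str, object]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        edits.append({"op": tag, "start": i1, "end": i2, "sentences": new[j1:j2]})
    return edits
```

For example, if the first pass produced two sentences and the next pass rewrites the first and appends a third, the result is one `replace` edit and one `insert` edit, and the untouched middle sentence is never resent. The frontend would only need to re-blur the spans named in the edits.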
Hard prerequisite: Saturday's hardware
Two parallel streams (Whisper streaming + LLM inference) under Apex's current 12 GB WSL2 ceiling would tank both. Bidet's container is already memory-pressured today on batch-only loads; you saw the flap notifications earlier this evening. Don't ship before Saturday. Once the 64 GB SODIMM kit + SN850X NVMe arrive Friday and get installed Saturday, WSL can run at 24-32 GB and both streams have elbow room.
Open product questions for when we build
Single live-analysis stream or one-per-tab? Today's Bidet has Clean for Me / Clean for AI / Clean Classroom / Report Card. Running them all live in parallel makes the UI dense fast. Probable default: one live-analysis stream (which Clean variant it uses is set in settings), with the other Clean tabs available as one-press batch buttons at the end.
Show or hide model confidence as a number? The blur IS the confidence indicator. Adding a numeric "confidence: 72%" probably oversells it. Leave it visual.
Pause/resume? If the user pauses mid-thought for 30s, should the live analysis keep re-rendering on the existing transcript or hold? Lean toward "hold", which is less anxiety-inducing to look at.
The name. Working set: Instabidet / Bidet AI Instant / Insta-Bidet. No decision yet. Instabidet is the punchiest.
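On the pause/resume question, the "hold" option can be sketched as a small gate in front of the analysis call: skip the pass when the transcript has not changed since the last run, and otherwise respect the 5-10s cadence. The class name `AnalysisScheduler`, its fields, and the 5-second default are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalysisScheduler:
    interval_s: float = 5.0              # assumption: minimum gap between passes
    last_run_at: Optional[float] = None  # timestamp of the last analysis pass
    last_transcript: str = ""            # transcript that pass was run against

    def should_run(self, now: float, transcript: str) -> bool:
        """Decide whether to run another analysis pass right now."""
        if transcript == self.last_transcript:
            return False  # hold: nothing new was said, so leave the panel alone
        if self.last_run_at is not None and now - self.last_run_at < self.interval_s:
            return False  # too soon after the previous pass
        return True

    def mark_ran(self, now: float, transcript: str) -> None:
        """Record that a pass just ran against this transcript."""
        self.last_run_at = now
        self.last_transcript = transcript
```

With this gate, a 30-second silence produces no re-renders at all, and the first pass after speech resumes fires as soon as new text arrives.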
Why this matters in the bigger product story
Bidet today is a private brain-dump tool. Lynn just got the PIN; The Boys have it; Anne Marie has it. Adoption signal is real (six recordings since this evening's share, per the live container logs). But today's Bidet is asynchronous — you talk, you wait, you read.
Instabidet is conversational. You can SEE your own scattered ideas being made sense of while you speak them. That changes what the tool is for: it's no longer "transcribe my brain dump for later cleanup," it's "help me think out loud and watch the thinking get sharp." That's a different category of tool. The brain-dump category is crowded. The "watch the analysis stabilize" category doesn't exist yet.
It's also a direct port to Mark's voice-trigger workflow (the Tasker "Computer" hotword + /ask broker). If the ask broker streams its analysis back the same wavy-blur way through ntfy/Ray-Bans, the magic transfers to ears, not just eyes.
Status
Vision captured in project_instabidet_live_transcribe_vision_2026-05-20.md + this report
No code written yet. First commit happens after hardware install verifies clean.
Open offers from Mark to me: "we might want to develop that vision a little bit" — I'm holding. When you want to refine the visual metaphor (faster sharpening? sentence-level vs word-level? color-shift to amber as confidence rises?), we iterate on this page before building.