Bidet AI Desktop — Overhaul Plan

Drafted 2026-05-23 after today's 22-minute brain dump crashed during transcription. Plan only; no code changed yet. Review and redline before any code is touched.

(a) What's there now

Stack

Tkinter UI (app.py, 888 lines), always-on-top dark window, title Bidet AI — Honest Answers
recorder.py — sounddevice 16 kHz mono, RMS silence detection, beep ladder at 30/45/60s (NOT the 10/15/20 in HANDOFF — already drifted)
transcriber.py — openai-whisper library, model medium, GPU fp16 on RTX 4070, one big blocking model.transcribe() call, regex de-loop pass after
processor.py — Ollama local-first (gemma3:4b) with Gemini cloud fallback. Glossary + speaker-context prepended to every prompt
distributor.py — webhook POST + Google Drive upload (Drive currently skipped, no DRIVE_FOLDER_ID in .env)
tp3_ingest.py — MinIO audio archive + Postgres ingest with Gemini embedding (works when Ollama is up)
Prompts library at ~/.bidet/prompts.json — user-overridable per-tab, plus custom tab support

Tabs today

Clean Raw (builtin clean)
Clean Analysis (builtin analysis)
Clean for AI (builtin forai)
Clean for Judges (builtin judges) + a Generate Judges Pitch button below the notebook

Icon

bidet_ai.ico exists in repo root (256×256 RGBA, looks old per Mark)
NEVER WIRED INTO THE APP. app.py has no iconbitmap / iconphoto call. So the running window shows Tk's default feather. The .ico is just sitting there.

Working sessions in bidet_ai.log: 30+ prior runs, each 30s–3min of audio transcribed in 30s–2min, and "Whisper done: X chars" always logs. One outlier today.

(b) The 22-min crash — root cause

The log shows exactly one line for today's 22-min run:

12:16:01 INFO Starting Whisper transcription of …audio_2026-05-23_12-16-01.wav

Then nothing. No Whisper done, no Pipeline failed, no exception. App went black.

The pipeline() thread in app.py:659 is wrapped in try/except Exception and would have logged anything Python-level. So model.transcribe() either:

Hung in a decode loop. Even with compression_ratio_threshold=2.0 and condition_on_previous_text defaulted on, openai-whisper has a known long-audio failure mode: 30s chunks where it gets stuck repeating, retries internally, never returns. The dedup regex runs after transcribe; it cannot help if transcribe never finishes.
CUDA died / OOM with no Python exception. Decoding 22 min of 16 kHz audio with the medium model in fp16 may exhaust VRAM or pinned host-memory buffers. In some configurations CUDA OOM shows up only on stderr, which we swallow with io.StringIO() at the top of app.py and transcriber.py (see lines 17–20 of both). If native code under Whisper (PyTorch/CUDA) faulted, the GUI process would die silently.
Tk freeze masquerading as crash. pipeline() runs in a daemon thread so Tk should still pump, but very long GPU work with the Python GIL released and re-acquired in chunks can stall Tk's event loop on Windows. Result: the window stops repainting, Mark sees "black," kills it.
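Hypothesis 2 turns on stderr output that was thrown away. A minimal sketch of keeping it, assuming a log path of our choosing: Python-level writes to sys.stderr go to a rotating file, and faulthandler dumps Python tracebacks if the process takes a fatal fault. StreamToLogger, install_stderr_log, enable_crash_dump and the logger name are hypothetical, not existing code in app.py.

```python
import faulthandler
import logging
import sys
from logging.handlers import RotatingFileHandler
from pathlib import Path


class StreamToLogger:
    """File-like object that forwards complete lines to a logger."""

    def __init__(self, logger: logging.Logger, level: int = logging.ERROR) -> None:
        self._logger = logger
        self._level = level
        self._buffer = ""

    def write(self, text: str) -> int:
        self._buffer += text
        # Emit only complete lines; keep the trailing partial line buffered.
        *lines, self._buffer = self._buffer.split("\n")
        for line in lines:
            if line.strip():
                self._logger.log(self._level, line)
        return len(text)

    def flush(self) -> None:
        if self._buffer.strip():
            self._logger.log(self._level, self._buffer)
        self._buffer = ""


def install_stderr_log(log_path: Path) -> logging.Logger:
    """Replace sys.stderr with a writer backed by a rotating log file."""
    logger = logging.getLogger("bidet.stderr")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = RotatingFileHandler(
        log_path, maxBytes=1_000_000, backupCount=3, encoding="utf-8"
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    sys.stderr = StreamToLogger(logger)
    return logger


def enable_crash_dump(dump_path: Path):
    """Ask faulthandler to write Python tracebacks to dump_path on a fatal fault."""
    fh = dump_path.open("a", encoding="utf-8")
    faulthandler.enable(file=fh, all_threads=True)
    return fh  # keep a reference; faulthandler needs the file to stay open
```

Caveat: replacing sys.stderr only catches text written through Python. Messages that native libraries write straight to file descriptor 2 bypass it, so the faulthandler dump is the part most likely to leave a trace if the process dies in native code.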

We cannot know which without re-running with proper stderr capture. Mitigation regardless of which one it was:

Switch transcribe path to faster-whisper (CTranslate2 backend). Already installed in system Python (Mark used it to recover the 22-min audio today — succeeded on CPU int8 in reasonable time, would be much faster on GPU). It:
- Streams segments back as an iterator → we can write to disk and update the UI progressively, so a crash at minute 18 still saves minutes 0–18
- Uses ~30–40% less VRAM at the same model size
- Has battle-tested VAD filtering (vad_filter=True) that solves the loop-on-silence problem at the source instead of regex'ing it later
Restore stderr capture to a log file instead of StringIO(). The current if sys.stderr is None: sys.stderr = io.StringIO() block silently eats every CUDA / cuBLAS / ctranslate2 message. Route to a rotating file handler so the next mystery crash has a tail.
Add a transcribe-progress watchdog. If no segment lands for 90s, surface "transcribe stalled" in the status bar and let Mark either wait or kill cleanly with the partial transcript already saved.
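The watchdog in the last item could look roughly like this. It is a sketch under assumptions: SegmentWatchdog is a hypothetical class, the transcribe thread calls touch() per segment, and the UI thread polls status() on a timer (for example via Tk's after()). The clock is injectable only so the logic can be checked without waiting 90 seconds.

```python
import threading
import time
from typing import Callable


class SegmentWatchdog:
    """Tracks the time since the last transcribed segment arrived."""

    def __init__(
        self,
        timeout_s: float = 90.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._lock = threading.Lock()
        self._last_seen = clock()
        self._segments = 0

    def touch(self) -> None:
        """Call from the transcribe thread each time a segment lands."""
        with self._lock:
            self._last_seen = self._clock()
            self._segments += 1

    def status(self) -> str:
        """Call from the UI thread to build the status-bar text."""
        with self._lock:
            idle = self._clock() - self._last_seen
            count = self._segments
        if idle >= self._timeout_s:
            return f"transcribe stalled ({idle:.0f}s since segment {count})"
        return f"transcribing, segments so far: {count}"
```

The watchdog only reports; it does not kill anything. Whether to wait or stop stays Mark's call, and the partial transcript is already on disk either way.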

(c) UI changes Mark asked for (verbatim)

Window title Bidet AI — Honest Answers → Bidet AI. One line in app.py:146.
Wire the icon. Add self.iconbitmap(default=str(PROJECT_ROOT / "bidet_ai.ico")) in _build_ui. NEW icon asset required from Mark — current .ico is the old logo he wants replaced. Need him to drop the new PNG/SVG in repo root; I'll convert to multi-resolution .ico (16/32/48/64/128/256) for taskbar + alt-tab + title bar.
Tab renames + reshuffle.

Tab 1 — NEW

Unclean Raw

currently doesn't exist

The current Clean Raw shows the cleaned output, not the raw transcript. Mark wants the raw verbatim Whisper output as Tab 1. Add a new builtin raw tab; current clean moves to Tab 2 with a different prompt.

Tab 2 — rewrite

Clean for Me

was Clean Raw

Cornell / bullets, "nice tight visual." Format defaults to cornell directive (already wired in FORMAT_DIRECTIVES). Prompt rewrite needed: current CLEAN_PROMPT returns prose; Mark wants tight grouped bullets.

Tab 3 — unchanged

Clean for AI

was Clean for AI

Unchanged from current forai behavior; label and position stay the same.

Tab 4 — rewrite

Clean Summary

was Clean Analysis

Currently analysis returns a structured-headings report (summary / topics / actions / decisions / questions / follow-ups). Mark wants a prose summary "anyone could read." Trim headings, output 3–6 paragraphs.

REMOVE Clean for Judges tab + Generate Judges Pitch button. Contest is shipped, no longer needed. Per the sister rule from today (feedback_never_delete_work_archive_2026-05-23), archive — don't delete:
- Move JUDGES_PROMPT, format_for_judges, _run_judges, _fill_judges into archive/judges_mode.py so the strings survive
- Strip the builtin judges tab from the default tab list + remove the button from _build_ui
- ~/.bidet/prompts.json migration: drop the judges entry on first run after upgrade so Mark's existing state doesn't re-add it
Migration of existing user state. ~/.bidet/prompts.json on Mark's machine has the old four-tab structure. On startup, detect old schema and rewrite to new four-tab structure preserving any overrides Mark made on clean / analysis / forai. Back up the old file as prompts.json.bak.20260523 first.
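A sketch of the migration step, not the final code. The real schema of ~/.bidet/prompts.json is not shown in this plan, so this ASSUMES a top-level JSON object keyed by builtin tab id (clean, analysis, forai, judges) plus any custom tabs. migrate_prompts and OLD_ONLY_KEYS are hypothetical names.

```python
import json
import shutil
from pathlib import Path

# ASSUMPTION: tab ids are top-level keys in prompts.json.
OLD_ONLY_KEYS = {"judges"}  # builtin that no longer exists in the new layout


def migrate_prompts(path: Path, backup_suffix: str = ".bak.20260523") -> bool:
    """Rewrite an old-schema prompts file in place; return True if it changed."""
    if not path.exists():
        return False
    data = json.loads(path.read_text(encoding="utf-8"))
    if "raw" in data and not (OLD_ONLY_KEYS & data.keys()):
        return False  # already on the new layout

    # Back up first so the old state survives a bad rewrite.
    backup = path.with_name(path.name + backup_suffix)
    if not backup.exists():
        shutil.copy2(path, backup)

    migrated = {"raw": data.get("raw", {})}  # new Unclean Raw tab, no overrides yet
    for key, value in data.items():
        if key not in OLD_ONLY_KEYS and key != "raw":
            migrated[key] = value  # overrides and custom tabs carry over untouched
    path.write_text(
        json.dumps(migrated, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return True
```

Running it twice is safe: the second call sees the raw key, finds no judges entry, and returns without touching the file or the backup.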

(d) "Premiere level" — what's worth adding, what's gold-plating

Worth doing now · low risk, big payoff

Upgrade	Why	Cost
faster-whisper backend with segment streaming + VAD	Fixes today's crash class, faster, partial output survives crash	~1hr swap. Side-by-side test against `openai-whisper` on Mark's voice sample required before cutover (HANDOFF requirement)
Live transcription progress	Currently zero feedback during a 5+ min transcribe. Show segment count + last 80 chars of newest segment in the status bar	30 min, comes free with faster-whisper iterator
Real stderr/stdout to log file	The `io.StringIO()` swallow is the reason today's crash is a mystery	10 min
Wire the icon	Visible win every time Mark sees the taskbar	5 min once new icon arrives
Waveform meter on record button	Visual confirmation mic is hot; today's silence detection is invisible until a beep	1hr, sounddevice already provides RMS
Crash-resilient pipeline	Write `raw_.txt` to disk as transcribe streams, not after it completes. Drive/TP3 ingest the partial if app dies.	1hr
Whisper model bump to `large-v3` on faster-whisper	Mark's recovery today used it; small WER improvement on his voice. RTX 4070 12GB handles it in fp16 with ~5 GB headroom	Test required first
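The streaming, live-progress and crash-resilience rows share one mechanism: consume the faster-whisper segment iterator and write each segment out as it arrives. A minimal sketch, assuming faster-whisper is installed and the medium model on GPU; stream_to_disk, transcribe_streaming and the on_segment callback are hypothetical names, not existing transcriber.py functions.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional, Protocol


class SegmentLike(Protocol):
    """The fields used here; faster-whisper segments expose these."""

    start: float
    end: float
    text: str


def stream_to_disk(
    segments: Iterable[SegmentLike],
    out_path: Path,
    on_segment: Optional[Callable[[int, str], None]] = None,
) -> int:
    """Append each segment to out_path as it arrives; return the segment count."""
    count = 0
    with out_path.open("a", encoding="utf-8") as fh:
        for seg in segments:
            text = seg.text.strip()
            fh.write(text + "\n")
            fh.flush()  # hand the line to the OS so a process crash keeps it
            count += 1
            if on_segment is not None:
                on_segment(count, text[-80:])  # count + tail for the status bar
    return count


def transcribe_streaming(wav_path: str, out_path: Path) -> int:
    """Run faster-whisper with VAD and stream its segments to disk."""
    # Imported lazily so the helper above works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel("medium", device="cuda", compute_type="float16")
    segments, _info = model.transcribe(wav_path, vad_filter=True)
    return stream_to_disk(segments, out_path)
```

Because the iterator is lazy, decoding happens while the loop runs, so a failure at minute 18 leaves minutes 0–18 in the file. on_segment runs on the transcribe thread; any Tk update it triggers would need to be marshalled to the UI thread.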

Worth considering · ask Mark

CustomTkinter swap. Modern dark theme out of the box, better widget styling, drop-in replacement for tk.Tk in 90% of cases. Risk: minor layout drift, +1 dep. Payoff: looks more 2026.
PyWebView / Tauri rewrite. Browser-based UI rendered in a desktop chrome. Bigger lift, but unlocks the same look as Bidet AI Web. Probably premature — Mark's pain points are in the pipeline, not the Tk look-and-feel.
System tray companion so the window can hide-to-tray instead of being always-on-top. Mark's old gripe was the topmost window blocking other work. Optional.
Hotkey to start/stop recording the same way Bidet Quick uses Ctrl+Shift+; — would let Mark brain-dump without alt-tabbing.

Not worth it

Rewriting in Electron / Tauri / Qt. The Tk stack works, the bugs are in the audio path and the UI surface.
Adding speaker diarization. Solo voice, no need.
Cloud Whisper. Mark wants local; he has the GPU; it costs nothing to keep local.

(e) Recommended sequence

Phase A — Stop the bleeding · one sitting, ~2hr

Restore real stderr to log file (10 min — unblocks debugging everything else)
Swap to faster-whisper with VAD + segment streaming, write transcript to disk per-segment (1hr)
Side-by-side WER check vs openai-whisper on Mark's 22-min audio (15 min)
Add transcribe watchdog + progress in status bar (30 min)
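For the WER check in this phase, a small stdlib sketch of the metric: word-level edit distance divided by the reference word count. It assumes a hand-corrected reference transcript exists for the test audio; without one, the two engines can only be compared against each other. word_error_rate is a hypothetical helper, and the lowercase/whitespace normalization is a simplification.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] = edit distance between the ref words consumed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: ref word missing from hyp
                curr[j - 1] + 1,     # insertion: extra word in hyp
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Usage would be one call per engine against the same reference, for example word_error_rate(reference_text, faster_whisper_text) next to word_error_rate(reference_text, openai_whisper_text). Punctuation is not stripped here, so both outputs should be normalized the same way before comparing.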

Phase B — UI cleanup Mark asked for · one sitting, ~1.5hr

Window title fix (1 min)
Wire icon (5 min, blocked on new icon asset from Mark)
Tab rename + reshuffle: add Unclean Raw, rework the clean and analysis prompts for Clean for Me / Clean Summary; Clean for AI (forai) stays as is (1hr)
Archive judges code/button (15 min)
~/.bidet/prompts.json migration with .bak (15 min)

Phase C — Premiere polish · optional, ~3hr

Waveform meter on record button (1hr)
Crash-resilient distribute (Drive/TP3 ingest the partial transcript if the app dies) (1hr)
Hotkey to start/stop (1hr)
Consider CustomTkinter swap (test branch only, do not merge without Mark's sign-off)
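For the waveform meter, the level math is small. A sketch assuming float samples in the range -1.0 to 1.0 and a -60 dB display floor; rms_dbfs and meter_fraction are hypothetical helpers, and recorder.py's existing RMS code may already cover the first half.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float], floor_db: float = -60.0) -> float:
    """RMS level of one audio block in dB relative to full scale."""
    if not samples:
        return floor_db
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms <= 0.0:
        return floor_db  # digital silence; log10(0) is undefined
    return max(floor_db, 20.0 * math.log10(rms))


def meter_fraction(db: float, floor_db: float = -60.0) -> float:
    """Map a dB level onto 0.0–1.0 for a bar widget (floor_db must be negative)."""
    return min(1.0, max(0.0, (db - floor_db) / -floor_db))
```

A dB scale is used because a linear RMS bar barely moves at normal speaking levels; the exact floor is a display choice to tune on Mark's mic.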

Do not · per task brief

Push to GitHub
Delete any existing code/files (archive only)
Break Drive/MinIO/TP3 ingest
Rip out openai-whisper without WER parity test on Mark-voice
Add deps without listing here first

New deps proposed (Phase A): faster-whisper>=1.2.1 (already in system Python; add to venv + requirements.txt). No others.

Open question for Mark: New icon asset? Source file (PNG/SVG) and target style. I'll convert to multi-res .ico.