Bidet AI Desktop — Overhaul Plan

Drafted 2026-05-23 after today's 22-minute brain dump crashed during transcription. Plan only — no code changed yet. Review and redline before any touch.

(a) What's there now

Stack

Tabs today

  1. Clean Raw (builtin clean)
  2. Clean Analysis (builtin analysis)
  3. Clean for AI (builtin forai)
  4. Clean for Judges (builtin judges) + a Generate Judges Pitch button below the notebook

Icon

Today's working sessions in bidet_ai.log — 30+ prior runs, all 30s–3min audio → transcribe in 30s–2min, "Whisper done: X chars" always logs. One outlier today.

(b) The 22-min crash — root cause

Log shows exactly one line for today:

12:16:01 INFO Starting Whisper transcription of …audio_2026-05-23_12-16-01.wav

Then nothing. No Whisper done, no Pipeline failed, no exception. App went black.

The pipeline() thread in app.py:659 is wrapped in try/except Exception and would have logged anything Python-level. So model.transcribe() either:

  1. Hung in a decode loop. Even with compression_ratio_threshold=2.0 and condition_on_previous_text defaulted on, openai-whisper has a known long-audio failure mode: 30s chunks where it gets stuck repeating, retries internally, never returns. The dedup regex runs after transcribe; it cannot help if transcribe never finishes.
  2. CUDA died / OOM with no Python exception. 22 min × 16 kHz of fp16 medium-model decoding can blow the VRAM-host-memory pinned buffers. CUDA OOM in some configurations only shows up in stderr (which we swallowed with io.StringIO() at the top of app.py and transcriber.py — see lines 17–20 of both). If the Whisper C extension faulted, the GUI process would die silently.
  3. Tk freeze masquerading as crash. pipeline() runs in a daemon thread so Tk should still pump, but very long GPU work with the Python GIL released and re-acquired in chunks can stall Tk's event loop on Windows. Result: the window stops repainting, Mark sees "black," kills it.
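
Telling these cases apart depends on error output actually reaching disk. A minimal sketch of what that capture could look like, assuming it replaces the io.StringIO() swallow. The function name and log location are hypothetical; faulthandler is the standard-library module that dumps Python thread stacks on a fatal signal such as a segfault.

```python
import faulthandler
import sys
from pathlib import Path


def install_stderr_capture(log_path: Path):
    """Route sys.stderr to a line-buffered file and arm faulthandler on it.

    Hypothetical helper: the name and the log location are assumptions,
    not existing code in app.py or transcriber.py.
    """
    log_file = open(log_path, "a", buffering=1, encoding="utf-8")
    sys.stderr = log_file  # Python-level writes: tracebacks, warnings
    # On a fatal signal (e.g. a segfault inside a C extension), dump every
    # thread's Python stack to the same file before the process dies.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Output that native code writes straight to file descriptor 2 bypasses sys.stderr, so catching that as well would likely need an os.dup2 redirect, which this sketch leaves out.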

We cannot know which without re-running with proper stderr capture. Mitigation regardless of which one it was:

(c) UI changes Mark asked for (verbatim)

  1. Window title: "Bidet AI — Honest Answers" → "Bidet AI". One line in app.py:146.
  2. Wire the icon. Add self.iconbitmap(default=str(PROJECT_ROOT / "bidet_ai.ico")) in _build_ui. NEW icon asset required from Mark — current .ico is the old logo he wants replaced. Need him to drop the new PNG/SVG in repo root; I'll convert to multi-resolution .ico (16/32/48/64/128/256) for taskbar + alt-tab + title bar.
  3. Tab renames + reshuffle.
    • Tab 1 (new): Unclean Raw. Currently doesn't exist. The current Clean Raw shows the cleaned output, not the raw transcript. Mark wants the raw verbatim Whisper output as Tab 1. Add a new builtin raw tab; current clean moves to Tab 2 with a different prompt.
    • Tab 2 (rewrite): Clean for Me, was Clean Raw. Cornell / bullets, "nice tight visual." Format defaults to the cornell directive (already wired in FORMAT_DIRECTIVES). Prompt rewrite needed: current CLEAN_PROMPT returns prose; Mark wants tight grouped bullets.
    • Tab 3 (rename only): Clean for AI, was Clean for AI. Unchanged from current forai behavior. Label rename only.
    • Tab 4 (rewrite): Clean Summary, was Clean Analysis. Currently analysis returns a structured-headings report (summary / topics / actions / decisions / questions / follow-ups). Mark wants a prose summary "anyone could read." Trim headings, output 3–6 paragraphs.
  4. REMOVE Clean for Judges tab + Generate Judges Pitch button. Contest is shipped, no longer needed. Per the sister rule from today (feedback_never_delete_work_archive_2026-05-23), archive, don't delete:
    • Move JUDGES_PROMPT, format_for_judges, _run_judges, _fill_judges into archive/judges_mode.py so the strings survive
    • Strip the builtin judges tab from the default tab list + remove the button from _build_ui
    • ~/.bidet/prompts.json migration: drop the judges entry on first run after upgrade so Mark's existing state doesn't re-add it
  5. Migration of existing user state. ~/.bidet/prompts.json on Mark's machine has the old four-tab structure. On startup, detect the old schema and rewrite it to the new four-tab structure, preserving any overrides Mark made on clean / analysis / forai. Back up the old file as prompts.json.bak.20260523 first.
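
The backup-then-rewrite migration could be sketched as follows. The schema is an assumption: the sketch treats prompts.json as a single JSON object keyed by builtin tab name (clean, analysis, forai, judges) and uses raw as the key for the new tab. The function name, the key order, and the empty-object placeholder for a tab with no override are all hypothetical and would need to match the real file.

```python
import json
import shutil
from pathlib import Path

# Assumed schema: one JSON object keyed by builtin tab name.
OLD_ONLY_KEYS = {"judges"}
NEW_ORDER = ["raw", "clean", "forai", "analysis"]


def migrate_prompts(path: Path, stamp: str) -> bool:
    """Rewrite an old-schema prompts file in place; return True if migrated."""
    if not path.exists():
        return False
    data = json.loads(path.read_text(encoding="utf-8"))
    if not (OLD_ONLY_KEYS & data.keys()) and "raw" in data:
        return False  # already on the new schema, nothing to do
    # Back up first so the old state survives a bad rewrite.
    shutil.copy2(path, path.with_name(f"{path.name}.bak.{stamp}"))
    # Builtins in the new order, keeping any existing overrides.
    migrated = {key: data.get(key, {}) for key in NEW_ORDER}
    # Carry over anything else the user added, except retired builtins.
    for key, value in data.items():
        if key not in migrated and key not in OLD_ONLY_KEYS:
            migrated[key] = value
    path.write_text(json.dumps(migrated, indent=2), encoding="utf-8")
    return True
```

Called as migrate_prompts(Path.home() / ".bidet" / "prompts.json", "20260523") at startup, a second run is a no-op because the raw key is then present and judges is gone.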

(d) "Premiere level" — what's worth adding, what's gold-plating

Worth doing now  ·  low risk, big lift

  • faster-whisper backend with segment streaming + VAD. Why: fixes today's crash class, faster, partial output survives a crash. Cost: ~1hr swap. Side-by-side test against openai-whisper on Mark's voice sample required before cutover (HANDOFF requirement).
  • Live transcription progress. Why: currently zero feedback during a 5+ min transcribe. Show segment count + last 80 chars of newest segment in the status bar. Cost: 30 min, comes free with the faster-whisper iterator.
  • Real stderr/stdout to log file. Why: the io.StringIO() swallow is the reason today's crash is a mystery. Cost: 10 min.
  • Wire the icon. Why: visible win every time Mark sees the taskbar. Cost: 5 min once the new icon arrives.
  • Waveform meter on record button. Why: visual confirmation the mic is hot; today's silence detection is invisible until a beep. Cost: 1hr; RMS can be computed from the audio blocks sounddevice already delivers.
  • Crash-resilient pipeline. Why: write raw_*.txt to disk as transcribe streams, not after it completes. Drive/TP3 ingest the partial if the app dies. Cost: 1hr.
  • Whisper model bump to large-v3 (faster-whisper). Why: Mark's recovery today used it; tiny WER improvement on his voice. RTX 4070 12GB handles it in fp16 with ~5 GB headroom. Cost: test required first.
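
The streaming, per-segment write, and progress items above fit together roughly like this. It is a sketch under assumptions, not the planned implementation: both function names are hypothetical, and the model size, device, and compute type in the wiring function are placeholders taken from the items above. The faster-whisper import is kept inside the wiring function so the writer can be exercised without the library installed.

```python
import os
from pathlib import Path
from typing import Callable, Iterable


def stream_segments_to_disk(
    segments: Iterable,
    out_path: Path,
    on_progress: Callable[[int, str], None] = lambda count, tail: None,
) -> str:
    """Write each segment's text to out_path as it arrives; return the full text."""
    parts = []
    with open(out_path, "w", encoding="utf-8") as fh:
        for count, segment in enumerate(segments, start=1):
            text = segment.text.strip()
            parts.append(text)
            fh.write(text + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # push to disk so a crash keeps the partial
            on_progress(count, text[-80:])  # segment count + last 80 chars
    return " ".join(parts)


def transcribe_streaming(audio_path: str, out_path: Path) -> str:
    """Hypothetical wiring to faster-whisper; model and settings are assumptions."""
    from faster_whisper import WhisperModel  # lazy import: optional dependency

    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    # transcribe() returns a lazy generator, so decoding happens while iterating.
    segments, _info = model.transcribe(audio_path, vad_filter=True)
    return stream_segments_to_disk(segments, out_path)
```

The progress callback runs on the worker thread, so a status-bar update would most likely need to be handed to the Tk main loop (for example through a queue polled with after()) rather than touching widgets directly.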

Worth considering  ·  ask Mark

Not worth it

(e) Recommended sequence

Phase A — Stop the bleeding  ·  one sitting, ~2hr
  1. Restore real stderr to log file (10 min — unblocks debugging everything else)
  2. Swap to faster-whisper with VAD + segment streaming, write transcript to disk per-segment (1hr)
  3. Side-by-side WER check vs openai-whisper on Mark's 22-min audio (15 min)
  4. Add transcribe watchdog + progress in status bar (30 min)
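
One possible shape for the transcribe watchdog in step 4 is a heartbeat that each decoded segment resets. This is a sketch under assumptions: the class name and the 120-second default are hypothetical, and the clock is injectable only so the logic can be checked without waiting.

```python
import time
from typing import Callable


class TranscribeWatchdog:
    """Tracks time since the last segment and reports a stall.

    Hypothetical helper: the class name and default timeout are assumptions.
    """

    def __init__(self, timeout_s: float = 120.0,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_beat = clock()

    def beat(self) -> None:
        """Call once per decoded segment."""
        self._last_beat = self._clock()

    def stalled(self) -> bool:
        """True when no segment has arrived within timeout_s."""
        return self._clock() - self._last_beat > self.timeout_s
```

The Tk side could poll stalled() on a timer and surface a warning in the status bar. A Python thread cannot be killed safely from outside, so the watchdog can only report the stall and let the partial transcript be saved; actually aborting a hung decode would mean running transcription in a subprocess.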
Phase B — UI cleanup Mark asked for  ·  one sitting, ~1.5hr
  1. Window title fix (1 min)
  2. Wire icon (5 min, blocked on new icon asset from Mark)
  3. Tab rename + reshuffle: add Unclean Raw, rework clean/analysis/forai prompts for Clean for Me / Clean Summary / Clean for AI (1hr)
  4. Archive judges code/button (15 min)
  5. ~/.bidet/prompts.json migration with .bak (15 min)
Phase C — Premiere polish  ·  optional, ~3hr
  1. Waveform meter on record button (1hr)
  2. Crash-resilient distribute (write partial transcript on crash) (1hr)
  3. Hotkey to start/stop (1hr)
  4. Consider CustomTkinter swap (test branch only, do not merge without Mark's sign-off)
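
For the waveform meter in Phase C, the level math is small. A sketch assuming float samples normalized to the range -1.0 to 1.0; both function names and the -60 dB floor are hypothetical. In the app the samples would come from the audio input callback, where a NumPy expression would replace the pure-Python sum.

```python
import math
from typing import Sequence


def rms_level(samples: Sequence[float]) -> float:
    """Root-mean-square of one audio block; 0.0 for an empty block."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def meter_fraction(rms: float, floor_db: float = -60.0) -> float:
    """Map an RMS level (full scale = 1.0) to 0..1 on a dB scale for a bar."""
    if rms <= 0.0:
        return 0.0
    db = 20.0 * math.log10(rms)  # 0 dB at full scale, negative below it
    return min(1.0, max(0.0, 1.0 - db / floor_db))
```

Mapping on a dB scale rather than linearly keeps quiet speech visible on the bar, which is the point of confirming that the mic is hot.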
Do not  ·  per task brief

New deps proposed (Phase A): faster-whisper>=1.2.1 (already in system Python; add to venv + requirements.txt). No others.

Open question for Mark: New icon asset? Source file (PNG/SVG) and target style. I'll convert to multi-res .ico.