Migration Plan v2 — Ready for Your Sign-Off

2026-04-23 late evening · plan went through independent review, now awaiting your go

TL;DR

You greenlit Option B from the Fresh Eyes Review. I wrote the full plan (v1), kicked off an independent review, and filed GitHub issue #28 for Jules. The reviewer flipped v1 to a conditional NO-GO: 2 showstoppers, 4 sequencing errors, 7 naive Windows/Docker assumptions. All fixed in v2. The plan is 19 hours of active execution over the weekend, then a 24h clean-run and a 72h parallel soak before the old scheduled tasks come down.

Waiting on you: read v2 when you're rested, say "go" or "not yet." Nothing touches production until you do.

What reviewer caught and v2 fixes

Showstopper 1: v1 Phase 1 did COPY-and-DELETE on 3,500 Gemini rows, so a rollback would have required Gemini quota we don't have. → v2: COPY-only. tp3_memories stays read-only for 7+ days. DROP TABLE removed from sprint entirely.

Showstopper 2: v1 Phase 7 had DROP TABLE in the same sprint with no soak. → v2: DROP TABLE removed from the sprint entirely. Follow-up ticket after 7+ days.

Sequencing: v1 ran risky re-embed (Phase 1) before cheap reversible wins (Phase 2). → v2 order: 0 → 2 → 1 → 3 → 4 → 5 → 6 → 7. Cheap first.

Port race: v1 Phase 3 started the container before stopping host worker → bind collision. → v2: stop host worker, kill PID, verify port 8944 empty, THEN docker compose up.
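
A minimal sketch of the "verify port 8944 empty" gate, assuming the host worker listens on TCP on 127.0.0.1. The function name is hypothetical and not taken from the plan.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection,
        # and a non-zero error code (e.g. connection refused) otherwise.
        return probe.connect_ex((host, port)) != 0


if __name__ == "__main__":
    # Assumption: 8944 is the worker port named in the plan.
    if port_is_free(8944):
        print("port 8944 is free, safe to run docker compose up")
    else:
        print("port 8944 still in use, do NOT start the container")
```

This only proves nobody is listening at the moment of the check, so it should run immediately before docker compose up, not minutes earlier.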

Healthcheck misconception: restart: always does NOT react to a failing healthcheck, only to container exit. → v2: added a willfarrell/autoheal sidecar that restarts containers Docker marks unhealthy.

Windows/WSL2 gotcha: host.docker.internal needs extra_hosts + Ollama bound to 0.0.0.0, not the default 127.0.0.1. → v2: both enforced in Phase 0 preflight.
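
One way the Ollama half of that preflight could be checked: read the OLLAMA_HOST value (the environment variable Ollama uses for its bind address) and refuse to proceed if it is loopback-only. This is a sketch, assuming an IPv4 "host" or "host:port" value with an optional scheme; the function name is hypothetical.

```python
import ipaddress


def bind_is_loopback_only(ollama_host: str) -> bool:
    """Return True if the bind address only accepts connections from the host itself."""
    # Strip an optional scheme ("http://") and an optional ":port" suffix.
    host = ollama_host.split("://")[-1].split(":")[0].strip()
    # Unset/empty means Ollama falls back to its 127.0.0.1 default.
    if host in ("", "localhost"):
        return True
    # Raises ValueError for other hostnames; this sketch handles IP literals only.
    return ipaddress.ip_address(host).is_loopback


if __name__ == "__main__":
    import os

    value = os.environ.get("OLLAMA_HOST", "")
    if bind_is_loopback_only(value):
        print("Ollama is loopback-only; containers cannot reach it")
    else:
        print("Ollama bind address is not loopback-only")
```

A passing check says nothing about firewall rules, so a real connection test from inside a container is still worth doing.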

SQLite corruption risk: v1 bind-mounted the queue file into the container. 9P/virtiofs has incomplete fcntl locking support, so concurrent access risks corruption. → v2: named Docker volume, not bind mount.
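
A cheap guard to pair with that change: run SQLite's built-in integrity check on the queue file before and after the move. A minimal sketch, assuming the queue is an ordinary SQLite database file; the function name and the path in the usage lines are hypothetical.

```python
import sqlite3
from pathlib import Path


def queue_db_is_intact(path: str) -> bool:
    """Return True if PRAGMA integrity_check reports a clean database."""
    # Open read-only via a file: URI so the check itself cannot modify the file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    # A healthy database returns exactly one row containing "ok".
    return rows == [("ok",)]


if __name__ == "__main__":
    # Assumption: placeholder path, not the real queue location.
    print(queue_db_is_intact("queue.db"))
```

The check catches corruption that has already happened. It does not make a bind mount safe, which is why the named volume is still the actual fix.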

Soak too short: 4h wouldn't have caught the 40h outage class. → v2: 24h clean-run + 72h parallel with old alarms still firing before anything gets deleted.

Docker Desktop auto-update: would fire mid-sprint and reset everything. → v2: disabled for the sprint in Phase 0.

What stays the same (reviewer confirmed good)

Architecture choice (Docker Compose + autoheal + one embed service + pinger) is correct.

Keeping Postgres + pgvector + MinIO + the 646K rows untouched is correct.

Ollama-only for the sprint (no Gemini calls) is correct given budget lockout.

Scheduled-task consolidation (26 → 6) is correct, just needs to happen after soak, not during.

When you decide

Go on v2

Phase 0 preflight starts when you reply. 90 min of read-only setup (backup + verify + .env parity + Ollama host fix + image pre-pulls). Nothing irreversible. If Phase 0 acceptance passes, I move to Phase 2 (kill Redis + SSE MCPs). If anything fails — Slack ping to you, no more progress until you say.
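
For the .env parity step, a sketch of what the check could look like, assuming "parity" means the host worker's .env and the container's .env define the same set of keys (values are not compared). Both function names are hypothetical.

```python
def env_keys(text: str) -> set[str]:
    """Collect variable names from .env-style text, skipping comments and blanks."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        # Tolerate shell-style "export KEY=value" lines.
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def env_parity(host_env: str, container_env: str) -> dict[str, set[str]]:
    """Report keys present on one side but not the other."""
    host, container = env_keys(host_env), env_keys(container_env)
    return {
        "missing_in_container": host - container,
        "missing_on_host": container - host,
    }


if __name__ == "__main__":
    report = env_parity("A=1\nB=2\n", "A=1\n# B is not set here\n")
    print(report)
```

Two empty sets would count as a pass for the acceptance gate; anything else is a stop-and-ping.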

Not yet

Plan stays filed. No execution. You can ask for more changes or wait for Jules's comment on the GitHub issue. Current stack keeps running as-is (fragile but working).

Reading list (only if you want depth)

Full plan v2 (GitHub) — ~4,000 words, 8 phases, risk register, acceptance criteria
Reviewer memo (GitHub) — what the independent SRE flagged, phase by phase
GitHub issue #28 — where Jules will drop comments when he reviews
Fresh Eyes Review — the original three-agent report that started this

Timing

If you say go Friday morning: Phase 0–6 wrap Sat afternoon. 24h clean-run ends Sunday afternoon. 72h parallel-observer soak ends Wednesday afternoon. Old scheduled tasks come down Wednesday, once the soak has ended.
DC trip doesn't collide — sprint is done + soaked before May 29 pre-trip meeting.
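
The timeline above is simple date arithmetic. A sketch, assuming the sprint wraps Saturday 2026-04-25 at 15:00 (the exact hour is my assumption) and that the 72h soak starts when the 24h clean-run ends:

```python
from datetime import datetime, timedelta


def soak_timeline(sprint_wrap: datetime) -> dict[str, datetime]:
    """Derive the clean-run and soak end times from the sprint wrap time."""
    clean_run_end = sprint_wrap + timedelta(hours=24)
    soak_end = clean_run_end + timedelta(hours=72)
    return {
        "sprint_wrap": sprint_wrap,
        "clean_run_end": clean_run_end,
        "soak_end": soak_end,
    }


if __name__ == "__main__":
    # Assumption: wrap at Saturday 15:00 local time.
    for name, when in soak_timeline(datetime(2026, 4, 25, 15, 0)).items():
        print(f"{name}: {when:%a %Y-%m-%d %H:%M}")
```

Under those assumptions the clean-run ends Sunday 15:00 and the soak ends Wednesday 2026-04-29 at 15:00, so any teardown of old scheduled tasks belongs after that point.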

You said "keep going" — plan is drafted, committed to repo, reviewed, revised, ready. Sleep. When you wake and say "go v2" I fire Phase 0.