ChatGPT + Codex primer — what to do while Claude is rate-limited

2026-05-21 — your read on switching off Claude for a stretch, plus a working list of things to attack with ChatGPT in the meantime

You asked whether ChatGPT (and OpenAI's Codex agent platform) is "a little bit better at the moment," whether the two of us can work in tandem, and what to give the other side to chew on while you're waiting for Claude credits to refresh. Honest take: it depends on the task. There are things ChatGPT/Codex is genuinely sharper at right now, and things that would be a hassle to migrate. Below is the comparison, then a concrete work list sized for ChatGPT to chew on while Claude is dark.

The 30-second read

Don't ditch Claude. Run them in tandem. Use ChatGPT/Codex when Claude is rate-shut OR when the task is squarely in their strength zone (heavy multi-file Python/TS refactors, fresh-from-scratch scaffolds, math-heavy reasoning, deep research with their newer search agents). Keep Claude as the operating spine — it's the agent that knows your memory repo, your prime directives, the Pixel/AutoVoice/Bidet stack intimately, and your communication preferences without re-priming.

The two coexist cleanly because your memory lives in a git repo (MrB-Ed/claude-memory) that any agent can clone. ChatGPT can read it, but it won't extend it in Claude's structured format unless you give it an explicit write discipline of its own.

Current state of ChatGPT + Codex (late 2025 / early 2026)

The OpenAI side has shipped a lot recently. Here's what's load-bearing for your stack:

Capability	What it is	Bottom line for your stack
GPT-5	OpenAI's flagship; longer context (~256-400K tokens depending on tier), stronger coding/math, more reliable tool use.	Comparable to Claude Opus / Sonnet on most tasks; arguably edges Claude on raw math + algorithmic reasoning.
Codex (the new agentic Codex, not the 2021 one)	OpenAI's coding-agent product. Has its own CLI (`codex`), a cloud version, and a GitHub-integrated mode. Multi-file refactors, can spawn parallel subagents.	Direct competitor to Claude Code. Strong on multi-file Python/JS work. Cloud mode runs on OpenAI infra (different billing pool from your Claude credits).
ChatGPT memory	Long-term memory feature in the consumer ChatGPT app. Remembers facts across sessions.	Useful for personal context but NOT a substitute for your file-based memory repo. Less precise, less greppable, can't be reviewed/edited like markdown.
ChatGPT Agent mode / Operator	Browser-driving + computer-use agent. Headless or via attached Chrome.	Roughly the chrome-devtools MCP pattern you already have. Same headless-Chrome caveat.
MCP support	OpenAI has shipped MCP server support in ChatGPT Pro/Team for direct tool integration.	Your existing MCP servers (chrome-devtools, twin-memory, Gmail, etc.) may work with ChatGPT with config tweaks. Compatibility is real but not 1:1 with Claude Code.
Deep research mode	Multi-step search + synthesis. ChatGPT runs ~20-50 queries and writes a long report.	Genuinely good for your "AI Radar / what changed this week" kind of work. Cheap relative to running a Claude subagent for the same.
Pricing tiers	Plus $20/mo (light), Pro $200/mo (heavy, includes Codex cloud + Operator), Team/Enterprise above.	Pro tier is the right shelf for running tandem with Claude Max. Plus alone won't keep up with your usage.

Where Codex is genuinely stronger than Claude Code today

Fresh greenfield scaffolds. "Build me a Flask + React skeleton with auth + DB + deploy config" — Codex tends to produce more conventional, idiomatic boilerplate faster.
Heavy algorithmic / math-shaped problems. Anything that needs disciplined symbolic reasoning (graph algorithms, complex math, query planning).
Multi-file Python or TypeScript refactors where the scope is clear and the codebase is medium-to-large. Codex's cloud mode parallelizes well.
Deep research reports with web search baked in. Operator + ChatGPT Pro's deep-research mode beats running a Claude subagent for "summarize the AI landscape this week" kind of work.
Availability when you're rate-shut. When Anthropic rate-limits you, any working agent beats 2 hours of nothing.

Where Claude (Code in particular) is still ahead for YOUR stack

Your memory repo discipline. Claude Code's auto-memory hooks are what built the 50+ memory files indexed in MEMORY.md. ChatGPT memory is opaque and global; it won't write you a structured feedback_*.md with Why: + How to apply: linked to other entries.
The Prime Directives + Hard Rules. Claude has these baked into how it reads CLAUDE.md at session start. ChatGPT has to be re-primed against AGENTS.md or an equivalent every session, which works but adds friction.
Knowing your communication style. The "no sales pitch / no hedging / surface frustration as signal / no nudges before 11 PM" instincts are calibrated. ChatGPT defaults to verbose, hedging, summary-heavy responses. You'll spend the first few days re-teaching tone.
Your specific tooling. chrome-devtools MCP attached to the right Chrome, the Apex SSH config, the Pixel adb paths, the deploy_once.ps1 + regen_pills.py pipeline, ntfy ASCII rules, SENDTO intent SMS recipe — all of it. None of it is in ChatGPT's session by default.
Knowing what NOT to suggest. Anti-Elon/Zuck routing, no political SMS, no upselling on Legacy Soil, the school/family privacy guardrails, the "don't reopen closed decisions" rule. ChatGPT will eagerly suggest Twitter/X integrations and McKinsey-tier enterprise framings unless you constantly correct.
Conversational presence. Honestly: this depends a lot on personal fit. Claude tends to push back; ChatGPT tends to comply. For a brain-dump-driven workflow, the pushback matters.

Tandem mode — is it feasible?

Yes, with one trick: treat ChatGPT as a contractor, not a co-worker.

Memory repo as the shared substrate. Both agents read MrB-Ed/claude-memory. Claude writes structured memories. ChatGPT can read them, but you should NOT expect ChatGPT to auto-extend the repo with its own observations, because that breaks the structure. Give ChatGPT a one-line instruction: "If you want to leave a note, dump it to /tmp/chatgpt_observations.md and I'll fold it in when Claude is back."
Hand off via reports. When ChatGPT finishes something, have it write an HTML report to your reports site (you push, same as you'd do with anything from Cursor cloud) so Claude can pick up the thread on next session.
Don't make them edit the same files in parallel. If ChatGPT is working on tp3_spotify_pull.py, Claude shouldn't touch it. Worktrees if you must run both.
Different surfaces. ChatGPT's web browser (Operator) is its own Chrome instance, separate from your chrome-devtools MCP. They don't share cookies. Pixel adb work probably stays Claude-side (you've already invested in the wake-phrase + AutoVoice pipeline tuned to it).

What to give ChatGPT/Codex to work on RIGHT NOW

Sized for the next few hours while Claude is rate-shut. These are tasks where Codex's strengths match what's on the board AND that won't conflict with the Claude-side work currently in flight.

1. Heavy Spotify Phase 2 build — transcript + semantic search layer

Phase 1 (recently-played pull → Postgres) is built and about to start running. Phase 2 per the May 20 spec adds transcripts + semantic search across listening history. This is multi-file Python + DB work — exactly Codex cloud's lane.

Read /private/r/2026-05-20-spotify-history-spec.html and /private/r/2026-05-21-spotify-discovery.html for the Phase 2 spec.
Build the transcript fetcher (likely YouTube-API on linked podcast episodes; Spotify itself doesn't expose transcripts).
Wire pgvector embedding for transcripts (use the existing nomic-embed-text on Apex ollama).
Add a spotify_transcripts table + spotify_episode_chunks with vector column.
Ship a CLI that takes "Hey, what's that podcast where they discussed X" and returns matching episodes.
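
The chunk-then-rank step behind that CLI could look roughly like the sketch below. It assumes transcripts arrive as plain text and embeddings as plain float lists. `chunk_words`, `cosine`, and `top_matches` are hypothetical names, not anything from the spec, and the chunk sizes are placeholders. In the real pipeline the vectors would come from nomic-embed-text and the ranking would most likely be a pgvector query rather than Python.

```python
import math


def chunk_words(text, size=200, overlap=40):
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vec, rows, limit=5):
    """rows: iterable of (episode_id, chunk_vec). Keep the best chunk score per episode."""
    best = {}
    for episode_id, vec in rows:
        score = cosine(query_vec, vec)
        if score > best.get(episode_id, float("-inf")):
            best[episode_id] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:limit]
```

Scoring per chunk but reporting per episode keeps one long episode from filling the whole result list.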

2. Bidet phone hardening — APK distribution + BYOK + free-hosted

The post-contest plan ([[project_bidetai_app_post_contest_strategy_2026-05-14]]) calls for three distribution paths: sideload APK, BYOK web, and a free-hosted version on Cloudflare Workers AI. Codex is well-suited to scaffold a Workers AI proxy with rate limiting + KV-backed user state. Pure code work, no live-stack dependencies.
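
To make "rate limiting + KV-backed user state" concrete, here is a minimal fixed-window limiter. It is written in Python for illustration only (the Worker itself would not be Python), and the dict stands in for a KV namespace. `FixedWindowLimiter` and its parameters are hypothetical. A real KV store may not make the read-then-write below atomic, so treat the count as approximate.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per user in each `window_s`-second window."""

    def __init__(self, limit, window_s, store=None, clock=time.time):
        self.limit = limit
        self.window_s = window_s
        self.store = {} if store is None else store  # stand-in for a KV namespace
        self.clock = clock  # injectable so the limiter can be tested without sleeping

    def allow(self, user_id):
        # Key each counter by user and window index so old windows simply stop being read.
        window = int(self.clock() // self.window_s)
        key = f"{user_id}:{window}"
        count = self.store.get(key, 0)
        if count >= self.limit:
            return False
        self.store[key] = count + 1
        return True
```

In a hosted version the stale window keys would need an expiry so the store does not grow without bound.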

3. Bidet Whisper fine-tune pipeline (Tier 2 long-arc)

Tangents #34 + #38 in your backlog. 22.5 hours of paired (audio, transcript) corpus already in Drive. Tier 2 spec: LoRA r=32, alpha=64 on Whisper-large-v3, weekend run on Apex's RTX 5060 Ti. Codex can lay down the training script + dataset loader + evaluation harness now so the actual training run is a one-command launch when hardware is upgraded.

4. Memory typing at ingest (RRF retrieval foundation)

Tangents #27 + #26: type each TP3 memory as fact | event | instruction | task at ingest, then implement RRF retrieval across pgvector cosine + Postgres FTS + metadata-key lookup + HyDE. Compounding gains on /omi/ask + dashboard recall. This is real algorithm + plumbing work — Codex's sweet spot.
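
For reference, the fusion step itself is small. The sketch below assumes each retriever (pgvector cosine, Postgres FTS, metadata lookup, HyDE) has already returned its own ranked list of memory ids. `rrf_fuse` is a hypothetical name, and k=60 is the commonly used default constant, not a value from your backlog.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), rank starting at 1.

    rankings: list of ranked id lists, best first. Returns ids sorted by fused score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


vector_hits = ["m1", "m2", "m3"]   # illustrative ids only
fts_hits = ["m2", "m4", "m1"]
metadata_hits = ["m2"]
fused = rrf_fuse([vector_hits, fts_hits, metadata_hits])
```

RRF only needs ranks, not comparable scores, which is why it can combine cosine distances and FTS relevance without normalizing either.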

5. Captain's-log routing fix + `kind` field

Tangent #25: 503 of 506 source=email_ingest rows are actually OMI conversation summaries misrouted via a Make scenario. Fix the Make scenario to point to omi_summary, add kind=article|transcript|email|note at ingest, backfill. Codex can produce the migration SQL + a Make blueprint patch + a backfill script.

6. Webhooks + HMAC for Apex → reports push

Tangent #28: replace any remaining polling pattern with HMAC-signed webhook pushes from Apex to reports.thebarnetts.info. Standard Webhooks library pattern. Greenfield code, well-suited to Codex.
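
The core of an HMAC-signed push is a sign/verify pair like the sketch below. The `id.timestamp.body` layout, the hex encoding, and the 5-minute tolerance are assumptions made for illustration. The Standard Webhooks spec defines its own header names and signature encoding, so its reference libraries should take precedence over this.

```python
import hashlib
import hmac
import time


def sign(secret: bytes, msg_id: str, timestamp: int, body: bytes) -> str:
    """HMAC-SHA256 over 'msg_id.timestamp.body', returned as hex."""
    signed = f"{msg_id}.{timestamp}.".encode() + body
    return hmac.new(secret, signed, hashlib.sha256).hexdigest()


def verify(secret, msg_id, timestamp, body, signature, tolerance_s=300, now=None):
    """Reject stale timestamps (replay protection), then compare signatures in constant time."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > tolerance_s:
        return False
    expected = sign(secret, msg_id, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

The sender on Apex would call `sign` and attach the id, timestamp, and signature as headers; the receiver at the reports site would call `verify` on the raw request body before parsing it.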

7. Dreaming pattern port — auto-extract feedback memories from session logs

Tangent #43 (Anthropic's "Dreaming" pattern). Batch-read last 7 days of ~/.claude/projects/-home-g16/*.jsonl, run through Gemini or GPT to surface recurring corrections + working patterns, write candidates to memory/_candidates/<date>/, ntfy you Sun night, accept/edit/reject UI at memory.thebarnetts.info/candidates. Codex can build the candidate-generator + the UI.
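
The batch-read step could start from something like this sketch. It filters on file modification time rather than per-record timestamps, because the record schema of those session logs is not described here. `recent_records` is a hypothetical helper, and what happens to each record afterward (the Gemini/GPT pass, the candidate files) is out of scope.

```python
import json
import time
from pathlib import Path


def recent_records(log_dir, days=7, now=None):
    """Yield (filename, record) for every JSON line in *.jsonl files modified in the last `days`."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for path in sorted(Path(log_dir).glob("*.jsonl")):
        if path.stat().st_mtime < cutoff:
            continue  # file untouched for longer than the window
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    yield path.name, json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip truncated or partially written lines
```

Yielding records lazily keeps memory flat even if a week of logs is large.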

8. Whisper-mark Tier 3 voice corpus cleanup (research)

The chunk-1 LoRA learned OMI's bad style (no caps, hallucinations). Need to either self-distill, hand-curate, or force-align via Bidet recordings. ChatGPT's deep-research mode could survey current state-of-the-art for cleaning ASR-label corpora before retraining. Output: a comparison report.

What to keep AWAY from ChatGPT/Codex (for now)

The Pixel adb / Tasker / AutoVoice work — already tuned to Claude Code's tooling. Migrating that mid-flight would risk breaking the "Computer" wake phrase pipeline you just rebuilt.
Smart-home migration — already in flight via Claude subagent; let it finish.
Reports site deploys — deploy_once.ps1 + regen_pills.py pipeline is Claude-Code-tested; let ChatGPT generate report HTML, but you (or Claude) run the deploy.
Anything touching the memory repo structure. Read-only for ChatGPT — your structured memory is too valuable to let an unfamiliar agent rewrite.
Cred rotation, account management, anything where the failure mode is irreversible. Claude has the institutional memory of what's been rotated, what's tied to what, how the lockdown tokens work, which Google account owns what. Don't let ChatGPT touch this until it's been primed for at least a session.

How to prime ChatGPT efficiently when you do switch over

Open ChatGPT Pro (you need Pro for Codex cloud + Operator).
Paste the handoff document from /private/r/2026-05-21-handoff-to-next-agent.html into the first message of a fresh chat. Tell it: "Read this top-to-bottom, then I'll give you a task."
Give it access to the memory repo via Codex GitHub integration (point it at MrB-Ed/claude-memory, read-only).
Pick ONE task from the list above and let it run. Don't task-stack until you've calibrated tone + competence on the first run.
When it finishes: have it write a results report to /tmp/chatgpt__.html in the format used by Mark's Reports. You (or Claude) deploy it when convenient.

My honest recommendation

Keep Claude Max as the spine. Add ChatGPT Pro ($200/mo) as the tandem agent. They cover each other's weaknesses, both work on the shared memory repo (with ChatGPT in read-only mode), and you stop being held hostage by either rate limit.

If you can only pick one, keep Claude for THIS specific stack — too much institutional context lives in how Claude reads your memory repo and your hard rules. But the right answer for the next year of how you work is "both."

One concrete first run to calibrate

If you're going to spin up ChatGPT Pro tonight, give it this single task and judge by how it goes:

"Build the Spotify Phase 2 transcript fetcher + pgvector embedding pipeline based on the spec in /private/r/2026-05-20-spotify-history-spec.html. Mirror the structure of tp3_spotify_pull.py. Output: tp3_spotify_transcripts.py + tp3_spotify_transcripts_ddl.sql + a test plan. Don't run it on Apex yet — Mark will review first."

That tests: spec comprehension, code style consistency with an existing repo file, willingness to NOT over-execute, and output quality. It's a ~2-hour task in Codex cloud. Compare its first draft against what you remember Claude producing for Phase 1 last night.

2026-05-21 EOD. Pair report: Handoff to the next agent.