AI Radar — Mark Barnett — 2026-05-09
Coverage window: 2026-05-02 → 2026-05-09. Bidet-phone contest deadline: 2026-05-24 (15 days out).
Top 3 actions this week
- Lock the bidet-phone submission shape now, not next weekend. DEV's Gemma 4 Challenge ($3K, deadline 5/24) is the right venue; Kaggle's Good Hackathon ($200K, deadline 5/18) is a stretch you should not try to retrofit into. DEV requires a public write-up + working repo — start the README outline tonight while transcription is fresh on Pixel 8 Pro. Source: dev.to challenge page, Kaggle competition page.
- Audit Gemma 4 E4B's audio encoder before the contest ends — it may eat Whisper-tiny. E4B has a built-in audio encoder (50% smaller than 3N's, 40ms frames) that produces reasoned text, not just transcripts. For a brain-dump app this is one less model to ship. Spend 2 hours: feed it a Pixel-recorded sample, compare output and latency vs. your current Whisper-tiny+Gemma chain. If E4B-audio holds up, drop Whisper, halve your APK, and write the contest narrative around "single-model on-device." If it doesn't, you've got a defensible "why we kept Whisper" paragraph. Source: MindStudio Gemma 4 audio encoder breakdown, Gemma 4 model card.
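The 2-hour spike above is mostly a timing harness. A minimal sketch of that harness, assuming nothing about either model's API — the two lambdas are hypothetical placeholders you'd swap for the real E4B-audio call and the Whisper-tiny+Gemma chain:

```python
import time
import statistics

def bench(pipeline, sample, runs=5):
    """Run a transcription pipeline several times; return (last output, median latency in ms)."""
    latencies = []
    out = None
    for _ in range(runs):
        t0 = time.perf_counter()
        out = pipeline(sample)
        latencies.append((time.perf_counter() - t0) * 1000)
    return out, statistics.median(latencies)

# Hypothetical placeholders — replace with the real calls for each path.
whisper_then_gemma = lambda audio: f"transcript+reasoning for {len(audio)} bytes"  # two-model chain
e4b_audio_direct = lambda audio: f"reasoned text for {len(audio)} bytes"           # single-model path

sample = b"\x00" * 16000  # stand-in for a Pixel-recorded clip
for name, fn in [("whisper+gemma", whisper_then_gemma), ("e4b-audio", e4b_audio_direct)]:
    out, ms = bench(fn, sample)
    print(f"{name}: {ms:.1f} ms median -> {out[:40]}")
```

Median over a few runs smooths out first-call warmup, which matters on-device; compare outputs by eye, since "reasoned text" and "transcript" aren't diffable.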
- Pin Claude Code to ≤ v2.1.135 OR test 2.1.138 in a throwaway worktree before merging anything important. Anthropic shipped 2.1.136 → 2.1.138 between 5/7 and 5/9; 2.1.137 specifically fixed a Windows VS Code activation regression, and a worktree.baseRef change (5/7) altered default branching behavior. You run Claude Code constantly on G16/Apex — a silent change in the default worktree base will bite. Source: Claude Code changelog, releasebot.io.
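To pin, one route is `npm install -g @anthropic-ai/claude-code@2.1.135` (adjust if you installed another way). Separately, setting the worktree base explicitly removes the silent-default risk regardless of version. A sketch of the settings.json fragment, assuming the key is spelled the way the changelog names it (`worktree.baseRef`, values `head` or `fresh` per the changelog) — verify against the settings reference before relying on it:

```json
{
  "worktree": {
    "baseRef": "head"
  }
}
```

Merge by hand into your existing settings file rather than appending, or the JSON breaks.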
Ranked candidates
| # | Item | Score | Why-it-matters-for-Mark | Integration cost |
|---|---|---|---|---|
| 1 | Gemma 4 E4B built-in audio encoder | 9 | Could collapse bidet-phone's two-model pipeline into one. Direct to your live contest build. | 2-3h spike, reversible |
| 2 | DEV Gemma 4 Challenge logistics ($3K, 5/24) | 9 | This is the contest you're already in. Write-up format and judging rubric drive the submission shape. | 0h research, 4-6h writing across the week |
| 3 | Claude Code 2.1.136-138 changes (worktree.baseRef, MCP /clear bug, autoMode hard_deny) | 8 | You hit this daily on G16+Apex. worktree.baseRef=fresh vs head will silently change which commit your subagents branch from. | 30 min — read changelog, set explicit value in settings.json |
| 4 | Claude Opus 4.7 + 1M context at standard pricing | 7 | $5/$25 per MTok unchanged BUT new tokenizer uses ~1.0-1.35x more tokens per task. Your $140/mo budget could drift up silently. New effort + task budgets knobs are real cost levers. | 1h: re-benchmark a typical session, decide on default effort tier |
| 5 | LiteRT-LM production GA + Qualcomm/MediaTek NPU support | 7 | Pixel 8 Pro has a Tensor G3 NPU you're not touching today. LiteRT-LM is the official path to NPU acceleration for Gemma 4 on Android. Future bidet-phone v2 lever, not contest-week. | High — re-architect inference layer. Park for post-contest. |
| 6 | whisper.cpp commit c81b2dab (5/7): Ruby GVL-free transcribe, Windows fixes | 6 | You don't use the Ruby bindings, but the Windows build improvements matter for any Apex-side whisper.cpp work. Low-priority refresh. | 15 min — git pull && rebuild if you use whisper.cpp directly |
| 7 | Anthropic "Dreaming" research preview (agents self-improve overnight from past sessions) | 5 | Same conceptual territory as your TP3 memory pipeline. Worth a 30-min read post-contest to see whether it overlaps your reflection layer. Not GA. | 0 right now — gated access |
| 8 | OpenAI GPT-5.5 Instant (5/5) + ads-in-ChatGPT pilot (5/7) | 3 | You're not on OpenAI. Hallucination drop (-52.5% vs 5.3) is industry signal, not action. Ads launch is a "watch from a distance" item. | 0 |
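Item 4's token-drift risk is easy to quantify before the re-benchmark. A back-of-envelope sketch — prices are the $5/$25 per MTok from the table, the 1.35x is the top of the reported tokenizer range, and the monthly volumes are illustrative, chosen to land on the $140/mo budget:

```python
IN_PRICE, OUT_PRICE = 5.0, 25.0  # USD per million tokens (Opus 4.7, per item 4)

def monthly_cost(in_mtok, out_mtok, inflation=1.0):
    """Monthly spend given million-token volumes and a tokenizer inflation factor."""
    return (in_mtok * IN_PRICE + out_mtok * OUT_PRICE) * inflation

baseline = monthly_cost(18, 2)        # ~18 MTok in, 2 MTok out: the $140/mo budget
worst = monthly_cost(18, 2, 1.35)     # same work, 1.35x tokens after the tokenizer change
print(f"baseline ${baseline:.0f}/mo -> worst case ${worst:.0f}/mo (+${worst - baseline:.0f})")
```

At the same usage, the budget line moves from $140 to ~$189/mo in the worst case, which is why the effort-tier knob is worth an hour.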
Cut with reason
- DeepMind × EVE Online partnership (5/6). Cool research story, zero developer surface. Skip.
- SpaceX/Colossus 220K-GPU deal. Capacity press release; doesn't change your day. Skip.
- EU Digital Omnibus on AI provisional agreement (5/7). Enforcement starts 2027-2028. Not actionable in 2026-05.
- Five Eyes "Careful Adoption of Agentic AI" guidance. Government posture doc, not a tool. File mentally; don't read.
- Hugging Face trending: VibeVoice, MinerU2.5, EverMemOS, UniVidX. None map to your stack right now. EverMemOS (self-organizing LLM memory) is closest to TP3 — re-check in 30 days when there's a reference implementation, not just a paper.
- ChatGPT for Excel/Sheets global rollout. You don't live in spreadsheets. Skip.
- CAISI/NIST evaluating frontier models pre-release. Compliance news. Skip.
- Hacker News AI-attacks coverage (Claude Code extortion case, Mexico tax data breach via Claude+ChatGPT). Real but not your threat model — your stack is private, single-user, and Tailscale-fronted. Note the existence; don't chase.
- Wan2.2-TI2V-5B and other text-to-video. Out of scope for your projects.
- Anthropic Code Review / CI auto-fix / Routines. All look genuinely useful but every one is a 2-4h evaluation; queueing them until after 5/24 is the right call given the contest deadline. Re-surface 5/25.
Sources scanned
- Anthropic news + Claude Code changelog (releasebot.io/anthropic, claudefa.st changelog, code.claude.com/docs/changelog)
- Simon Willison live blog of Code w/ Claude 2026 (5/6)
- Claude Opus 4.7 launch + pricing pages (anthropic.com/news/claude-opus-4-7, platform.claude.com pricing, llm-stats.com, Caylent deep-dive)
- Google Gemma 4 launch + model card (blog.google, ai.google.dev/gemma, deepmind.google/models/gemma)
- Hugging Face Gemma 4 blog post (huggingface.co/blog/gemma4) for tooling matrix
- Android Developers Blog: Gemma 4 on Android (android-developers.googleblog.com)
- MindStudio: Gemma 4 E2B/E4B audio encoder analysis
- Kaggle Gemma 4 Good Hackathon page + DEV Gemma 4 Challenge page
- LiteRT/LiteRT-LM: developers.googleblog.com, github.com/google-ai-edge/LiteRT-LM, infoq.com (5/2026)
- whisper.cpp releases page (github.com/ggml-org/whisper.cpp/releases) — latest commit c81b2dab 2026-05-07
- OpenAI news (openai.com/index/gpt-5-5-instant, releasebot.io/openai, TechCrunch 5/5)
- Hacker News front pages 5/2 and 5/7; rockcybermusings AI security weekly 5/1-5/7
- Bloomberg + 9to5Google for DeepMind/EVE; nextgov + CNBC for CAISI testing program
Cost of this run
8 WebSearch calls + 2 WebFetch calls. Estimated Anthropic spend: ~$0.35-0.55 in input/output tokens (Opus 4.7, ~30K input / ~2K output equivalent). Comfortably under the $1/run cap.