AI Radar — 2026-05-08
Scored for Mark Barnett: cost-conscious, local-first, Apex (16 GB VRAM / 13.8 GB RAM), TP3 stack, Bidet AI, teaching workflow.
Actions I recommend Mark take this week (top 3, max)
- Enjoy the doubled Claude Code limits — no action required. Anthropic's May 6 update doubled the 5-hour rate limit on Pro/Max/Team/Enterprise and removed the peak-hours throttle. API Tier 1 Opus input and output limits went up 1500% and 900% respectively. Backed by 220K+ SpaceX Colossus GPUs going live this month. Source: anthropic.com/news/higher-limits-spacex
- Test Gemma 4 26B-A4B on Apex Ollama — Google's MoE model has 26B total params with only ~4B activated per token, so it decodes at roughly the speed of a 4B dense model. Note that all 26B weights still load into memory (~13–15 GB at Q4), so the fit on Apex's 16 GB VRAM is workable but tight rather than roomy. It's outperforming frontier models on multimodal tasks and trending at 8.73M downloads. Command: `ollama pull gemma4:26b-a4b-it`. Source: huggingface.co/google/gemma-4-26B-A4B-it-assistant
- Skim the MCP 2026 roadmap — the SEP (Spec Enhancement Proposal) process is now open. Transport scalability and agent-to-agent communication are the next two milestones, both directly relevant to TP3's MCP architecture. No action needed now, but watch for the transport upgrade — it may simplify the SSE vs stdio decision. Source: blog.modelcontextprotocol.io/posts/2026-mcp-roadmap
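Once the Gemma pull finishes, a quick request against Ollama's local REST API confirms the model loads and answers. A minimal Python sketch — the endpoint is the stock Ollama default, the model tag comes from the pull command above, and the actual network call is left commented so the snippet is safe to run without a server:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint
MODEL = "gemma4:26b-a4b-it"  # tag from the pull command above

def build_request(prompt: str) -> dict:
    """Build a non-streaming generate request for Ollama's REST API."""
    return {"model": MODEL, "prompt": prompt, "stream": False}

def smoke_test(prompt: str = "Reply with the single word: ready") -> str:
    """POST the prompt to the local Ollama server and return the reply text."""
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]

# print(smoke_test())  # uncomment with the Ollama server running
```

First-token latency on this call is also a cheap proxy for whether the weights fit in VRAM or are spilling to CPU.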
Ranked candidates
| # | Candidate | Source | Score | Why it matters for Mark | Integration cost |
|---|---|---|---|---|---|
| 1 | Claude Code rate limits 2× | Anthropic (May 6) | 21 | Direct: Pro/Max get 2× 5-hr block, peak throttle gone, Opus API +1500% token/min. Same price, more throughput. SpaceX compute deal removes scarcity risk. | Zero — already live |
| 2 | Gemma 4 26B-A4B MoE | HuggingFace/Google (May 5) | 21 | MoE: ~13–15 GB of weights at Q4 fits Apex's 16GB VRAM, and only ~4B params active per token keeps decode speed near a 4B model. Multimodal (vision+text). 8.73M downloads = community validated. Prior radar recommended 4B; this is the frontier jump. | Low — ollama pull, single test run |
| 3 | MCP 2026 roadmap + Jama Connect MCP launch | MCP blog + GlobeNewswire (May 4) | 17 | TP3 runs 4 MCP servers. Transport scalability SEP is next on roadmap. Jama as first enterprise MCP adopter signals ecosystem maturing — more MCP-compatible tools incoming. | Info-only this week |
| 4 | DeepSeek V4-Flash (MIT, Ollama) | HuggingFace (April 24, trending) | 15 | 158B MoE, MIT license, 1M context, $0.14/M tokens via API or local with 24GB+ VRAM. V4-Flash outperforms V3 significantly. Apex's 16GB VRAM is ~8GB short for comfortable local run — wait for further quantization or test with CPU offload. | Medium — VRAM constrained on Apex |
| 5 | Kimi K2.6 (Moonshot AI) | Ollama cloud, latent.space (May 1) | 13 | Open weights (HuggingFace), SOTA coding model rivaling Opus 4.6 on benchmarks, 256K context. Weights are 610GB full-precision; local needs heavy quantization. Ollama listing is cloud-only. Not local-first today. | High — not locally runnable on Apex yet |
| 6 | OpenAI Voice API — reasoning + TTS | OpenAI (May 7) | 13 | New realtime voice models with reasoning + translation baked in. Could enhance the Ray-Bans TTS pipeline beyond the current Tasker /brief chain. Costs money (API); worth watching pricing before acting. | Low code-wise, but costs $ — needs Mark's approval |
| 7 | GPT-5.5 Instant (new ChatGPT default) | OpenAI (May 5) | 11 | 52.5% fewer hallucinated claims on high-stakes prompts vs GPT-5.3, AIME 81.2 (up from 65.4). API pricing $5/$30 per M tokens. More expensive than Claude; benchmark is self-reported — no independent confirmation yet. No TP3 integration path. | N/A — no action warranted |
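The VRAM calls in rows 2, 4, and 5 all hinge on the same rule of thumb: quantized weight footprint scales with total parameters (MoE counts every expert), not active parameters. A rough estimator, added here for illustration — it ignores KV cache and runtime overhead, and ~4.5 bits/weight is a typical Q4_K_M figure, not a number from the source:

```python
def quantized_weight_gb(total_params_b: float, bits_per_weight: float = 4.5) -> float:
    """Approximate in-memory footprint of quantized weights in GB.

    total_params_b: total parameter count in billions (MoE counts ALL experts).
    bits_per_weight: ~4.5 is typical for Q4_K_M GGUF quants.
    """
    return total_params_b * 1e9 * bits_per_weight / 8 / 1e9

# Gemma 4 26B-A4B: ~14.6 GB of weights at Q4 -- tight but plausible on 16 GB
print(round(quantized_weight_gb(26), 1))
# DeepSeek V4-Flash 158B: ~88.9 GB of weights at Q4 -- hence the CPU-offload caveat in row 4
print(round(quantized_weight_gb(158), 1))
```

Leaving a couple of GB of headroom for KV cache on top of these figures is prudent before calling a model "fits".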
Cut with reason
- Anthropic enterprise investment (Blackstone/Goldman/H&F) — corporate investment news, zero impact on Mark's stack or cost.
- OpenAI ads in ChatGPT — Consumer feature, not in Mark's workflow.
- OpenAI Codex safety post — Internal policy/ops. No actionable signal for Mark.
- MiMo-V2.5-Pro (Xiaomi, 1T params) — Dropped this week, zero Ollama support yet, no quantized builds, no community testing. Monitor next week.
- Sulphur-2-base (text-to-video) — Not in any current use case for Mark.
- Anthropic finance agents — Enterprise vertical, not relevant.
- GPT-5.5-Cyber (trusted access for security) — Security research tool behind verified access program. Not applicable.
Sources scanned
- Gmail newsletters: Search returned auth scope error — Gmail MCP requires re-auth with broader read scope. Gap: newsletter scan skipped this week. Recommend fixing Gmail MCP scope before next run.
- Anthropic blog: 3 articles published May 4–6, 2026. All fetched successfully.
- OpenAI blog RSS: 20 articles published May 4–8, 2026. Fetched and parsed.
- Google DeepMind blog: Page loaded but no posts dated after May 1 were surfaced (most recent visible: April 2026 Gemma 4 announcement). Likely cache/CDN lag — counted as partial gap.
- HuggingFace trending: Top 15 models by trending score fetched. 8 updated within last 3 days.
- HN Algolia API: Top AI stories pulled. Results skewed toward older dates (May 2025 items surfacing due to index lag) — used web search to verify recency.
- WebSearch supplemental: 5 targeted searches for DeepSeek V4, Kimi K2.6, Claude limits, MCP roadmap, GPT-5.5 pricing.
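The HN index-lag gap above can be guarded against at query time: the Algolia HN API accepts a numericFilters clause on created_at_i, so stale items never surface. A sketch of the URL construction (the query string and 7-day window are illustrative, not from the radar script):

```python
import time
from urllib.parse import urlencode

HN_API = "https://hn.algolia.com/api/v1/search_by_date"

def recent_hn_query(query: str, max_age_days: int = 7) -> str:
    """Build an Algolia HN search URL restricted to stories newer than max_age_days."""
    cutoff = int(time.time()) - max_age_days * 86400
    params = {
        "query": query,
        "tags": "story",
        "numericFilters": f"created_at_i>{cutoff}",  # Unix-epoch lower bound
    }
    return f"{HN_API}?{urlencode(params)}"

# e.g. urllib.request.urlopen(recent_hn_query("MCP roadmap")) and parse the JSON "hits"
```

Filtering server-side would remove the need for the manual web-search recency check next run.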
Cost of this run
- Input tokens: ~55,000 (context + fetched pages + search results)
- Output tokens: ~1,800
- Estimated cost: ~$0.19 (Sonnet 4.6 at $3/$15 per M tokens)
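The estimate above checks out arithmetically. The per-run formula, using the stated $3/$15 per-million-token Sonnet 4.6 rates:

```python
def run_cost_usd(input_tokens: int, output_tokens: int,
                 in_rate: float = 3.0, out_rate: float = 15.0) -> float:
    """Run cost in USD given per-million-token rates ($3 in / $15 out per the report)."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(round(run_cost_usd(55_000, 1_800), 2))  # -> 0.19
```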
Generated 2026-05-08 19:06 by tp3_scripts/ai_radar/run_radar.ps1.