Listen to all 5 in order. The point is to evaluate quality on the exact kinds of strings the Tasker Say pipeline handles today: numbers, dates, acronyms (TP3, OMI, HTTP), punctuation, ntfy alert phrasing, "Computer answer" responses. Then tell me if you want me to replace GoogleTTS with Supertonic in the Ray-Bans speak chain (post-Saturday hardware upgrade).
1. ntfy alert (Bidet down style)
01_bidet_alert · 8.77s audio · 756 KB
"Bidet unreachable. HTTP zero zero zero, time 25 seconds. Likely transient — CF tunnel or Whisper job blocking."
"You have three calendar events today. First one at 10 AM with William about Priority Landscape, then lunch at 12:30 with Kim, and parent-teacher conference at 3 PM."
3. Numbers + dates + dollar amounts (the worst case for most TTS)
03_numbers_dates · 12.35s audio · 1.0 MB
"B and H package arrives April 22, 2026. Order number one one two nine two one eight zero two three. Total: one thousand five dollars and ninety three cents."
4. Acronyms + semicolons + your tech-speak
04_punctuation_acronyms · 10.31s audio · 888 KB
"TP3 ingest stalled. Newest row is 63 minutes old. Watcher tipping at threshold; OMI webhook flaky, omi-api-poll reliable."
5. Short burst (quick-fire alert)
05_short_burst · 4.63s audio · 402 KB
"Spotify pulled 50 tracks. Cron next run at 3 AM."
Performance on G16 CPU (no GPU):
Real-time factor (RTF): 0.22 - 0.36x — meaning audio generates in 22-36% of its playback duration. Anything under 1.0 is faster than real-time. 0.22 is genuinely impressive on CPU.
49 seconds of total audio generated in ~12 seconds wall time. Including model load (one-time, 13.7s).
Memory: ONNX runtime, modest footprint. Post-Saturday upgrade with 64GB RAM, this runs alongside Postgres + ollama + everything else with room to spare.
Disk footprint: model + deps ~200 MB.
Verdict you're being asked to make
After listening, which is it?
YES — sounds noticeably better than GoogleTTS → I deploy Supertonic as an HTTP endpoint on Apex post-Saturday hardware, modify Tasker Say to call it. Ray-Bans audio quality steps up across the board (ntfy speak, Computer answers, every spoken alert).
MEH — about the same as GoogleTTS → not worth the integration work right now. Bank the install for future use (Bidet AI standalone offline pipeline still benefits, but not urgent).
NO — sounds worse → drop it, GoogleTTS stays the default.
Three voice styles available beyond M1 if M1 doesn't land for you: M2, F1, F2. Easy A/B once you say "try a different voice."