
Supertonic 3 — listen test

2026-05-22 · 5 samples generated on G16 CPU using your typical text patterns

Voice: M1 (default male English) · Quality steps: 8 (medium) · Speed: 1.05x · Sample rate: 44100 Hz mono

Listen to all 5 in order. The goal is to judge quality on the exact kinds of strings the Tasker Say pipeline handles today: numbers, dates, acronyms (TP3, OMI, HTTP), punctuation, ntfy alert phrasing, and "Computer" answer-style responses. Then tell me whether you want me to replace GoogleTTS with Supertonic in the Ray-Bans speak chain (after Saturday's hardware upgrade).

1. ntfy alert ("Bidet down" style)

01_bidet_alert · 8.77s audio · 756 KB
"Bidet unreachable. HTTP zero zero zero, time 25 seconds. Likely transient — CF tunnel or Whisper job blocking."

2. "Computer" answer-style response (long sentence, multiple names + times)

02_computer_ask_response · 12.85s audio · 1.1 MB
"You have three calendar events today. First one at 10 AM with William about Priority Landscape, then lunch at 12:30 with Kim, and parent-teacher conference at 3 PM."

3. Numbers + dates + dollar amounts (the worst case for most TTS)

03_numbers_dates · 12.35s audio · 1.0 MB
"B and H package arrives April 22, 2026. Order number one one two nine two one eight zero two three. Total: one thousand five dollars and ninety three cents."

4. Acronyms + semicolons + your tech-speak

04_punctuation_acronyms · 10.31s audio · 888 KB
"TP3 ingest stalled. Newest row is 63 minutes old. Watcher tipping at threshold; OMI webhook flaky, omi-api-poll reliable."

5. Short burst (quick-fire alert)

05_short_burst · 4.63s audio · 402 KB
"Spotify pulled 50 tracks. Cron next run at 3 AM."
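
A quick sanity check on the durations and file sizes listed above: the sketch below estimates the size of each sample from its duration, using the 44100 Hz mono setting from the header. It assumes the files are uncompressed 16-bit PCM WAV with a 44-byte header, which this page does not confirm; the function name and the sample table are illustrative only. Under that assumption the KB-denominated sizes (samples 1, 4, and 5) land within about 1 percent of the listed values.

```python
# Assumption (not confirmed by this page): samples are uncompressed
# 16-bit PCM WAV files with the canonical 44-byte RIFF/WAVE header.
WAV_HEADER_BYTES = 44


def estimate_wav_bytes(duration_s, sample_rate=44100, channels=1, sample_width=2):
    """Estimate the on-disk size of a PCM WAV file from its duration."""
    frames = round(duration_s * sample_rate)
    return WAV_HEADER_BYTES + frames * channels * sample_width


# Durations in seconds, copied from the sample list above.
SAMPLES = {
    "01_bidet_alert": 8.77,
    "02_computer_ask_response": 12.85,
    "03_numbers_dates": 12.35,
    "04_punctuation_acronyms": 10.31,
    "05_short_burst": 4.63,
}

if __name__ == "__main__":
    for name, duration in SAMPLES.items():
        size_kib = estimate_wav_bytes(duration) / 1024
        print(f"{name}: {duration:.2f}s -> about {size_kib:.0f} KiB")
```

If the estimates and the listed sizes diverged noticeably, that would suggest a different bit depth or a compressed format rather than 16-bit PCM.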

Performance on G16 CPU (no GPU):

Verdict you're being asked to make

After listening, which is it?

Three other voice styles are available if M1 doesn't land for you: M2, F1, F2. It's an easy A/B once you say "try a different voice."

Supertonic 3 v1.3.1 · github.com/supertone-inc/supertonic · samples generated on G16 / WSL Ubuntu / CPU only