Every sample reads the same text, so you can compare voices directly. Listen top to bottom and note the 2-3 you like best. Tell me your pick and it becomes the default voice for ntfy → Ray-Bans speech.
Beyond the 10 built-in voices, Supertonic exposes these per-call parameters, and we can A/B any combination:
| Parameter | Range | What it changes |
|---|---|---|
| `total_steps` | 5 (low) to 12 (high) | Generation quality. Default is 8 (medium). Higher values give better articulation but slower generation. For your stack, 8 is the sweet spot; 10-12 is worth trying if you want maximum polish. |
| `speed` | 0.7 (slow) to 2.0 (fast) | Playback speed without changing pitch. The current samples use 1.05 (slightly faster than natural). 0.95-1.1 sounds most natural; 1.2-1.5 is useful for digest-style fast briefings. |
| `lang` | 31 languages | English, Spanish, French, Arabic, Korean, German, Portuguese, Italian, etc. The same voice can read text in different language modes. `"na"` is language-agnostic (best for mixed-language text). |
| Voice cloning | Any audio sample | You can train a voice from a 5-10 second sample of *your* voice, so ntfy alerts speak in your voice. See the Supertonic demo page ("Voice Builder \| Cloning Demo"). Deeper integration; worth exploring once the base pipeline is live. |
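To make "A/B any combination" concrete, here is a minimal Python sketch that expands a few candidate values into a test grid and rejects anything outside the ranges in the table. It assumes the server accepts the table's parameter names (`total_steps`, `speed`, `lang`) as-is, plus a `voice` field; that field name and the voice IDs `voice_a`/`voice_b` are hypothetical, not confirmed Supertonic API.

```python
from itertools import product

# Ranges copied from the table above. The request field names are assumptions.
TOTAL_STEPS_RANGE = (5, 12)
SPEED_RANGE = (0.7, 2.0)


def validate(total_steps: int, speed: float) -> None:
    """Raise ValueError if a value falls outside the documented range."""
    lo, hi = TOTAL_STEPS_RANGE
    if not lo <= total_steps <= hi:
        raise ValueError(f"total_steps must be in [{lo}, {hi}], got {total_steps}")
    lo, hi = SPEED_RANGE
    if not lo <= speed <= hi:
        raise ValueError(f"speed must be in [{lo}, {hi}], got {speed}")


def ab_grid(voices, steps_options, speed_options, lang="na"):
    """Return one parameter dict per (voice, total_steps, speed) combination."""
    grid = []
    for voice, steps, speed in product(voices, steps_options, speed_options):
        validate(steps, speed)
        grid.append({
            "voice": voice,  # hypothetical field name
            "total_steps": steps,
            "speed": speed,
            "lang": lang,
        })
    return grid
```

For example, `ab_grid(["voice_a", "voice_b"], [8, 10], [0.95, 1.05, 1.2])` yields 12 combinations, one sample per entry.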
Tell me three things:
Once you pick, I'll lock that voice into the Tasker integration. The 8 actions currently in your TP3 Notification task end with a Say action that uses GoogleTTS. I'll replace that Say with an HTTP Request to the Supertonic endpoint, followed by a Music Play on the returned WAV. Same triggers, better voice; it runs on G16 today and migrates to Apex after Saturday.
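The Tasker change amounts to one POST and one WAV file. The Python sketch below mirrors that flow so the endpoint can be exercised from a laptop before touching the task. The host and port come from the status line; the `/tts` path, the JSON field names, the voice ID, and the output filename are all assumptions, so adjust them to whatever the local server actually expects.

```python
import io
import json
import urllib.request
import wave

# ASSUMPTION: "/tts" and the JSON field names are hypothetical;
# only the host and port come from the status line.
SUPERTONIC_URL = "http://192.168.1.185:7788/tts"


def build_request(text: str, voice: str, speed: float = 1.05,
                  total_steps: int = 8, lang: str = "na") -> urllib.request.Request:
    """Build (but do not send) the POST that Tasker's HTTP Request would make."""
    body = json.dumps({
        "text": text,
        "voice": voice,
        "speed": speed,
        "total_steps": total_steps,
        "lang": lang,
    }).encode("utf-8")
    return urllib.request.Request(
        SUPERTONIC_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Parse the WAV header; raises (e.g. wave.Error) if the body is not a WAV."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def speak(text: str, voice: str, out_path: str = "ntfy_alert.wav") -> str:
    """Fetch audio and save it; Tasker's Music Play would then play out_path."""
    with urllib.request.urlopen(build_request(text, voice), timeout=30) as resp:
        audio = resp.read()
    wav_duration_seconds(audio)  # fail fast on an error page instead of playing junk
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Checking the WAV header before playback means a server error surfaces as an exception rather than a silent or garbled Music Play.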
Supertonic 3 v1.3.1 · HTTP server at 192.168.1.185:7788 · survives shell exit via `setsid nohup`
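For reference, the `setsid` half of `setsid nohup` can be reproduced from Python with `subprocess.Popen(..., start_new_session=True)` (POSIX only), and a plain TCP probe confirms the server is still listening. This is a sketch; the command that actually starts the Supertonic server isn't shown here, so `launch_detached` takes it as an argument.

```python
import socket
import subprocess
import sys


def launch_detached(cmd, log_path):
    """Start cmd in a new session so it survives the parent shell exiting.

    start_new_session=True makes the child call setsid(), detaching it from
    the terminal's session; output goes to log_path instead of nohup.out.
    """
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
    finally:
        log.close()  # the child keeps its own copy of the descriptor


def port_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Demo with a harmless child process rather than the real server command.
    proc = launch_detached([sys.executable, "-c", "pass"], "/dev/null")
    print("child exited with", proc.wait(timeout=10))
```

Usage would look like `launch_detached(["python3", "serve.py"], "supertonic.log")`, where `serve.py` is a hypothetical entry point, followed by `port_open("192.168.1.185", 7788)` as a health check.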