Voice Platform

Your voice. Your hardware. Nobody else’s business.

halo-ai’s voice platform handles synthesis, recognition, and cloning – all running locally, all private by default. Thirty seconds of reference audio is enough to build a voice model. That model never leaves your machine.

Core Stack

Component	Engine / method	Purpose
Text-to-Speech	Kokoro TTS	High-fidelity voice synthesis
Speech-to-Text	Whisper	Transcription and real-time recognition
Voice Understanding	Voxtral (Mistral)	Multimodal voice + text reasoning
Voice Cloning	30-second capture	Custom voice model from minimal audio
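
The 30-second capture figure in the table above can be checked before a clip is handed to cloning. The sketch below is illustrative only: it assumes the reference clip is an uncompressed WAV file, and the names `MIN_REFERENCE_SECONDS`, `wav_duration_seconds`, and `is_long_enough` are hypothetical helpers, not part of halo-ai. It uses only Python's standard `wave` module.

```python
import wave

# Assumption: the 30-second figure from the text is treated as a hard minimum.
MIN_REFERENCE_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds (frame count / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """True if the clip at `path` is at least `minimum` seconds long."""
    return wav_duration_seconds(path) >= minimum
```

Because the check only reads the file header and frame count, it runs entirely on the local machine and never touches the audio content itself.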

Voxtral – Mistral Voice Weights

We run Voxtral, Mistral’s multimodal voice model. It understands audio natively: not just transcription, but comprehension. Feed it a voice recording and it reasons about what was said, the tone, and the intent. With Whisper handling raw STT and Kokoro handling output, the full pipeline is:

Voice in (Whisper) → Understanding (Voxtral) → Reasoning (Qwen3-30B) → Voice out (Kokoro)

All local. All on AMD Strix Halo. No cloud API. The weights run on the same hardware as everything else. With 128 GB of unified memory, the voice model and the LLM coexist without swapping.
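
As a rough illustration of how the four stages chain together, here is a minimal sketch. Everything in it is an assumption: the `Turn` record, `run_pipeline`, and the `*_stub` functions are hypothetical stand-ins, and none of them call the real Whisper, Voxtral, Qwen3-30B, or Kokoro APIs. The point is only the shape: one record passed through four in-process stages, with no network hop between them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Turn:
    """State carried through one voice exchange."""
    audio_in: bytes
    transcript: Optional[str] = None       # filled by the speech-to-text stage
    understanding: Optional[dict] = None   # filled by the voice-understanding stage
    reply_text: Optional[str] = None       # filled by the reasoning stage
    audio_out: Optional[bytes] = None      # filled by the text-to-speech stage


Stage = Callable[[Turn], Turn]


def run_pipeline(turn: Turn, stages: List[Stage]) -> Turn:
    """Apply each stage in order; every stage runs in the same process."""
    for stage in stages:
        turn = stage(turn)
    return turn


# Hypothetical stubs. A real deployment would call the local engines here.
def stt_stub(turn: Turn) -> Turn:
    turn.transcript = "hello there"
    return turn


def understand_stub(turn: Turn) -> Turn:
    turn.understanding = {
        "text": turn.transcript,
        "tone": "neutral",
        "intent": "greeting",
    }
    return turn


def reason_stub(turn: Turn) -> Turn:
    turn.reply_text = f"Reply to a {turn.understanding['intent']}"
    return turn


def tts_stub(turn: Turn) -> Turn:
    # Placeholder: encodes the text instead of synthesizing audio.
    turn.audio_out = turn.reply_text.encode("utf-8")
    return turn


STAGES = [stt_stub, understand_stub, reason_stub, tts_stub]

if __name__ == "__main__":
    result = run_pipeline(Turn(audio_in=b"\x00\x01"), STAGES)
    print(result.reply_text)
```

Carrying a single record through the stages, rather than passing only each stage's output forward, lets the understanding stage see both the raw audio and the transcript.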

Voice-as-a-Service

The voice platform powers multiple services across the halo-ai ecosystem:

Audiobooks

Feed it a manuscript. Pick a voice – yours, a clone, or one of the built-in models. Get a produced audiobook with chapter markers, consistent pacing, and natural inflection. Amp handles the mastering.
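
Chapter markers start with knowing where each chapter begins. The sketch below shows one way a manuscript could be split before synthesis. It assumes a plain-text manuscript whose chapters each open with a line like "Chapter 3" or "Chapter 3: Title"; that convention, the `CHAPTER_HEADING` pattern, and the `split_chapters` helper are all hypothetical, not halo-ai's actual ingest format.

```python
import re
from typing import List, Tuple

# Assumption: a chapter opens with a line such as "Chapter 3" or "Chapter 3: Title".
CHAPTER_HEADING = re.compile(r"^Chapter[ \t]+\d+.*$", re.MULTILINE)


def split_chapters(manuscript: str) -> List[Tuple[str, str]]:
    """Return (heading, body) pairs in manuscript order.

    Text that appears before the first chapter heading is ignored.
    """
    matches = list(CHAPTER_HEADING.finditer(manuscript))
    chapters = []
    for i, match in enumerate(matches):
        # A chapter body runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(manuscript)
        body = manuscript[match.end():end].strip()
        chapters.append((match.group(0).strip(), body))
    return chapters
```

Each pair could then be synthesized on its own, with the heading reused as the chapter marker's title.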

Music Production

Voice models plug straight into the music pipeline. Sing lead vocals without singing. Layer harmonies from a single voice source. The Downcomers – halo-ai’s resident band – use cloned vocals for every track.

Game Voices

Dynamic character dialog generated in real time. Dealer writes the lines, the voice platform speaks them. Every NPC has a voice. Every run sounds different.

Live Streaming Co-Host

A real-time voice companion for streams. Responds to chat, comments on gameplay, and maintains a consistent persona throughout the broadcast. Low latency. Natural cadence.

The Downcomers

halo-ai’s AI band. Heavy blues, bagpipes meeting electric guitar, AC/DC crossed with Led Zeppelin. All vocals are cloned. All instruments are synthesized or sampled. The music is real. The band is not.

Memorial Voice Cloning

Preserve a voice that matters to you. Thirty seconds of audio from a phone call, a voicemail, a home video. The model captures tone, cadence, and character. It stays on your hardware, encrypted, for as long as you want it there.

Privacy

The voice model never uploads. Never phones home. Never trains on your data for someone else’s benefit. What you record stays on your drives and nowhere else. This is not negotiable.

Designed and built by the architect.