Your Voice, Your Way: A Guide to Voice Generation
The right voice can make or break your audio content. It's the difference between a flat, robotic delivery and an engaging, human-like performance. We provide a comprehensive suite of tools to help you find, create, and customize the perfect voice for any application.
This guide explores the different ways you can generate voices with our platform, from picking a ready-made voice to cloning your own, and dives deep into the advanced controls that put you in the director's chair.
Quick Guide to Voice Generation
Approach | Best For | Models | Customization Level |
---|---|---|---|
Pre-made Voices | Quick start, high quality | Kokoro, Orpheus | Low |
Voice Cloning | Personalization, branding | Toucan, Chatterbox | High |
Prosody Control | Expressiveness, character | Toucan | Very High |
In-Depth Approaches to Voice Creation
1. Selecting a Pre-made Professional Voice
The fastest way to get started is to choose from our library of high-quality, pre-made voices. These voices have been crafted by experts and are ready to use out of the box.
- Kokoro: Offers the highest audio fidelity and most natural-sounding voices across 9 languages. If your priority is sheer quality and a pleasant listening experience, Kokoro is your best bet.
- Orpheus: Provides a selection of expressive voices across 8 languages. Its built-in emotion tags make Orpheus well suited to storytelling and character work.
This approach is ideal when you need a professional voice quickly and do not need deep customization.
2. Cloning a Voice
Voice cloning allows you to create a digital replica of any voice from a short audio sample. This is the ultimate tool for personalization and brand consistency.
- Chatterbox: Our specialist for high-fidelity voice cloning in English. It excels at capturing the unique characteristics of a voice, creating a nearly indistinguishable digital twin.
- Toucan: Supports more than 7,000 languages and can also clone voices. Provide a reference audio file, and Toucan adapts its output to match the speaker's voice. This is especially useful for localizing content while maintaining a consistent brand voice.
How it works: Simply upload a clean audio sample of the desired voice, and the model will learn its unique acoustic features. The result is a new, custom voice you can use for any text-to-speech task.
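Before uploading, it can help to sanity-check the reference clip locally. The sketch below uses only Python's standard `wave` module and is not part of the platform. The function name `check_reference_clip` and both thresholds are assumptions made for illustration, since this guide does not state minimum requirements for a sample.

```python
import wave

# Hypothetical thresholds: the guide does not specify real requirements.
MIN_SECONDS = 5.0
MIN_SAMPLE_RATE = 16_000


def check_reference_clip(path):
    """Return a list of problems found in a WAV reference clip (empty if none)."""
    problems = []
    with wave.open(path, "rb") as clip:
        rate = clip.getframerate()
        seconds = clip.getnframes() / rate
    if seconds < MIN_SECONDS:
        problems.append(f"clip is {seconds:.1f}s, shorter than the assumed {MIN_SECONDS:.0f}s")
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate is {rate} Hz, below the assumed {MIN_SAMPLE_RATE} Hz")
    return problems
```

Calling `check_reference_clip("my_voice.wav")` on an uncompressed WAV file returns an empty list when the clip passes both checks. It says nothing about background noise, which still has to be judged by ear.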
3. Creating Infinite Voices with Toucan
For unparalleled variety, the Toucan model can generate a unique voice from a "seed"—a random number that acts as a starting point. Since every seed creates a different voice, this gives you access to a virtually infinite library of voices. This is perfect for applications requiring a large cast of distinct characters, such as video games or animated content, without needing to source new voice actors.
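To picture how a single number can stand for a whole voice, here is a minimal sketch, assuming the seed behaves like an ordinary pseudo-random seed. It is not Toucan's implementation: `voice_from_seed`, the vector size, and the character names are all hypothetical, and the list of numbers only stands in for whatever internal voice representation the model uses.

```python
import random

EMBEDDING_SIZE = 8  # hypothetical; a real voice representation would be far larger


def voice_from_seed(seed, size=EMBEDDING_SIZE):
    """Map an integer seed to a reproducible pseudo-random 'voice' vector."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


# One seed per character gives each one a distinct, repeatable voice vector.
cast = {name: voice_from_seed(seed)
        for name, seed in [("narrator", 1), ("villain", 2), ("sidekick", 3)]}
```

In this sketch the same seed always yields the same vector, so storing the seed is enough to bring a character's voice back later.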
4. Advanced Voice Crafting with Toucan
For those who want complete control, Toucan offers an unparalleled level of customization through prosody control. Prosody is the "music" of speech—the rhythm, pitch, stress, and intonation.
Every inference in this app, regardless of the selected model, is also processed by our Toucan model. Toucan extracts the prosody from the generated audio, which means you can modify the prosody of any inference.
With Toucan, you can go beyond just the voice and edit the performance itself.
- Pitch Control: Adjust the baseline pitch of the voice to make it higher or lower.
- Duration Control: Lengthen or shorten phonemes to change the speaking rate and rhythm.
- Energy Control: Modify the intensity or "energy" of the speech to create a more dynamic or subdued performance.
This fine-grained control allows you to craft a unique vocal identity and deliver speech with the exact expressive quality you envision. You can essentially "direct" the synthetic voice actor.
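The three controls can be thought of as scaling per-phoneme values. The sketch below is only an illustration under that assumption. The `Phoneme` record, its field names and units, and the `adjust` function are hypothetical and do not describe the app's actual interface.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Phoneme:
    symbol: str
    pitch_hz: float     # average pitch of the phoneme
    duration_ms: float  # how long the phoneme is held
    energy: float       # relative intensity


def adjust(phonemes, pitch=1.0, duration=1.0, energy=1.0):
    """Return new phonemes with each prosody value scaled by its factor."""
    return [
        replace(
            p,
            pitch_hz=p.pitch_hz * pitch,
            duration_ms=p.duration_ms * duration,
            energy=p.energy * energy,
        )
        for p in phonemes
    ]


hello = [
    Phoneme("HH", 120.0, 60.0, 0.5),
    Phoneme("AH", 130.0, 90.0, 0.8),
    Phoneme("L", 125.0, 70.0, 0.7),
    Phoneme("OW", 115.0, 140.0, 0.9),
]

# A higher, faster, more intense reading of the same phonemes.
excited = adjust(hello, pitch=1.25, duration=0.8, energy=1.5)
```

A factor of 1.0 leaves a value unchanged, so each control can be adjusted independently of the other two.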
Comparing Voice Capabilities Across Models
Model | Voice Selection | Voice Cloning | Fine-Grained Control |
---|---|---|---|
Toucan | Via cloning or seed | Yes (7000+ languages) | Yes (Pitch, Duration, Energy) |
Kokoro | Library of voices | No | No |
Orpheus | Library of voices | No | Yes (Emotion Tags) |
Chatterbox | Via cloning | Yes (English, high-fidelity) | No |
How to Choose Your Approach
- For speed and quality... pick a pre-made voice from Kokoro or Orpheus.
- To use your own voice or a specific person's voice... use voice cloning with Chatterbox (English) or Toucan (multilingual).
- For a large cast of distinct voices... generate them from seeds with Toucan.
- For maximum expressiveness and custom voice design... use Toucan's advanced prosody controls.
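The guidance above, together with the seed-based option from section 3, can be written as a small decision helper. This is a sketch for illustration only: the function name, its parameters, and the order in which the conditions are checked are assumptions, not platform behavior.

```python
def choose_approach(specific_voice=False, english_only=False,
                    many_voices=False, fine_control=False):
    """Suggest an approach and candidate models for a project's needs.

    Assumed priority when several flags are set: fine control first,
    then cloning a specific voice, then seed-generated casts.
    """
    if fine_control:
        return ("prosody control", ["Toucan"])
    if specific_voice:
        # Chatterbox is the English specialist; Toucan covers other languages.
        return ("voice cloning", ["Chatterbox"] if english_only else ["Toucan"])
    if many_voices:
        return ("seed-generated voice", ["Toucan"])
    return ("pre-made voice", ["Kokoro", "Orpheus"])
```

For example, `choose_approach(specific_voice=True, english_only=True)` suggests voice cloning with Chatterbox, while calling it with no arguments falls back to a pre-made voice.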
No matter your project, our platform provides the flexibility to find and craft the perfect voice.