A Universe of Voices: Finding and Crafting the Perfect Voice

Explore the vast possibilities of voice generation. Learn how to select from pre-made voices, clone existing voices, or fine-tune every aspect of a voice with our powerful models like Toucan, Kokoro, Orpheus, and Chatterbox.

Your Voice, Your Way: A Guide to Voice Generation

The right voice can make or break your audio content. It's the difference between a flat, robotic delivery and an engaging, human-like performance. We provide a comprehensive suite of tools to help you find, create, and customize the perfect voice for any application.

This guide explores the different ways you can generate voices with our platform, from picking a ready-made voice to cloning your own, and dives deep into the advanced controls that put you in the director's chair.

Quick Guide to Voice Generation

Approach            | Best For                   | Models             | Customization Level
Pre-made Voices     | Quick start, high quality  | Kokoro, Orpheus    | Low
Voice Cloning       | Personalization, branding  | Toucan, Chatterbox | High
Fine-tuning Prosody | Expressiveness, character  | Toucan             | Very High

In-Depth Approaches to Voice Creation

1. Selecting a Pre-made Professional Voice

The fastest way to get started is to choose from our library of high-quality, pre-made voices. These voices have been crafted by experts and are ready to use out of the box.

  • Kokoro: Offers the highest audio fidelity and most natural-sounding voices across 9 languages. If your priority is sheer quality and a pleasant listening experience, Kokoro is your best bet.
  • Orpheus: Provides a selection of expressive voices across 8 languages. With built-in emotional capabilities, Orpheus is perfect for storytelling and character work.

This approach is ideal when you need a professional voice quickly and don't require deep customization.

2. Cloning a Voice

Voice cloning allows you to create a digital replica of any voice from a short audio sample. This is the ultimate tool for personalization and brand consistency.

  • Chatterbox: Our specialist for high-fidelity voice cloning in English. It excels at capturing the unique characteristics of a voice, creating a nearly indistinguishable digital twin.
  • Toucan: Supports more than 7,000 languages and also offers voice cloning. You provide a reference audio file, and Toucan adapts its output to match the speaker's voice. This is especially useful for localizing content while maintaining a consistent brand voice.

How it works: Simply upload a clean audio sample of the desired voice, and the model will learn its unique acoustic features. The result is a new, custom voice you can use for any text-to-speech task.
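What counts as a "clean" sample is not specified here, so the following is only a minimal sketch of a local pre-check you might run before uploading. It uses Python's standard wave module to flag recordings that are too short, possibly clipped, or very quiet. The function names and all thresholds (3 seconds, 0.99 peak, 0.05 floor) are illustrative assumptions, not platform requirements.

```python
import io
import math
import struct
import wave


def check_reference_sample(wav_bytes, min_seconds=3.0, max_peak=0.99):
    """Return a list of problems found in a 16-bit PCM WAV sample (empty if none).

    The thresholds are assumptions chosen for illustration only.
    """
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            return ["expected 16-bit PCM audio"]
        n_frames = wav.getnframes()
        rate = wav.getframerate()
        raw = wav.readframes(n_frames)

    duration = n_frames / rate
    if duration < min_seconds:
        problems.append(f"too short: {duration:.1f}s < {min_seconds:.1f}s")

    # Unpack little-endian signed 16-bit samples and find the loudest one.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    peak = max((abs(s) for s in samples), default=0) / 32768.0
    if peak >= max_peak:
        problems.append(f"possible clipping: peak {peak:.2f}")
    if peak < 0.05:
        problems.append("very quiet recording")
    return problems


def make_tone(seconds, amplitude, rate=16000, freq=220.0):
    """Build an in-memory mono 16-bit WAV sine tone, used here only as demo input."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(amplitude * 32767 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(frames)
    return buf.getvalue()


if __name__ == "__main__":
    print(check_reference_sample(make_tone(5.0, 0.5)))
    print(check_reference_sample(make_tone(1.0, 0.5)))
```

A check like this cannot detect background noise or reverb, so listening to the sample yourself is still worthwhile.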

3. Creating Infinite Voices with Toucan

For unparalleled variety, the Toucan model can generate a unique voice from a "seed"—a random number that acts as a starting point. Since every seed creates a different voice, this gives you access to a virtually infinite library of voices. This is perfect for applications requiring a large cast of distinct characters, such as video games or animated content, without needing to source new voice actors.
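To make the seed idea concrete, here is a small sketch of the general principle: a seed deterministically drives a pseudo-random generator, so the same seed reproduces the same voice and a different seed gives a different one. This is not how Toucan derives voices internally. The voice_from_seed function and the 8-dimensional "voice vector" are hypothetical stand-ins for whatever representation the model actually uses.

```python
import random


def voice_from_seed(seed, dims=8):
    """Map an integer seed to a reproducible pseudo-random voice vector.

    The vector is a hypothetical stand-in for a real speaker representation.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    return [round(rng.uniform(-1.0, 1.0), 4) for _ in range(dims)]


# Give each character in a cast its own seed so its voice can be recreated later.
cast_seeds = {"narrator": 1, "villain": 2, "sidekick": 3}
cast = {name: voice_from_seed(seed) for name, seed in cast_seeds.items()}

if __name__ == "__main__":
    for name, vector in cast.items():
        print(name, vector)
```

The practical takeaway is to store the seed alongside each character so you can regenerate the same voice in later sessions.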

4. Advanced Voice Crafting with Toucan

For those who want complete control, Toucan offers deep customization through prosody control. Prosody is the "music" of speech: its rhythm, pitch, stress, and intonation.

Every inference in this app, regardless of the selected model, is also processed by Toucan, which extracts the prosody from the generated audio. This means you can modify the prosody of any inference, not only those generated with Toucan voices.

With Toucan, you can go beyond just the voice and edit the performance itself.

  • Pitch Control: Adjust the baseline pitch of the voice to make it higher or lower.
  • Duration Control: Lengthen or shorten phonemes to change the speaking rate and rhythm.
  • Energy Control: Modify the intensity or "energy" of the speech to create a more dynamic or subdued performance.

This fine-grained control allows you to craft a unique vocal identity and deliver speech with the exact expressive quality you envision. You can essentially "direct" the synthetic voice actor.
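The three controls above can be pictured as edits to per-phoneme values. The sketch below is a simplified illustration under that assumption: the PhonemeProsody class, the direct function, and the uniform scaling factors are all hypothetical and do not reflect Toucan's actual interface or internal representation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PhonemeProsody:
    """Hypothetical per-phoneme prosody record."""
    phoneme: str
    pitch_hz: float    # 0.0 for unvoiced sounds
    duration_s: float
    energy: float      # relative loudness, 0.0 to 1.0


def direct(performance, pitch_scale=1.0, duration_scale=1.0, energy_scale=1.0):
    """Return a new performance with each control scaled uniformly."""
    return [
        replace(
            p,
            pitch_hz=p.pitch_hz * pitch_scale,
            duration_s=p.duration_s * duration_scale,
            energy=min(1.0, p.energy * energy_scale),  # keep energy within range
        )
        for p in performance
    ]


# "Hello" as ARPABET-style phonemes with made-up prosody values.
hello = [
    PhonemeProsody("HH", 0.0, 0.06, 0.3),
    PhonemeProsody("AH", 120.0, 0.08, 0.6),
    PhonemeProsody("L", 125.0, 0.07, 0.5),
    PhonemeProsody("OW", 110.0, 0.15, 0.7),
]

# A slower, higher-pitched, more energetic reading of the same word.
excited = direct(hello, pitch_scale=1.2, duration_scale=1.5, energy_scale=2.0)

if __name__ == "__main__":
    for p in excited:
        print(p)
```

Because the values are per phoneme, the same idea extends to editing a single sound, for example lengthening only the final vowel to stress it.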

Comparing Voice Capabilities Across Models

Model      | Voice Selection   | Voice Cloning                | Fine-Grained Control
Toucan     | Via cloning       | Yes (7000+ languages)        | Yes (Pitch, Duration, Energy)
Kokoro     | Library of voices | No                           | No
Orpheus    | Library of voices | No                           | Yes (Emotion Tags)
Chatterbox | Via cloning       | Yes (English, high-fidelity) | No

How to Choose Your Approach

  • For speed and quality... pick a pre-made voice from Kokoro or Orpheus.
  • To use your own voice or a specific person's voice... use voice cloning with Chatterbox (English) or Toucan (multilingual).
  • For maximum expressiveness and custom voice design... use Toucan's advanced prosody controls.
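These rules of thumb can be written as a small decision helper. This is only a sketch of the guidance in this guide: choose_model is a hypothetical function, not part of any platform API, and it simplifies by suggesting Chatterbox alone for English cloning and by ignoring which languages the pre-made libraries cover.

```python
def choose_model(needs_cloning=False, needs_prosody_control=False, language="en"):
    """Suggest models following this guide's rules of thumb (illustrative only)."""
    if needs_prosody_control:
        # Pitch, duration, and energy controls are Toucan features.
        return ["Toucan"]
    if needs_cloning:
        # Chatterbox specializes in English; Toucan covers other languages.
        return ["Chatterbox"] if language == "en" else ["Toucan"]
    # Otherwise a pre-made voice is the fastest route.
    return ["Kokoro", "Orpheus"]


if __name__ == "__main__":
    print(choose_model())
    print(choose_model(needs_cloning=True, language="de"))
    print(choose_model(needs_prosody_control=True))
```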

No matter your project, our platform provides the flexibility to find and craft the perfect voice.
