AI Voice Models: From Universal Languages to Premium Quality

Discover Toucan and Kokoro, our production-ready TTS models offering 7,000+ languages and exceptional audio quality, plus upcoming models.

Breaking Down Language Barriers

VibeTTS is powered by Toucan, a state-of-the-art multilingual text-to-speech model developed by the University of Stuttgart. With support for over 7,000 languages, Toucan enables you to reach audiences across the globe - from major world languages to regional dialects and even endangered languages.

Born from frustration with limited AI dubbing controls, VibeTTS was created to unlock the full potential of Toucan - giving you advanced prosody controls and language coverage that were previously unavailable.

Why Toucan?

Feature            Capability
Language Support   7,000+ languages, including dialects and endangered languages
Voice Cloning      Clone any voice from a short audio sample
Prosody Control    Fine-tune pitch, energy, duration, and more
Phoneme Editing    Visual interface for precise speech control

Advanced Prosody Control

What makes Toucan truly special isn't just its incredible language range, but its advanced prosody control capabilities. You can fine-tune the pitch, energy, and duration of every sound - giving you the kind of control that was previously only available to professional voice directors.

Imagine directing a voice actor yourself - adjusting not just how fast they speak, but how expressive their delivery is, how much energy they put into each word, even the length of natural pauses. That's exactly what VibeTTS gives you.

Key Prosody Features

  • Prosody Creativity: Control how expressive and varied the speech patterns are
  • Duration Scaling: Adjust overall speaking pace without affecting naturalness
  • Pitch Variance: Control the melodic variation in speech
  • Energy Variance: Adjust emphasis and intensity patterns
  • Pause Duration: Fine-tune natural pauses between phrases
  • Loudness Control: Set the overall volume level
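
To keep track of settings that work well for a project, the six controls above can be written down as a small record. The sketch below is only an illustration: the field names, the neutral value of 1.0, and the 0.0-2.0 range are assumptions made for this example, not documented VibeTTS or Toucan values, and nothing here calls the platform.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProsodySettings:
    """Hypothetical record of the six global prosody controls.

    Field names, defaults, and the 0.0-2.0 range are illustrative
    assumptions, not values taken from VibeTTS or Toucan.
    """

    prosody_creativity: float = 1.0  # how expressive and varied the delivery is
    duration_scaling: float = 1.0    # overall speaking pace
    pitch_variance: float = 1.0      # melodic variation
    energy_variance: float = 1.0     # emphasis and intensity patterns
    pause_duration: float = 1.0      # pauses between phrases
    loudness: float = 1.0            # overall volume level

    def __post_init__(self) -> None:
        # Reject values outside the assumed range so typos fail early.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0.0 <= value <= 2.0:
                raise ValueError(
                    f"{f.name}={value} is outside the assumed range 0.0-2.0"
                )


# Example preset: a slower, flatter read for calm narration.
calm_narration = ProsodySettings(duration_scaling=1.2, pitch_variance=0.6)
```

A frozen record like this makes it easy to name a preset once and reuse it, and the range check catches an accidental value such as 12 instead of 1.2.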

Voice Cloning

Toucan's voice cloning capabilities allow you to create custom voices from short audio samples. Simply upload a recording and Toucan will learn the voice characteristics, enabling you to generate new speech in that voice across any of its 7,000+ supported languages.

Voice Cloning Use Cases:

  • Create consistent brand voices for your content
  • Preserve voices for accessibility or memorial purposes
  • Generate multilingual content in a single consistent voice
  • Build custom voice personas for applications

Real Performance: What to Expect

Toucan is production-ready and delivers professional results. Here's what you can expect:

Generation Speed: Typically 1.5-2 seconds per inference. The auto-inference feature regenerates audio automatically whenever you make changes, so you hear each adjustment within seconds.
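
As a rough planning aid, the 1.5-2 second figure can be multiplied out over an editing session. The helper below is a back-of-the-envelope sketch: the function name is made up for this example, and it assumes every regeneration falls inside the typical range quoted above, which real sessions may not.

```python
def estimate_generation_seconds(
    num_inferences: int, low_s: float = 1.5, high_s: float = 2.0
) -> tuple[float, float]:
    """Return (best case, worst case) total seconds for a run of inferences.

    The 1.5 s and 2.0 s defaults are the typical per-inference figures
    quoted in the text; actual times may vary.
    """
    if num_inferences < 0:
        raise ValueError("num_inferences must be non-negative")
    return num_inferences * low_s, num_inferences * high_s


low, high = estimate_generation_seconds(40)
print(f"40 regenerations: roughly {low:.0f}-{high:.0f} seconds")
# prints "40 regenerations: roughly 60-80 seconds"
```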

Voice Control: Granular control over pitch, energy, and duration of every speech sound through our visual phoneme editor.

Language Coverage: Support for 7,000+ languages spans major world languages, regional dialects, and endangered languages, with authentic pronunciation patterns.

Quality Consistency: Stable quality across different text lengths and content types, with natural breathing patterns incorporated into longer passages.

Platform Features

What's Available Now:

  • 6 speech parameters for global prosody control
  • 6 voice parameters for voice characteristic adjustment
  • Phoneme-level editing with visual interface
  • Voice cloning from audio samples
  • Auto-inference with generation times of roughly 1.5-2 seconds
  • 7,000+ language support

Coming Soon:

  • API access for developers
  • Additional platform features based on user feedback

Getting Started

Ready to experience the power of advanced TTS control? Here's how:

  1. Enter Your Text: Type or paste the text you want to convert to speech
  2. Choose a Voice: Select from existing voices or clone your own
  3. Fine-tune Prosody: Use the visual interface to adjust pitch, energy, and duration
  4. Generate and Download: Get your audio in seconds

Start creating the speech you want, learn more about our platform features, explore real-world use cases, or discover the full range of voice options.

Toucan is an open-source Apache-2.0 project: IMS Toucan on GitHub

Ready to Try Our Models?

Experience the power of advanced TTS technology with our professional-grade models.