Overview
Toucan TTS is our flagship text-to-speech model for professional applications. Built for speed and accuracy, it produces high-quality speech with precise prosody control and strong multilingual support.
Key Features
- High Performance: Optimized for fast inference with minimal latency
- Prosody Control: Fine-grained control over pitch, duration, and energy
- Multilingual Support: Supports multiple languages with native pronunciation
- Professional Quality: Studio-grade audio output suitable for commercial use
- Flexible Integration: Available through a REST API, WebSocket streaming, batch processing, or the JavaScript/TypeScript SDK
Technical Specifications
- Model Architecture: Advanced neural vocoder with attention mechanisms
- Sample Rate: 22.05 kHz (captures frequencies up to about 11 kHz, covering the main speech band)
- Latency: Sub-second inference for real-time applications
- Languages: English, Spanish, French, German, Italian, Portuguese, and more
- Pricing: 1 quota unit per input character (cost-effective for high-volume usage)
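The pricing and sample-rate figures above translate directly into planning arithmetic. The sketch below is one way to estimate a job's quota cost and raw audio size, under two assumptions this page does not confirm: a billable "character" is one Unicode code point of the input (whitespace included), and the output is uncompressed mono 16-bit PCM. The helper names are hypothetical and not part of any Toucan SDK.

```typescript
// Planning helpers for Toucan TTS jobs (hypothetical names, not part of any SDK).
// ASSUMPTIONS, not confirmed by the documentation:
//  - a billable "character" is one Unicode code point of the input, whitespace included;
//  - output audio is uncompressed mono 16-bit PCM at 22,050 samples per second.

const QUOTA_PER_CHARACTER = 1;
const SAMPLE_RATE_HZ = 22050;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM (assumed)

// Count Unicode code points: a UTF-16 surrogate pair (e.g. most emoji) counts once.
function countCodePoints(text: string): number {
  let count = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) i++; // skip the low surrogate
    }
    count++;
  }
  return count;
}

// Estimated quota cost at 1 unit per character.
function estimateQuota(text: string): number {
  return countCodePoints(text) * QUOTA_PER_CHARACTER;
}

// Size in bytes of an uncompressed mono PCM buffer lasting durationSeconds.
function pcmBytes(durationSeconds: number): number {
  return Math.round(durationSeconds * SAMPLE_RATE_HZ) * BYTES_PER_SAMPLE;
}

console.log(estimateQuota("Hello, world!")); // 13
console.log(pcmBytes(1)); // 44100
```

Under these assumptions, one second of audio is 22,050 samples, or 44,100 bytes (roughly 43 KiB) before any compression.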
Use Cases
Toucan TTS excels in scenarios requiring fast, reliable speech synthesis:
- Real-time voice assistants and chatbots
- Interactive voice response (IVR) systems
- E-learning and educational content
- Accessibility applications
- Content creation and media production
- Gaming and entertainment
Advanced Capabilities
What sets Toucan apart is its sophisticated prosody control system. You can fine-tune:
- Pitch Patterns: Control the melodic contour of speech
- Speaking Rate: Adjust the speed of delivery for different contexts
- Emphasis: Highlight important words or phrases
- Pause Patterns: Control timing and rhythm for natural-sounding speech
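This page does not say how prosody settings are sent to Toucan. If the API accepts W3C SSML 1.1 markup (an assumption; check the API reference), the four controls above map onto standard SSML elements: pitch and speaking rate onto `<prosody>`, emphasis onto `<emphasis>`, and pauses onto `<break>`. The `Segment` type and `buildSsml` function below are hypothetical helpers, not part of the Toucan SDK.

```typescript
// HYPOTHETICAL: builds W3C SSML 1.1 markup for pitch, rate, emphasis, and pauses.
// Whether Toucan TTS accepts SSML is an ASSUMPTION; the documentation does not say.

type Segment =
  | { kind: "text"; text: string }
  | { kind: "emphasis"; text: string; level: "strong" | "moderate" | "reduced" }
  | { kind: "pause"; ms: number };

interface ProsodyOptions {
  pitch?: string; // SSML pitch value, e.g. "+5%" or "low"
  rate?: string; // SSML rate value, e.g. "90%" or "slow"
  lang?: string; // xml:lang, defaults to "en-US"
}

// Escape characters that are special in XML text and attribute values.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderSegment(seg: Segment): string {
  if (seg.kind === "emphasis") {
    return `<emphasis level="${seg.level}">${escapeXml(seg.text)}</emphasis>`;
  }
  if (seg.kind === "pause") {
    return `<break time="${Math.round(seg.ms)}ms"/>`;
  }
  return escapeXml(seg.text);
}

function buildSsml(segments: Segment[], opts: ProsodyOptions = {}): string {
  const body = segments.map(renderSegment).join("");
  // Wrap in <prosody> only when pitch or rate was requested.
  const attrs: string[] = [];
  if (opts.pitch) attrs.push(`pitch="${escapeXml(opts.pitch)}"`);
  if (opts.rate) attrs.push(`rate="${escapeXml(opts.rate)}"`);
  const inner = attrs.length > 0 ? `<prosody ${attrs.join(" ")}>${body}</prosody>` : body;
  const lang = escapeXml(opts.lang ?? "en-US");
  return (
    `<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    inner +
    `</speak>`
  );
}

const ssml = buildSsml(
  [
    { kind: "text", text: "Your order has shipped." },
    { kind: "pause", ms: 400 },
    { kind: "emphasis", text: "Thank you", level: "strong" },
    { kind: "text", text: " for shopping with us." },
  ],
  { pitch: "+5%", rate: "90%" }
);
console.log(ssml);
```

Escaping the input text matters here: a single unescaped `&` or `<` in user-supplied text would make the whole document invalid XML.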
Integration
Toucan TTS integrates with our platform and can be accessed through:
- RESTful API endpoints
- WebSocket connections for streaming
- JavaScript/TypeScript SDK
- Batch processing capabilities
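The access methods above are not specified in detail here, so the following is a hypothetical sketch of preparing REST requests for a batch job. The endpoint URL (`https://api.example.com/v1/tts`), the JSON field names (`model`, `text`, `language`), and the Bearer-token header are placeholders, not the documented Toucan API. The sketch builds plain request objects without sending them, and splits long input at sentence boundaries so each request stays near a chosen size.

```typescript
// HYPOTHETICAL: endpoint URL, field names, and auth scheme are placeholders,
// not the documented Toucan TTS API.

interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const HYPOTHETICAL_ENDPOINT = "https://api.example.com/v1/tts"; // placeholder URL

function buildTtsRequest(apiKey: string, text: string, language = "en"): TtsRequest {
  return {
    url: HYPOTHETICAL_ENDPOINT,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // assumed auth scheme
    },
    body: JSON.stringify({ model: "toucan", text, language }), // assumed field names
  };
}

// Split text into chunks of at most maxChars, breaking after sentence-ending
// punctuation. A single sentence longer than maxChars is kept whole.
function chunkText(text: string, maxChars: number): string[] {
  // Each match is a run of non-punctuation plus trailing punctuation and spaces,
  // or a bare punctuation run, so every character of the input is covered.
  const sentences = text.match(/[^.!?]+[.!?]*\s*|[.!?]+\s*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}

const requests = chunkText(
  "Welcome aboard. Your account is ready! Let's get started.",
  40
).map((chunk) => buildTtsRequest("YOUR_API_KEY", chunk));
console.log(requests.length); // 2
```

Because each request is a plain object, it can be passed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` once the real endpoint and field names are known.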