Move beyond standard synthesis. Our High-Definition (HD) Generative Tier offers voices that breathe, pause, and emote naturally.
Context-Aware Delivery: The engine analyzes the text to understand if it should whisper a secret, shout a warning, or deliver news with authority.
Natural Disfluencies: Capable of inserting realistic human elements like "ums," "uhs," and breaths for conversational agents that sound genuinely spontaneous.
Affective Intelligence: Dynamically adjusts emotional weight (joy, sorrow, urgency) based on the sentiment of your script.
Stop relying on rigid code tags. Control the voice using natural language prompts.
Prompt-to-Speech: Simply tell the API: *"Read this like a tired storybook narrator"* or *"Speak this quickly and excitedly like a sports commentator."*
Granular Pacing: Fine-tune the rhythm of speech down to the millisecond. Stretch pauses for dramatic effect or speed up specific phrases to mimic fast-paced banter.
Generate complex audio scenes with a single API call.
Seamless Turn-Taking: Simulate podcasts, interviews, or customer service roleplays where multiple distinct voices interact.
Unified Context: The system maintains the tone and flow of the conversation across different speakers, ensuring no jarring transitions.
Our infrastructure is designed for global deployment, ensuring your application speaks your customers' language—literally.
| Feature | Specification |
|---|---|
| Voice Portfolio | Access 380+ distinct voice personas across all tiers. |
| Language Coverage | Native support for 80+ languages and variants (locales). |
| Regional Accents | Deep support for regional nuances (e.g., 5+ variants of English, 3+ variants of Spanish and French). |
| Studio Tier | Specialized voices recorded by professional voice actors for long-form content (audiobooks/news) to eliminate listener fatigue. |
Built for developers who demand reliability and flexibility.
Ultra-Low Latency: "Flash" model architecture delivers audio in <300ms, enabling real-time, interruptible voice conversations for AI agents.
High-Fidelity Audio:
Studio Quality: Up to 48 kHz sample rate.
Compressed Output: MP3, suitable for post-production.
Input Flexibility: Accepts Plain Text and Natural Language Prompts.
Bidirectional Streaming: Playback begins instantly while the rest of the sentence is still being generated.
Interactive AI Agents: Power customer support bots that sound empathetic and human, not robotic.
Content Production: Automate audiobook narration, podcast creation, and video dubbing at a fraction of the cost of a studio.
EdTech & E-Learning: Generate dynamic language learning lessons with perfect native pronunciation in 80+ languages.
Gaming & VR: Create dynamic NPCs (Non-Player Characters) that can generate unique dialogue on the fly without pre-recorded lines.
Get list of voices
{
"data": [
{
"gender": "FEMALE",
"language_code": "en-US",
"language_name": "English (US)",
"type": "Premium",
"voice_id": "en-US-News-L"
}
],
"message": "success",
"success": true
}
curl --location --request GET 'https://zylalabs.com/api/11558/ultra+text-to-speech+api/21834/list+of+voices' --header 'Authorization: Bearer YOUR_API_KEY'
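The same call can be sketched in Python using only the standard library. The URL and the Bearer header come from the curl example above; the helper names `build_list_voices_request` and `fetch_voices` are illustrative and not part of the API.

```python
import json
import urllib.request

# URL copied from the curl example; the helper names below are illustrative.
LIST_VOICES_URL = (
    "https://zylalabs.com/api/11558/ultra+text-to-speech+api/21834/list+of+voices"
)


def build_list_voices_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the voice list."""
    return urllib.request.Request(
        LIST_VOICES_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_voices(api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    request = build_list_voices_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the headers and URL without using up any of the plan's requests.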
Generate text-to-speech
Create text-to-speech - Endpoint Features
| Object | Description |
|---|---|
| Request Body | [Required] JSON |
{"data":"https://s3.us-east-1.amazonaws.com/invideo-uploads-us-east-1/speechen-US-News-L17664032245720.mp3","message":"success","success":true}
curl --location --request POST 'https://zylalabs.com/api/11558/ultra+text-to-speech+api/21835/create+text-to-speech' --header 'Authorization: Bearer YOUR_API_KEY' \
--data-raw '{
"gender": "FEMALE",
"language_code": "en-US",
"language_name": "English (US)",
"voice_id": "en-US-News-L",
"text": "Stand by... we have a major development coming into the newsroom right now. After weeks of uncertainty—and hours of intense speculation—the decision has finally been made. The result? It is absolutely not what anyone expected! Sources on the ground are describing the atmosphere as tense... yet strangely hopeful. We are working to confirm the details at this very moment, so please... do not go anywhere."
}'
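A minimal Python sketch of the same POST request follows. The body fields mirror the curl example; the function name `build_tts_request` is hypothetical, and the `Content-Type` header is an assumption, since the curl example does not set one explicitly.

```python
import json
import urllib.request

# URL copied from the curl example above.
CREATE_TTS_URL = (
    "https://zylalabs.com/api/11558/ultra+text-to-speech+api/21835/"
    "create+text-to-speech"
)


def build_tts_request(api_key: str, voice: dict, text: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one synthesis job.

    `voice` is expected to be one entry from the list-of-voices response.
    """
    body = {
        "gender": voice["gender"],
        "language_code": voice["language_code"],
        "language_name": voice["language_name"],
        "voice_id": voice["voice_id"],
        "text": text,
    }
    return urllib.request.Request(
        CREATE_TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: not shown in the curl example, added because the body is JSON.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here so the sketch can be run without a subscription.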
| Header | Description |
|---|---|
| Authorization | [Required] Should be Bearer access_key. See "Your API Access Key" above when you are subscribed. |
No long-term commitment. Upgrade, downgrade, or cancel anytime. Free Trial includes up to 50 requests.
The GET List of voices endpoint returns a list of available voice personas, including attributes like gender, language code, and voice type. The POST Create text-to-speech endpoint returns a URL link to the generated audio file along with a success message.
For the GET List of voices, key fields include "gender," "language_code," "language_name," "type," and "voice_id." For the POST Create text-to-speech, the key fields are "data" (audio URL), "message," and "success."
The POST Create text-to-speech endpoint accepts parameters such as the text to be converted and optional natural language prompts for voice modulation. Users can customize the delivery style and pacing through these prompts.
The response data for the GET List of voices is organized in a JSON format with an array of voice objects under the "data" key. The POST Create text-to-speech response includes a single object with "data," "message," and "success" keys.
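The two response shapes described here could be parsed roughly as follows. The field names match the sample responses on this page; the `Voice` dataclass and the two parser functions are illustrative names, and treating `"success": false` as an error is an assumption about how failures are reported.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Voice:
    gender: str
    language_code: str
    language_name: str
    type: str
    voice_id: str


def _require_success(payload: dict) -> None:
    # Assumption: failed calls carry "success": false and a "message".
    if not payload.get("success"):
        raise ValueError(f"API reported failure: {payload.get('message')!r}")


def parse_voices(payload: dict) -> List[Voice]:
    """Turn the list-of-voices response into Voice objects."""
    _require_success(payload)
    return [
        Voice(
            gender=item["gender"],
            language_code=item["language_code"],
            language_name=item["language_name"],
            type=item["type"],
            voice_id=item["voice_id"],
        )
        for item in payload.get("data", [])
    ]


def parse_audio_url(payload: dict) -> str:
    """Return the audio URL from the create text-to-speech response."""
    _require_success(payload)
    url = payload.get("data")
    if not isinstance(url, str) or not url:
        raise ValueError("response has no audio URL in 'data'")
    return url
```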
Typical use cases include generating dynamic audio for interactive AI agents, automating audiobook narration, creating engaging educational content, and enhancing gaming experiences with realistic NPC dialogue.
Data accuracy is maintained through a combination of professional voice actor recordings and advanced AI algorithms that ensure high-quality voice synthesis. Continuous updates and user feedback also contribute to improving voice performance.
Users can utilize the returned audio URL from the POST Create text-to-speech response to play or store the generated audio. The voice attributes from the GET List of voices can help users select the most suitable voice for their application.
Users can expect structured JSON responses with clear success indicators. For the GET List of voices, the data will typically include multiple voice options, while the POST Create text-to-speech will return a single audio file link upon successful processing.
Users can customize their voice selection by utilizing the attributes returned in the GET List of voices. They can filter voices based on gender, language, and type to find the most suitable voice persona for their application.
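Filtering is done on the client side in this sketch, since this page does not document any query parameters for the list endpoint. The function name `filter_voices` is illustrative, and every voice in the sample data except `en-US-News-L` is a hypothetical placeholder.

```python
from typing import Dict, Iterable, List, Optional


def filter_voices(
    voices: Iterable[Dict[str, str]],
    gender: Optional[str] = None,
    language_code: Optional[str] = None,
    voice_type: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Keep voices matching every criterion that is not None."""

    def matches(voice: Dict[str, str]) -> bool:
        return (
            (gender is None or voice.get("gender") == gender)
            and (language_code is None or voice.get("language_code") == language_code)
            and (voice_type is None or voice.get("type") == voice_type)
        )

    return [voice for voice in voices if matches(voice)]


# Sample data: only the first entry appears in the documentation;
# the other two IDs are hypothetical placeholders.
SAMPLE_VOICES = [
    {"gender": "FEMALE", "language_code": "en-US", "type": "Premium", "voice_id": "en-US-News-L"},
    {"gender": "MALE", "language_code": "en-US", "type": "Premium", "voice_id": "hypothetical-male-en"},
    {"gender": "FEMALE", "language_code": "es-ES", "type": "Standard", "voice_id": "hypothetical-female-es"},
]

premium_us_female = filter_voices(
    SAMPLE_VOICES, gender="FEMALE", language_code="en-US", voice_type="Premium"
)
```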
The API supports audio output in MP3 format for the generated text-to-speech audio. This format is suitable for post-production and easy integration into various applications.
The API's Affective Intelligence feature dynamically adjusts the emotional weight of the speech based on the sentiment of the input text, allowing for a more engaging and contextually appropriate delivery.
The "data" field in the POST Create text-to-speech response contains the URL link to the generated audio file. Users can use this link to play or download the audio for their applications.
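One way to store the file locally is sketched below, assuming the returned link can be fetched with a plain GET and no extra headers (the page does not say whether the link expires or needs authentication). The function names are illustrative.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def audio_filename(audio_url: str) -> str:
    """Derive a local file name from the last path segment of the URL."""
    name = PurePosixPath(urlparse(audio_url).path).name
    if not name:
        raise ValueError(f"cannot derive a file name from {audio_url!r}")
    return name


def download_audio(audio_url: str, dest_dir: str = ".", timeout: float = 60.0) -> Path:
    """Stream the audio to dest_dir and return the saved path (network call)."""
    target = Path(dest_dir) / audio_filename(audio_url)
    with urllib.request.urlopen(audio_url, timeout=timeout) as response:
        with open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target
```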
The Multi-Speaker "Dialogue" Engine allows the API to simulate conversations with distinct voices, maintaining unified context and tone, which is essential for creating realistic interactions in podcasts or customer service scenarios.
Natural language prompts enable users to control voice delivery style intuitively, allowing for creative expressions like "speak excitedly" or "read slowly." This flexibility enhances the audio's emotional impact and engagement.
The API offers deep support for regional accents, providing multiple variants for languages like English, Spanish, and French. This ensures that the generated speech resonates with local audiences and enhances relatability.
If users receive an empty response, they should check their input parameters for accuracy and completeness. Ensuring valid text and prompts can help avoid empty results and improve the likelihood of successful audio generation.
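A pre-flight check along these lines can catch incomplete input before a request is spent. Which fields the service actually requires is not stated on this page, so the `REQUIRED_FIELDS` tuple is an assumption based on the request example, and `find_payload_problems` is a hypothetical helper.

```python
from typing import Dict, List

# Assumption: taken from the request example, not from a published schema.
REQUIRED_FIELDS = ("voice_id", "language_code", "text")


def find_payload_problems(payload: Dict[str, object]) -> List[str]:
    """Return a human-readable list of problems; empty means the payload looks usable."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    return problems
```

For example, a payload whose `text` contains only whitespace is reported as `missing or empty field: text` rather than being sent.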
Please have a look at our Refund Policy: https://zylalabs.com/terms#refund
To obtain your API key, you first need to sign in to your account and subscribe to the API you want to use. Once subscribed, go to your Profile, open the Subscription section, and select the specific API. Your API key will be available there and can be used to authenticate your requests.
You can’t switch APIs during the free trial. If you subscribe to a different API, your trial will end and the new subscription will start as a paid plan.
If you don't cancel before the 7th day, your free trial ends automatically and your subscription converts to the paid version of the plan you originally selected. You will then be charged and gain access to the API calls included in that plan.
The free trial ends when you reach 50 API requests or after 7 days, whichever comes first.
The free trial is available only once, so we recommend using it on the API that interests you the most. Most of our APIs offer a free trial, but some may not include this option.
We offer a 7-day free trial that allows you to make up to 50 API calls at no cost, so you can test our APIs without any commitment.
Zyla API Hub is like a big store for APIs, where you can find thousands of them all in one place. We also offer dedicated support and real-time monitoring of all APIs. Once you sign up, you can pick and choose which APIs you want to use. Just remember, each API needs its own subscription. But if you subscribe to multiple ones, you'll use the same key for all of them, making things easier for you.