XTTS-v2

XTTS-v2 (≈ 1.8 GB checkpoint, Coqui-Public-Model-License)

Multilingual voice-cloning TTS you can run on a single mid-range GPU.

Zero-shot cloning. Feed it a 6-second voice clip and it mimics the speaker's tone, accent, and emotion, then speaks in any of 17 languages (EN, ES, FR, DE, IT, PT, PL, TR, RU, NL, CS, AR, ZH-CN, JA, HU, KO, HI).
Cross-language & style transfer. Keep the same speaker timbre while switching languages or emotional style; supports multi-reference mixing for smoother prosody.
Hardware reality. Weights are ~1.8 GB on disk; inference peaks at roughly 2-4 GB of VRAM and under 5 GB of system RAM, so the combined footprint stays under 10 GB even for book-length runs on an RTX 3090.
24 kHz output, streaming OK. Latency ~200 ms for real-time chat; supports chunked streaming or batched long-form synthesis.
License heads-up. CPML limits the model and its outputs to non-commercial use; read the terms and arrange separate licensing before wiring it into paid APIs.
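
The language list and the batched long-form mode above suggest two client-side checks worth running before a request: validating the language code and splitting long text into sentence-aligned chunks. The sketch below is a minimal illustration under stated assumptions: the helper names, the lowercase spelling of the codes, and the 250-character default chunk size are hypothetical and are not part of XTTS-v2 or the Norman SDK.

```python
import re

# The 17 language codes listed above, lowercased (the spelling is an assumption).
SUPPORTED_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
})


def is_supported_language(code: str) -> bool:
    """Return True if `code` is one of the listed languages (case-insensitive)."""
    return code.strip().lower() in SUPPORTED_LANGUAGES


def split_into_chunks(text: str, max_chars: int = 250) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole rather than cut mid-word.
    """
    # Split after ., ! or ? followed by whitespace; a deliberately simple heuristic.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized as its own request and the resulting audio concatenated in order. Splitting on sentence boundaries is a design choice that keeps pauses natural at the joins.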

Why pick it for Norman AI?

XTTS-v2 fits high-quality, cross-language speech synthesis into a sub-10 GB memory envelope, which makes it a practical way to add voice chat, multilingual dubbing, or branded voice avatars to our stack without new infrastructure or custom data collection.

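# Assumes an initialized Norman client named `norman` and an enclosing async function.
response = await norman.invoke(
    {
        "model_name": "xtts-v2",
        "inputs": [
            {
                # Text for the model to speak.
                "display_title": "Prompt",
                "data": "Hello, and welcome to Norman AI."
            },
            {
                # Path to the reference voice clip, presumably the voice to clone.
                "display_title": "Prompt",
                "data": "/Users/alice/Desktop/sample_input.aac"
            }
        ]
    }
)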
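
To reuse the request shape above across many sentences, the payload can be built by a small helper. This is a sketch only: `build_xtts_request` is a hypothetical name, and it simply mirrors the two-entry `inputs` list and the `display_title` values shown in the call above. It does not contact Norman.

```python
from typing import Any


def build_xtts_request(text: str, reference_clip_path: str) -> dict[str, Any]:
    """Build a payload with the same shape as the example call above.

    Hypothetical helper: the key names and "Prompt" titles are copied from the
    example, not taken from SDK documentation.
    """
    if not text.strip():
        raise ValueError("text to synthesize must not be empty")
    return {
        "model_name": "xtts-v2",
        "inputs": [
            {"display_title": "Prompt", "data": text},
            {"display_title": "Prompt", "data": reference_clip_path},
        ],
    }
```

Inside an async function, the result would be passed along as `await norman.invoke(build_xtts_request(...))`, where `norman` is the client object the example above already assumes.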
