Overview
NVIDIA Nemotron Speech provides three TTS service implementations:
- NvidiaTTSService: High-quality TTS via NVIDIA’s cloud-based gRPC API with multilingual support, configurable quality settings, and cross-sentence audio stitching.
- NvidiaSageMakerHTTPTTSService: Single HTTP invocation to an AWS SageMaker endpoint, streaming raw PCM audio back for each text segment.
- NvidiaSageMakerTTSService: Persistent HTTP/2 bidi-stream to an AWS SageMaker endpoint with full interruption support via InterruptibleTTSService.
NVIDIA Nemotron Speech TTS API Reference
Pipecat’s API methods for NVIDIA Nemotron Speech TTS integration
Example Implementation
Complete example with Nemotron Speech NIM
NVIDIA TTS NIM Documentation
Official NVIDIA TTS NIM documentation
NVIDIA Developer Portal
Access API keys and Nemotron Speech services
Installation
To use NVIDIA Nemotron Speech services, install the required dependencies.
Prerequisites
NVIDIA Nemotron Speech Setup
Before using Nemotron Speech TTS services, you need:
- NVIDIA Developer Account: Sign up at NVIDIA Developer Portal
- API Key: Generate an NVIDIA API key for Nemotron Speech services (required for cloud endpoint)
- Nemotron Speech Access: Ensure access to NVIDIA Nemotron Speech TTS services
Required Environment Variables
NVIDIA_API_KEY: Your NVIDIA API key for authentication (required for cloud endpoint, not needed for local deployments)
Configuration
NvidiaTTSService
- NVIDIA API key for authentication. Required when using the cloud endpoint. Not needed for local deployments.
- gRPC server endpoint. Defaults to NVIDIA’s cloud endpoint. For local deployments, pass the local address (e.g. localhost:50051).
- Voice model identifier. Deprecated in v0.0.105. Use settings=NvidiaTTSService.Settings(...) instead.
- Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Dictionary containing function_id and model_name for the TTS model.
- Whether to use SSL for the gRPC connection. Defaults to True for the NVIDIA cloud endpoint. Set to False for local deployments.
- Custom pronunciation dictionary mapping words (graphemes) to IPA phonetic representations (phonemes), e.g. {"NVIDIA": "ɛn.vɪ.diː.ʌ"}. See NVIDIA TTS NIM phoneme support for the list of supported IPA phonemes.
- Output audio encoding format. Defaults to AudioEncoding.LINEAR_PCM.
- Runtime-configurable synthesis settings. See InputParams below. Deprecated in v0.0.105. Use settings=NvidiaTTSService.Settings(...) instead.
Settings
Runtime-configurable settings passed via the settings constructor argument using NvidiaTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | Model identifier. (Inherited.) |
| voice | str | None | Voice identifier. (Inherited.) |
| language | Language \| str | None | Language for synthesis. (Inherited.) |
| quality | int | NOT_GIVEN | Audio quality setting. |
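
According to the notes for this service, settings updated mid-conversation apply from the next synthesis turn rather than to requests already in flight. The toy model below illustrates that deferred-update behavior only. It is not Pipecat's implementation, and ToySettings, PendingSettings, update, and start_turn are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class ToySettings:
    """Hypothetical stand-in for the fields listed in the table above."""
    model: Optional[str] = None
    voice: Optional[str] = None
    language: Optional[str] = None
    quality: Optional[int] = None


@dataclass
class PendingSettings:
    """Toy model: updates are queued and applied when the next turn starts."""
    active: ToySettings = field(default_factory=ToySettings)
    _pending: dict = field(default_factory=dict)

    def update(self, **changes):
        # Queue the change; the turn currently in flight keeps its settings.
        self._pending.update(changes)

    def start_turn(self) -> ToySettings:
        # Apply queued changes at the turn boundary, then clear the queue.
        if self._pending:
            self.active = replace(self.active, **self._pending)
            self._pending = {}
        return self.active


state = PendingSettings(active=ToySettings(voice="voice-a", quality=20))
current = state.start_turn()
state.update(voice="voice-b")     # arrives while the turn is in flight
print(current.voice)              # settings seen by the in-flight turn
print(state.start_turn().voice)   # settings seen by the following turn
```

The frozen dataclass means a turn that has already captured its settings cannot observe later changes, which mirrors the described behavior.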
Usage
Basic Setup
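
A minimal sketch of a cloud-endpoint setup. It assumes the service is importable from pipecat.services.nvidia.tts and takes the key as an api_key keyword. Neither name is confirmed on this page, so check both against the API reference. The helper names nvidia_tts_kwargs and create_tts are hypothetical. The Pipecat import is deferred so the module loads even when the package is not installed.

```python
import os


def nvidia_tts_kwargs(env=None):
    """Build constructor keyword arguments for the cloud endpoint."""
    env = os.environ if env is None else env
    api_key = env.get("NVIDIA_API_KEY")
    if not api_key:
        # The cloud endpoint requires a key; local deployments do not.
        raise RuntimeError("Set NVIDIA_API_KEY to use the cloud endpoint")
    # Assumption: the constructor keyword is named `api_key`.
    return {"api_key": api_key}


def create_tts(env=None):
    """Create the service with its default settings."""
    # Assumption: module path. Imported lazily so this file loads without Pipecat.
    from pipecat.services.nvidia.tts import NvidiaTTSService

    return NvidiaTTSService(**nvidia_tts_kwargs(env))
```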
With Custom Voice and Quality
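
A hedged sketch of passing a voice and quality through the settings argument. NvidiaTTSService.Settings(...) and its voice, language, and quality fields come from this page. The module path, the api_key keyword, and the helper names settings_fields and create_tts are assumptions. The voice name is borrowed from the SageMaker defaults listed on this page and the quality number is arbitrary, so neither is a recommendation.

```python
def settings_fields(voice=None, language=None, quality=None):
    """Keep only the fields that were explicitly chosen."""
    chosen = {"voice": voice, "language": language, "quality": quality}
    return {name: value for name, value in chosen.items() if value is not None}


def create_tts(api_key, **chosen):
    """Create the service with explicit voice/language/quality settings."""
    # Assumptions: module path and the `api_key` keyword.
    from pipecat.services.nvidia.tts import NvidiaTTSService

    # Settings(...) and the `settings` argument are named on this page.
    settings = NvidiaTTSService.Settings(**settings_fields(**chosen))
    return NvidiaTTSService(api_key=api_key, settings=settings)


# Illustrative values only.
fields = settings_fields(voice="Magpie-Multilingual.EN-US.Aria", quality=20)
print(fields)
```

Leaving unset fields out of the Settings call lets the service keep its own defaults for them.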
Notes
- gRPC-based: NVIDIA Nemotron Speech uses gRPC (not HTTP or WebSocket) for communication with the TTS service.
- Cross-sentence stitching: Multiple sentences within an LLM turn are fed into a single SynthesizeOnline gRPC stream for seamless audio across sentence boundaries (requires Magpie TTS model v1.7.0+).
- Runtime settings updates: Voice, language, and quality can be updated mid-conversation with TTSUpdateSettingsFrame. New values take effect on the next synthesis turn, not for the current turn’s in-flight requests.
- Model cannot be changed after initialization: The model and function ID must be set during construction via model_function_map. Calling set_model() after initialization will log a warning and have no effect.
- SSL enabled by default: The service connects to NVIDIA’s cloud endpoint with SSL. Set use_ssl=False only for local or custom Nemotron Speech deployments.
- Metrics generation: This service supports metric generation via can_generate_metrics(). Metrics are automatically stopped when an audio context is interrupted.
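
For a local or custom deployment, the SSL and model notes above combine into a small set of constructor arguments. The sketch below only assembles them as a dictionary. The names use_ssl and model_function_map appear on this page, while the server keyword and the helper name local_nvidia_tts_kwargs are assumptions. The function_id and model_name values must come from your own deployment and are shown as placeholders.

```python
def local_nvidia_tts_kwargs(server="localhost:50051", model_function_map=None):
    """Keyword arguments for a self-hosted deployment (no API key, no SSL)."""
    kwargs = {
        "server": server,   # assumption: keyword name for the gRPC endpoint
        "use_ssl": False,   # named on this page; SSL is the cloud default
    }
    if model_function_map is not None:
        # Must be supplied at construction; set_model() later has no effect.
        kwargs["model_function_map"] = dict(model_function_map)
    return kwargs


kwargs = local_nvidia_tts_kwargs(
    model_function_map={
        "function_id": "<your-function-id>",   # placeholder
        "model_name": "<your-model-name>",     # placeholder
    }
)
print(sorted(kwargs))
```

The resulting dictionary would be passed as NvidiaTTSService(**kwargs) once the keyword names are confirmed against the API reference.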
NvidiaSageMakerHTTPTTSService
NVIDIA Magpie TTS service that calls a SageMaker HTTP endpoint for each text segment. Sends JSON to the endpoint’s /invocations path and streams raw PCM audio back.
Configuration
- Name of the deployed SageMaker endpoint.
- AWS region where the endpoint is deployed.
- Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Runtime-configurable settings. See Settings below.
Settings
Runtime-configurable settings passed via the settings constructor argument using NvidiaSageMakerHTTPTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | magpie | Model identifier. (Inherited.) |
| voice | str | Magpie-Multilingual.EN-US.Aria | Voice identifier. (Inherited.) |
| language | Language \| str | en-US | BCP-47 language code for synthesis. |
Usage
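
A minimal construction sketch, assuming the constructor keywords are endpoint_name and region. This page describes those two arguments but does not name them. Because the module path is not shown here either, the helper takes the service class as a parameter, so the caller imports NvidiaSageMakerHTTPTTSService from wherever the API reference places it. The helper name create_sagemaker_tts is hypothetical.

```python
import os


def create_sagemaker_tts(service_cls, region, env=None):
    """Construct a SageMaker-backed TTS service from the environment.

    `service_cls` is the class imported from Pipecat by the caller,
    for example NvidiaSageMakerHTTPTTSService.
    """
    env = os.environ if env is None else env
    endpoint = env.get("SAGEMAKER_MAGPIE_ENDPOINT_NAME")
    if not endpoint:
        raise RuntimeError("Set SAGEMAKER_MAGPIE_ENDPOINT_NAME")
    # Assumptions: the keyword names `endpoint_name` and `region`.
    return service_cls(endpoint_name=endpoint, region=region)
```

Omitting the settings argument leaves the defaults from the table above in place.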
Notes
- AWS SageMaker deployment required: This service requires a deployed SageMaker endpoint running NVIDIA Magpie TTS NIM. See the deployment example for setup instructions.
- AWS credentials: Requires AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables for SageMaker authentication.
- Environment variables: SAGEMAKER_MAGPIE_ENDPOINT_NAME for the endpoint name.
- HTTP-based: Each text segment triggers a new HTTP POST request to the SageMaker endpoint.
- Metrics support: This service supports metrics generation (can_generate_metrics() returns True).
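
The environment variables named in these notes can be checked before the pipeline starts, which tends to give a clearer error than a failed request. This is a plain-Python sketch with no Pipecat or AWS calls. REQUIRED_ENV and missing_env are hypothetical names.

```python
import os

# Variables named in the notes above.
REQUIRED_ENV = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "SAGEMAKER_MAGPIE_ENDPOINT_NAME",
)


def missing_env(env=None, required=REQUIRED_ENV):
    """Return the required variables that are unset or empty, in order."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```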
NvidiaSageMakerTTSService
NVIDIA Magpie TTS service using SageMaker bidirectional streaming. Maintains a persistent HTTP/2 bidi-stream connection for the lifetime of the pipeline with full interruption support.
Configuration
- Name of the deployed SageMaker endpoint.
- AWS region where the endpoint is deployed.
- Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Runtime-configurable settings. See Settings below.
Settings
Runtime-configurable settings passed via the settings constructor argument using NvidiaSageMakerTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | magpie | Model identifier. (Inherited.) |
| voice | str | Magpie-Multilingual.EN-US.Aria | Voice identifier. (Inherited.) |
| language | Language \| str | en-US | BCP-47 language code for synthesis. |
Usage
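
A hedged sketch of constructing the streaming service with non-default settings. The settings argument, the nested Settings class, and the default values are taken from this page. The endpoint_name and region keywords are assumptions, and the helper names merged_settings and create_streaming_tts are hypothetical. The service class is passed in by the caller because its module path is not shown here. The language override below is illustrative and may not be supported by a given deployment.

```python
# Defaults as listed in the Settings table above.
DEFAULTS = {
    "model": "magpie",
    "voice": "Magpie-Multilingual.EN-US.Aria",
    "language": "en-US",
}


def merged_settings(**overrides):
    """Overlay explicit choices on the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown settings: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def create_streaming_tts(service_cls, endpoint_name, region, **overrides):
    """Build the service; `service_cls` is NvidiaSageMakerTTSService."""
    # Assumptions: the `endpoint_name` and `region` keyword names.
    settings = service_cls.Settings(**merged_settings(**overrides))
    return service_cls(endpoint_name=endpoint_name, region=region, settings=settings)


print(merged_settings(language="es-US"))
```

Rejecting unknown setting names early surfaces typos at startup instead of at the first synthesis request.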
Notes
- AWS SageMaker deployment required: This service requires a deployed SageMaker endpoint running NVIDIA Magpie TTS NIM. See the deployment example for setup instructions.
- AWS credentials: Requires AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables for SageMaker authentication.
- Environment variables: SAGEMAKER_MAGPIE_ENDPOINT_NAME for the endpoint name.
- Persistent connection: Maintains a single HTTP/2 bidi-stream session for the pipeline’s lifetime, reconnecting automatically on error.
- Interruption support: Extends InterruptibleTTSService for proper handling of user interruptions.
- Metrics support: This service supports metrics generation (can_generate_metrics() returns True).