Using Pronunciation Dictionaries with ElevenLabs SDK

Pronunciation dictionaries let you control how specific words are pronounced in text-to-speech output. This tutorial walks through using the ElevenLabs Python SDK to create, modify, and apply pronunciation dictionaries.
Requirements
Before you begin, ensure you have the following:
- An ElevenLabs account with an API key.
- Python installed on your machine.
- FFmpeg installed, which is needed to play audio.
Setup
Installing the SDK
To start, install the necessary SDKs and libraries. You will need the ElevenLabs SDK for updating pronunciation dictionaries and using text-to-speech conversion. Install it using pip:
pip install elevenlabs
Additionally, install python-dotenv to manage your environment variables:
pip install python-dotenv
Create a .env file in your project directory and add your API key:
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
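A missing or empty key usually only shows up later as an authentication error, so it can help to fail early. The following is a minimal sketch using only the standard library; the helper name require_env is hypothetical and not part of the ElevenLabs SDK. It assumes the .env file has already been loaded into the environment (for example with python-dotenv's load_dotenv()).

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; assumes the .env file has
    already been loaded into the process environment.
    """
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell."
        )
    return value
```

Calling require_env("ELEVENLABS_API_KEY") before constructing the client turns a silent None into an explicit error message.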
Initialize the Client SDK
Initialize the client SDK with the following code:
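import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

# Load ELEVENLABS_API_KEY from the .env file into the environment.
load_dotenv()

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
client = ElevenLabs(
    api_key=ELEVENLABS_API_KEY,
)
Creating a Pronunciation Dictionary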
To create a pronunciation dictionary from a file, first write your rules in a .pls (Pronunciation Lexicon Specification) file. The example below uses the IPA alphabet to specify pronunciations. Save it as dictionary.pls.
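If the rules already live in code, the file can also be generated rather than typed. The sketch below uses Python's standard xml.etree.ElementTree to build a PLS document from a mapping of graphemes to phonemes. The function name build_pls and its parameters are hypothetical, and the output omits the optional schema-location attributes.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_pls(rules, lang="en-US", alphabet="ipa"):
    """Return PLS XML (as UTF-8 bytes) for a {grapheme: phoneme} mapping.

    Hypothetical helper for illustration, not part of the ElevenLabs SDK.
    """
    # Serialize the PLS namespace as the default (unprefixed) namespace.
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, phoneme in rules.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="UTF-8", xml_declaration=True)


pls_bytes = build_pls({"tomato": "/tə'meɪtoʊ/", "Tomato": "/tə'meɪtoʊ/"})
```

Writing pls_bytes to dictionary.pls in binary mode gives a file with the same two rules. If you prefer to write the file by hand, the equivalent looks like this: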
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
</lexicon>
Add rules from the file and generate text-to-speech audio to compare results:
from elevenlabs import play, PronunciationDictionaryVersionLocator

# Upload the PLS file to create the dictionary.
with open("dictionary.pls", "rb") as f:
    pronunciation_dictionary = client.pronunciation_dictionary.add_from_file(
        file=f.read(), name="example"
    )

# Baseline: no dictionary applied.
audio_1 = client.generate(
    text="Without the dictionary: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
)

# Same request, with the dictionary version attached.
audio_2 = client.generate(
    text="With the dictionary: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary.id,
            version_id=pronunciation_dictionary.version_id,
        )
    ],
)

play(audio_1)
play(audio_2)
Modifying Pronunciation Dictionaries
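Each modification in this section returns an object carrying the dictionary id and a version_id, and the follow-up generate call passes both values in a PronunciationDictionaryVersionLocator. To avoid repeating that boilerplate, one option is a small helper that extracts the two fields. This is a sketch: locator_kwargs is a hypothetical name, and it assumes the response exposes .id and .version_id as the examples in this tutorial do. A SimpleNamespace stands in for a real SDK response so the snippet runs without network access.

```python
from types import SimpleNamespace


def locator_kwargs(result):
    """Build keyword arguments for a version locator from an SDK response.

    Hypothetical helper; assumes `result` has `.id` and `.version_id`.
    """
    return {
        "pronunciation_dictionary_id": result.id,
        "version_id": result.version_id,
    }


# Stand-in for an SDK response, used here only to show the shape.
fake_result = SimpleNamespace(id="dict_123", version_id="ver_456")
kwargs = locator_kwargs(fake_result)
```

With the SDK imported, PronunciationDictionaryVersionLocator(**locator_kwargs(result)) would then build the locator from whichever response is most recent, which keeps requests pointed at the latest version.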
Removing Rules
To remove rules, use the remove_rules_from_the_pronunciation_dictionary method:
pronunciation_dictionary_rules_removed = (
    client.pronunciation_dictionary.remove_rules_from_the_pronunciation_dictionary(
        pronunciation_dictionary_id=pronunciation_dictionary.id,
        rule_strings=["tomato", "Tomato"],
    )
)

# Generate again, pointing at the version returned by the removal call.
audio_3 = client.generate(
    text="With the rule removed: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
            version_id=pronunciation_dictionary_rules_removed.version_id,
        )
    ],
)

play(audio_3)
Adding Rules
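The rules in this tutorial list both tomato and Tomato, which suggests that matching is case-sensitive. Assuming that is so, a small helper can generate the variants to register for each word. The name case_variants is hypothetical and not part of the SDK.

```python
def case_variants(word):
    """Return the distinct lower-case, capitalized, and upper-case forms.

    Order is preserved and duplicates are dropped (e.g. for one-letter words).
    """
    seen = []
    for form in (word.lower(), word.capitalize(), word.upper()):
        if form not in seen:
            seen.append(form)
    return seen


print(case_variants("tomato"))  # ['tomato', 'Tomato', 'TOMATO']
```

Each returned string could serve as the string_to_replace of its own rule.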
Add rules directly using the PronunciationDictionaryRule_Phoneme class:
from elevenlabs import PronunciationDictionaryRule_Phoneme

pronunciation_dictionary_rules_added = (
    client.pronunciation_dictionary.add_rules_to_the_pronunciation_dictionary(
        pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
        rules=[
            PronunciationDictionaryRule_Phoneme(
                type="phoneme",
                alphabet="ipa",
                string_to_replace="tomato",
                phoneme="/tə'meɪtoʊ/",
            ),
            PronunciationDictionaryRule_Phoneme(
                type="phoneme",
                alphabet="ipa",
                string_to_replace="Tomato",
                phoneme="/tə'meɪtoʊ/",
            ),
        ],
    )
)

# Generate again, pointing at the version returned by the add call.
audio_4 = client.generate(
    text="With the rule added again: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary_rules_added.id,
            version_id=pronunciation_dictionary_rules_added.version_id,
        )
    ],
)

play(audio_4)
Conclusion
By following this guide, you can effectively manage pronunciation dictionaries to enhance text-to-speech applications. For more details, refer to the full project files.
Reference: This article is based on information from ElevenLabs. For more details, visit ElevenLabs. Author: ElevenLabs Team.