Speech-to-text and text-to-speech technologies allow computers to transcribe spoken input and respond using voice. These features are commonly used in AI assistants, automation systems, and accessibility applications. In Python, the SpeechRecognition library handles speech-to-text and pyttsx3 handles text-to-speech.
Installation
To use speech recognition and text-to-speech in Python, install the required libraries by running the following command in your terminal or command prompt:
pip install SpeechRecognition pyaudio pyttsx3
- SpeechRecognition: Converts spoken audio into text using various recognition engines.
- PyAudio: Captures real-time audio from the microphone for speech processing.
- pyttsx3: Converts text into speech offline using a built-in voice engine.
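To confirm that the installation worked, the three packages can be checked for importability before running the examples. The following is a minimal sketch, not part of either library: `missing_modules` is a hypothetical helper name, and the module names `speech_recognition`, `pyaudio` and `pyttsx3` are the import names of the packages installed above.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in names if find_spec(name) is None]


# Import names of the three packages (these differ from some pip names)
required = ["speech_recognition", "pyaudio", "pyttsx3"]

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

This only checks that the modules can be located; it does not verify that a microphone or audio driver is available.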
Speech to Text (Using Microphone)
This Python program listens through the microphone, transcribes each phrase with Google's Web Speech API (which requires an internet connection), and stops when you say “exit”. Before each phrase it calibrates for ambient noise to improve accuracy.
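import speech_recognition as sr

# Create one recognizer and reuse it for every phrase
r = sr.Recognizer()

while True:
    try:
        with sr.Microphone() as source:
            print("Listening...")
            # Sample 0.2 s of ambient sound to calibrate the energy threshold
            r.adjust_for_ambient_noise(source, duration=0.2)
            # Record until a pause in speech is detected
            audio = r.listen(source)
            # Send the audio to Google's Web Speech API (needs internet access)
            text = r.recognize_google(audio)
            text = text.lower()
            print("You said:", text)
            if "exit" in text:
                print("Exiting program...")
                break
    except sr.RequestError as e:
        # The API was unreachable or rejected the request
        print("Could not request results; {0}".format(e))
    except sr.UnknownValueError:
        # The audio could not be transcribed
        print("Could not understand audio")
    except KeyboardInterrupt:
        print("Program terminated by user")
        break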
Output
Listening...
You said: hello how are you
Listening...
Could not understand audio
Listening...
You said: exit
Exiting program...
Explanation:
- sr.Recognizer(): Creates a speech recognizer.
- with sr.Microphone() as source: Opens the microphone for input.
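- r.adjust_for_ambient_noise(source, duration=0.2): Samples 0.2 seconds of ambient sound and calibrates the recognizer's energy threshold, so background noise is less likely to be mistaken for speech.
- audio = r.listen(source): Records from the microphone until a pause signals the end of the phrase.
- text = r.recognize_google(audio): Sends the recording to Google's Web Speech API and returns the transcript. This step requires an internet connection.
- if "exit" in text: Stops the program when the transcript contains “exit”.
- Exception handling: sr.RequestError covers API or network failures, sr.UnknownValueError covers speech that could not be transcribed, and KeyboardInterrupt covers manual interruption (Ctrl+C).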
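One detail of the exit check is worth illustrating: `"exit" in text` is a substring test, so a transcript such as "the exits are blocked" would also stop the program. The sketch below shows one possible stricter check that matches whole words only. `is_exit_command` is a hypothetical helper, not part of SpeechRecognition, and it assumes the transcript is plain English text.

```python
import re


def is_exit_command(text, keyword="exit"):
    """Return True if `keyword` appears in `text` as a whole word."""
    # Split the lowercased transcript into words made of letters/apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    return keyword in words


print(is_exit_command("please exit now"))        # True
print(is_exit_command("the exits are blocked"))  # False
```

In the program above, the condition `if "exit" in text:` could be replaced with `if is_exit_command(text):` to use this stricter check.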
Text to Speech (Using pyttsx3)
This Python program reads typed text aloud and saves the speech to an audio file using the pyttsx3 library, which works offline.
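import pyttsx3

# Initialize the text-to-speech engine for the current platform
engine = pyttsx3.init()
text = input("Enter the text you want to convert to speech: ")

# Queue the text, then block until it has been spoken
engine.say(text)
engine.runAndWait()

# Queue a file export; the audio encoding depends on the platform's speech
# driver and may not be MP3 despite the .mp3 file name
engine.save_to_file(text, "output_audio.mp3")
engine.runAndWait()
print("Text has been spoken and saved as 'output_audio.mp3'")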
Output
Enter the text you want to convert to speech: Hey buddy, how are you?
Text has been spoken and saved as 'output_audio.mp3'
Explanation:
- engine = pyttsx3.init(): Initializes the text-to-speech engine.
- engine.say(text): Queues the text to be spoken aloud.
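- engine.runAndWait(): Processes the queued commands and blocks until the text has been spoken.
- engine.save_to_file(text, "output_audio.mp3"): Queues a command to save the spoken text to a file named output_audio.mp3. The audio encoding is chosen by the platform's speech driver, so the file may not contain MP3 data even though its name ends in .mp3.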
- engine.runAndWait(): Processes the save command to write the audio file.
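The engine's speaking rate and volume can also be adjusted before any text is queued. The sketch below assumes pyttsx3's `setProperty` call with the `"rate"` (words per minute) and `"volume"` (0.0 to 1.0) properties. `apply_voice_settings` is a hypothetical helper introduced here for illustration, and the default values are arbitrary.

```python
def apply_voice_settings(engine, rate=150, volume=0.8):
    """Set speaking rate and volume on a pyttsx3-style engine.

    `engine` is any object with a setProperty(name, value) method,
    such as the object returned by pyttsx3.init().
    Returns the (rate, volume) pair that was actually applied.
    """
    # Clamp volume to the 0.0-1.0 range
    volume = max(0.0, min(1.0, volume))
    engine.setProperty("rate", rate)
    engine.setProperty("volume", volume)
    return rate, volume


# Usage with a real engine (requires pyttsx3 and a working audio driver):
#   import pyttsx3
#   engine = pyttsx3.init()
#   apply_voice_settings(engine, rate=130, volume=1.0)
#   engine.say("Hello")
#   engine.runAndWait()
```

Because the helper only depends on `setProperty`, it can be tried with a stand-in object on a machine without audio hardware.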