Using Python and Google to Transcribe Audio: A Step-by-Step Guide

There are many reasons to transcribe audio, including obtaining song lyrics, creating closed captions, or keeping a text record of a meeting. Manually transcribing audio can be quite a chore, but luckily, with a little help from Python and Google Speech Recognition, we can do it quickly and easily.

Why Transcribe Audio?

It helps make audio content accessible to people who are hard of hearing.

It makes audio content searchable and indexable.

It provides written records of meetings, interviews, and lectures.

It saves time compared to manual transcription.

What is Google Speech Recognition?

Google Speech Recognition is a cloud-based service that converts spoken language into written text using advanced machine-learning models. It is frequently updated, supports a wide range of languages and dialects, can process audio in real time, and can be integrated into your own projects through its API.

Setting Up Your Project

Before we start writing our audio transcription script, make sure that you have Python installed on your machine and that you know how to run Python scripts. If you need help, we have several beginner programming projects here at GeekSided that will get you going.

We are going to look at each part of the code to explain how it works before we give you the complete code at the end.

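To use Google Speech Recognition from Python, you will need the SpeechRecognition library, along with pydub, a library for manipulating audio. Note that pydub relies on FFmpeg being installed on your system to read formats other than WAV, such as MP3. Install both libraries by running this command at the command prompt: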

pip install pydub SpeechRecognition
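
Before moving on, it can be worth confirming that the installation worked. The following is a minimal sketch of one way to check, not part of the main script: the helper name `missing_modules` is hypothetical, and the last check assumes the FFmpeg executable is named `ffmpeg` and is reachable through your PATH.

```python
import importlib.util
import shutil


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names differ from the pip package names:
# the "SpeechRecognition" package is imported as speech_recognition.
missing = missing_modules(["speech_recognition", "pydub"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("Both libraries are installed.")

# pydub hands non-WAV formats such as MP3 to FFmpeg, so look for it too.
if shutil.which("ffmpeg") is None:
    print("ffmpeg was not found on PATH; formats other than WAV may fail to load.")
```

If either library is reported as not installed, run the pip command again and check that it is using the same Python installation you run your scripts with.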

Python Script

Import Libraries

The first part of our script imports the libraries we need, which include the two we just installed. The os module is part of Python's standard library, so there is nothing extra to install; we use it to check that the audio file exists before we try to transcribe it.

import speech_recognition as sr
from pydub import AudioSegment
import os

Load and Convert Audio

Next, we will create a function that loads our audio file and exports it in WAV format, which is a format the recognizer can read.

def transcribe_audio(file_name):
    try:
        # Load the audio file
        audio = AudioSegment.from_file(file_name)
    except Exception as e:
        print(f"Error loading audio file: {e}")
        return

    try:
        # Export to WAV format; splitext swaps only the final extension,
        # so a name like "mp3_song.mp3" becomes "mp3_song.wav"
        wav_file_name = os.path.splitext(file_name)[0] + '.wav'
        audio.export(wav_file_name, format="wav")
    except Exception as e:
        print(f"Error converting to WAV: {e}")
        return
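
Building the output file name is a step that is easy to get subtly wrong: a plain string replace on the extension also changes any matching text earlier in the name. The sketch below shows one way to swap only the final extension using `os.path.splitext` from the standard library. The helper name `with_extension` is hypothetical and is not part of the script.

```python
import os


def with_extension(file_name, new_extension):
    """Return file_name with its final extension swapped for new_extension."""
    base, _old_extension = os.path.splitext(file_name)
    return f"{base}.{new_extension}"


print(with_extension("interview.mp3", "wav"))   # interview.wav
print(with_extension("mp3_notes.mp3", "txt"))   # mp3_notes.txt
```

A file name with no extension at all, such as "notes", simply gains the new one and becomes "notes.wav".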

Transcribe Audio

Now, we will add the code that analyses the audio, transcribes the words, and writes them to a text file for safe storage. This code continues inside the same function.

    try:
        # Use speech_recognition to transcribe the audio
        recognizer = sr.Recognizer()

        with sr.AudioFile(wav_file_name) as source:
            # Read the whole WAV file into memory
            audio_data = recognizer.record(source)

        # Send the audio to Google and get the recognized text back
        transcription = recognizer.recognize_google(audio_data)
        print(f"Transcription:\n{transcription}")

        # Save transcription to a text file named after the audio file
        text_file_name = os.path.splitext(file_name)[0] + '.txt'
        with open(text_file_name, 'w') as text_file:
            text_file.write(transcription)

        print(f"Transcription saved to {text_file_name}")

    except sr.UnknownValueError:
        print("Sorry, the audio was not clear enough to transcribe.")
    except sr.RequestError as e:
        print(f"Could not request results from Google Speech Recognition service; {e}")
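
If you transcribe speech in languages that use accented or non-Latin characters, the saving step is worth a second look, because the default text encoding Python uses for files depends on the operating system. The sketch below is one possible variation that sets the encoding explicitly. The helper name `save_transcript` is hypothetical, and choosing UTF-8 is an assumption about what you want rather than something the script requires.

```python
import os


def save_transcript(transcription, audio_file_name):
    """Write transcription next to the audio file and return the text file's path."""
    text_file_name = os.path.splitext(audio_file_name)[0] + ".txt"
    # An explicit encoding keeps accented and non-Latin characters intact
    # regardless of the operating system's default.
    with open(text_file_name, "w", encoding="utf-8") as text_file:
        text_file.write(transcription)
    return text_file_name
```

Inside the function, a call such as `save_transcript(transcription, file_name)` could stand in for the lines that open and write the text file.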

User Input and Execution

Finally, we add the code that will ask us to input the name of the audio file to transcribe, which, for this project, needs to be in the same directory as the script.

if __name__ == "__main__":
    file_name = input("Enter the audio file name (with extension): ")
    if os.path.exists(file_name):
        transcribe_audio(file_name)
    else:
        print(f"File {file_name} does not exist.")

    input("Press Enter to exit...")
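
If you would rather pass the file name on the command line than type it at a prompt, the standard library's argparse module is one option. This is a minimal sketch of that alternative, not the approach this guide uses; the function name `parse_arguments` is hypothetical.

```python
import argparse


def parse_arguments(argv=None):
    """Parse the audio file name from argv (defaults to the real command line)."""
    parser = argparse.ArgumentParser(description="Transcribe an audio file.")
    parser.add_argument("file_name", help="audio file to transcribe, with extension")
    return parser.parse_args(argv)
```

Calling `parse_arguments(["meeting.mp3"]).file_name` gives "meeting.mp3". In the script, you would call `parse_arguments()` with no arguments and pass the resulting `file_name` to `transcribe_audio` in place of the value returned by `input()`.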

Complete Code