So we had to transcribe a bunch of interviews we did earlier. One of the sessions was a good 45 minutes long and had a tonne of details. It’s quite difficult to transcribe this manually. But hey, we gotta do this. And then we have to do a bunch more later. The wife was in charge of this and she was trying to do it manually. Then she had a “eureka” moment and tried to use OneNote’s speech-to-text function. But there were tonnes of limitations – the laptop doesn’t hear itself and the phone seems to freeze on a long audio block.
I thought to myself – I study so much about ML and NLP in all my free time and here I am not even wondering how best to solve this problem. So, like the good dev that I am, I googled for solutions. Bingo! Office 365 does it for us. And it’s apparently quite good and comes with our subscription. So I pointed her to the right menu items and said – “Here, this should save you time and effort.”
If only life were that simple. Office 365 decides that, after 20 minutes of transcribing, it is done. And who is to contest this? So she’s back to me – dude, this thing threw up after 20 minutes. What else can we do? Simple – split the file into chunks of a few minutes each and that should work. Audacity is your friend. So she goes with that. Only, the file we have is an m4a file and Audacity needs another half a dozen libraries to get started. Well, frankly, at this point neither of us has the patience. It seems like transcribing it manually would be faster. So I take the file from her and start to download Audacity on my Mac. Well, *nixes sure do behave differently. Besides, “FFmpeg” is the master of all decoders, right?
But then maybe there is a better way? So I searched Google for audio libraries for Python. And there really is one – pydub. And they have a brilliant set of examples to do just my job. Cool, innit? Here is the final code that got me running in a few minutes:
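What follows is a minimal sketch of the chunking idea with pydub, not a verbatim copy of what I ran. The helper names `chunk_ranges` and `split_audio`, the `chunks` output folder, the 10-minute chunk length and the WAV output format are all assumptions picked for illustration; the only hard requirement from the story above is that each chunk stays under the 20-minute mark.

```python
from pathlib import Path


def chunk_ranges(total_ms, chunk_ms):
    """Return (start, end) millisecond pairs covering total_ms in chunk_ms steps."""
    return [(start, min(start + chunk_ms, total_ms))
            for start in range(0, total_ms, chunk_ms)]


def split_audio(src, out_dir="chunks", chunk_minutes=10):
    """Split src into chunk_minutes-long WAV files inside out_dir (hypothetical helper)."""
    # Imported here so chunk_ranges stays usable even without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(src)  # decoding m4a relies on ffmpeg
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    chunk_ms = chunk_minutes * 60 * 1000  # pydub measures length in milliseconds
    for i, (start, end) in enumerate(chunk_ranges(len(audio), chunk_ms)):
        target = out / f"{Path(src).stem}_{i:02d}.wav"
        # Slicing an AudioSegment by milliseconds gives a new, shorter segment.
        audio[start:end].export(str(target), format="wav")
        written.append(target)
    return written
```

Calling `split_audio("interview.m4a")` (the file name is a placeholder) should leave numbered files in `chunks/`, each short enough to feed to the transcriber one at a time.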
And voila, it was done. Of course, I needed to have ffmpeg installed on my machine – which, btw, is highly recommended!