Audio file to text functionality

danielmiller:
I use the voice memo feature of my camera but find it tedious to transcribe the memos manually. Given the recent advances in AI and open source projects like "Whisper," I'd like to request an option for Photo Mechanic to transcribe available audio memo files to text in the Caption box of the Metadata Info window.

Kevin M. Cox:
This is a pretty interesting idea.

Kirk Baker:
Daniel,


--- Quote from: danielmiller on February 06, 2024, 11:02:30 AM ---I use the voice memo feature of my camera but find it tedious to transcribe the memos manually. Given the recent advances in AI and open source projects like "Whisper," I'd like to request an option for Photo Mechanic to transcribe available audio memo files to text in the Caption box of the Metadata Info window.

--- End quote ---

I suggest taking some of your WAV files and running them through WhisperUI here: https://whisperui.com/

It's free.  Does it do a great job?

I tried it out a year ago on some spoken English and it did a good job.  We also tried it on some Chinese languages and the results were poor.

The feature could appeal to some users, but language support may be spotty.

-Kirk

danielmiller:
Hi Kirk,

Yeah, it actually did a pretty decent job, even on voices speaking English with a pretty strong accent.

It would be so useful to have the text appear in the Photo Mechanic caption box at the click of a button.
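For illustration, here is a rough sketch of what such a button might do behind the scenes. It assumes the memo is a WAV file that sits next to the photo and shares its base name (e.g. DSC_0001.JPG and DSC_0001.WAV), and that the open-source `openai-whisper` Python package is installed. The names `caption_from_memo` and `whisper_transcribe` are made up for this example, and none of it reflects how Photo Mechanic works internally.

```python
from pathlib import Path
from typing import Callable, Optional


def whisper_transcribe(wav_path: Path) -> str:
    """Transcribe one file with the openai-whisper package (assumed installed)."""
    import whisper  # pip install openai-whisper; also needs ffmpeg on the PATH

    model = whisper.load_model("base")  # small general-purpose model
    return model.transcribe(str(wav_path))["text"]


def caption_from_memo(
    photo: Path,
    transcribe: Callable[[Path], str] = whisper_transcribe,
) -> Optional[str]:
    """Return caption text for a photo, or None if no memo sits next to it."""
    for ext in (".WAV", ".wav"):
        memo = photo.with_suffix(ext)  # assumption: memo shares the photo's base name
        if memo.exists():
            # Collapse stray whitespace so the text drops cleanly into a caption field.
            return " ".join(transcribe(memo).split())
    return None
```

Calling `caption_from_memo(Path("DSC_0001.JPG"))` would return the transcribed text, which could then be pasted into the Caption field. The `transcribe` parameter is there so a different speech-to-text engine could be swapped in for languages where Whisper does poorly.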

-Dan

Kevin M. Cox:
Especially if you could speak in a pattern that would expand with code replacement...  8)

"h15 fields a ground ball hit by v44 in the eighth inning..."
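As a minimal sketch of how a transcribed sentence like that could be expanded: the lookup table, player names, and the `expand_codes` function below are all hypothetical, and this matches bare codes as spoken in the example rather than mirroring Photo Mechanic's own code replacement logic.

```python
import re

# Hypothetical replacement table; a real one would come from the user's own
# code replacement file.
ROSTER = {
    "h15": "Home shortstop Alex Example (15)",
    "v44": "Visitors outfielder Sam Sample (44)",
}

# One letter followed by a one- or two-digit jersey number, as a whole word.
CODE = re.compile(r"\b[a-z]\d{1,2}\b", re.IGNORECASE)


def expand_codes(text, table=ROSTER):
    """Replace known codes with their expansions; leave unknown codes untouched."""
    def swap(match):
        code = match.group(0)
        return table.get(code.lower(), code)

    return CODE.sub(swap, text)


print(expand_codes("h15 fields a ground ball hit by v44 in the eighth inning"))
```

Matching is case-insensitive because a speech-to-text engine may well return "H15" rather than "h15". It could also return something like "h 15" or "age fifteen", so a real feature would likely need a more forgiving matcher than this.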
