Topic: Audio file to text functionality

danielmiller

  • Member
  • Posts: 57
Audio file to text functionality
« on: February 06, 2024, 11:02:30 AM »
I use the voice memo feature on my camera but find it tedious to transcribe the memos manually. Given recent advances in AI and open-source projects like "Whisper," I'd like to request an option for Photo Mechanic to transcribe available audio memo files into the Caption box in the Metadata Info window.
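Until something like this is built in, a rough Python sketch of the workflow might look like the following. It assumes the open-source `openai-whisper` package (`pip install openai-whisper`, which also needs ffmpeg installed). It also assumes the camera saves each memo as a WAV file with the same base name as the image (for example `DSC_0001.JPG` and `DSC_0001.WAV`), though naming varies by camera. The helper names are hypothetical, and the script only prints the text. Writing it into the Caption field is the part Photo Mechanic itself would need to handle.

```python
from functools import lru_cache
from pathlib import Path
from typing import Callable, Optional


def find_memo(image_path: Path) -> Optional[Path]:
    """Return the voice memo saved next to an image, or None.

    Assumption: the memo shares the image's base name with a .WAV/.wav
    extension (e.g. DSC_0001.JPG -> DSC_0001.WAV). Check your camera.
    """
    for suffix in (".WAV", ".wav"):
        candidate = image_path.with_suffix(suffix)
        if candidate.is_file():
            return candidate
    return None


@lru_cache(maxsize=1)
def _load_model(name: str = "base"):
    # Imported lazily so the file-matching code works without Whisper installed;
    # the cache means the model is loaded once per run, not once per file.
    import whisper  # pip install openai-whisper (requires ffmpeg)
    return whisper.load_model(name)


def whisper_transcribe(audio_path: Path) -> str:
    """Transcribe one audio file with openai-whisper and return plain text."""
    result = _load_model().transcribe(str(audio_path))
    return result["text"].strip()


def caption_from_memo(
    image_path: Path,
    transcribe: Callable[[Path], str] = whisper_transcribe,
) -> Optional[str]:
    """Return caption text for an image's voice memo, or None if it has no memo."""
    memo = find_memo(image_path)
    if memo is None:
        return None
    return transcribe(memo)


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        text = caption_from_memo(Path(name))
        print(f"{name}: {text if text is not None else '(no memo found)'}")
```

Run it as `python memo_captions.py DSC_0001.JPG DSC_0002.JPG` (the script name is arbitrary). Passing `transcribe` in as a parameter keeps the Whisper dependency separate from the file-matching logic, so a different speech-to-text engine could be swapped in.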

Kevin M. Cox

  • Hero Member
  • Posts: 544
  • PM 2024.10 (8173) | macOS 15.1
Re: Audio file to text functionality
« Reply #1 on: February 06, 2024, 07:05:09 PM »
This is a pretty interesting idea.
Kevin M. Cox | Photojournalist
https://www.instagram.com/kevin.m.cox/

Kirk Baker

  • Senior Software Engineer
  • Camera Bits Staff
  • Superhero Member
  • Posts: 25019
  • Camera Bits, Inc.
Re: Audio file to text functionality
« Reply #2 on: February 06, 2024, 07:43:27 PM »
Daniel,

I suggest taking some of your WAV files and running them through WhisperUI here: https://whisperui.com/

It's free. Does it do a great job?

I tried it a year ago on some spoken English, and it did a good job. We also tried it on some Chinese languages, and the results were poor.

The feature could appeal to some users, but language support may be spotty.

-Kirk

danielmiller
Re: Audio file to text functionality
« Reply #3 on: February 07, 2024, 11:17:12 AM »
Hi Kirk,

Yeah, it actually did a pretty decent job, even on speakers with a strong accent.

It would be so useful to have the text appear in the Photo Mechanic Caption box at the click of a button.

-Dan

Kevin M. Cox
Re: Audio file to text functionality
« Reply #4 on: February 07, 2024, 07:46:49 PM »
Especially if you could speak in a pattern that would then expand via code replacement... 8)

"h15 fields a ground ball hit by v44 in the eighth inning..."
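Photo Mechanic's code replacement normally expects codes typed between delimiters (e.g. `\h15\`), but a transcript would contain bare words like "h15", possibly capitalized or split ("H 15"). Here is a rough Python sketch of that extra matching step. It assumes a hypothetical `CODES` table (the names are placeholders, not a real roster) and codes made of `h` or `v` followed by a number.

```python
import re

# Hypothetical code table; in practice this would come from your own
# code replacement file. The names are placeholders.
CODES = {
    "h15": "Home Player 15",
    "v44": "Visiting Player 44",
}

# "h" or "v" followed by 1-3 digits, case-insensitive, with an optional
# space, since speech-to-text output may write "H15" or "H 15".
CODE_PATTERN = re.compile(r"\b([hv])\s?(\d{1,3})\b", re.IGNORECASE)


def expand_codes(transcript: str, codes: dict[str, str]) -> str:
    """Replace spoken codes in a transcript with their expansions."""
    def replace(match: re.Match) -> str:
        key = match.group(1).lower() + match.group(2)
        # Codes missing from the table are left as-is rather than dropped.
        return codes.get(key, match.group(0))

    return CODE_PATTERN.sub(replace, transcript)


print(expand_codes("H15 fields a ground ball hit by v44 in the eighth inning", CODES))
# → Home Player 15 fields a ground ball hit by Visiting Player 44 in the eighth inning
```

Leaving unknown codes untouched means a code that isn't in the table stays visible in the caption for correction instead of silently disappearing.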