I would love to see the following feature in an upcoming version of PM. I believe speech-to-text technology (AI models, some even available as open source) has become sufficiently mature for such a feature to be implemented efficiently and effectively.

1. The user defines the language model to be used for translating spoken words into written text (Preferences).
2. The user enters the variable {wav} (or any other name) in any of the metadata fields.
3. If a .WAV file exists for a given photo, the variable {wav} is replaced by the text converted from the .WAV file (using a placeholder for unrecognisable words) when the codes/variables are evaluated.

Use case

1. I select "English" as the language model to use to translate speech to text in the "Preferences" ahead of the game.
2. I take a picture during the basketball game between France and Germany and voice tag it with a note "Peter Pan of team France scores".
3. When ingesting the photos I set the caption in the "Metadata (IPTC) Template" to "{wav} during the basketball game between France and Germany on May 2, 2025 in Paris, France".
4. When the codes/variables are evaluated, {wav} is replaced by the speech-to-text converted string "Peter Pan of team France scores".
5. The final caption now reads, without my having to type any text, "Peter Pan of team France scores during the basketball game between France and Germany on May 2, 2025 in Paris, France".
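To make the request concrete, here is a rough sketch in Python of how the {wav} substitution could behave. It is only an illustration, not how PM evaluates variables internally: the function name expand_wav_variable, the transcribe callback, and the assumption that the voice tag sits next to the photo with the same base name and a .WAV extension are all hypothetical. The speech-to-text engine is deliberately left as a pluggable callback so that no particular model or library is implied.

```python
from pathlib import Path
from typing import Callable, Optional


def find_voice_tag(photo_path: Path) -> Optional[Path]:
    """Return the sidecar audio file for a photo, or None if there is none.

    Assumption: the voice tag shares the photo's base name and differs
    only in its extension (.WAV or .wav).
    """
    for ext in (".WAV", ".wav"):
        candidate = photo_path.with_suffix(ext)
        if candidate.exists():
            return candidate
    return None


def expand_wav_variable(
    template: str,
    photo_path: Path,
    transcribe: Callable[[Path], str],
) -> str:
    """Replace {wav} in a metadata template with the transcribed voice tag.

    `transcribe` is a hypothetical callback wrapping whatever
    speech-to-text model the user selected in the preferences.
    If the photo has no voice tag, {wav} expands to an empty string.
    """
    if "{wav}" not in template:
        return template
    wav_path = find_voice_tag(photo_path)
    text = transcribe(wav_path).strip() if wav_path else ""
    # Strip so a missing voice tag does not leave a leading space.
    return template.replace("{wav}", text).strip()
```

With a voice tag that transcribes to "Peter Pan of team France scores", the template from the use case would expand to the full caption; for a photo without a .WAV file, the caption would simply start at "during the basketball game ...".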
This would be awesome, but as Kirk said, in some of these environments I can barely hear/understand my own audio notes!