
PhotoMechanic altering metadata/MD5 hash?


big0mike:
My main (second) job is covering marathons & Ironman races. I've just started using PhotoMechanic to ingest the 50k to 100k images my team shoots each weekend. I got this email from our tech guys in Germany:

"It turns out that there is actually a problem with PhotoMechanic: It alters photo on a binary level - they look the same, they have the same file size, but it turns out it does funny stuff with the embedded metadata. This means that PhotoMechanic defeats our duplication detection mechanism, that relies on the MD5 hash of the file. This means that it's possible to upload the same photo twice, which happened last weekend. This has some potential to cause real havoc."

What I do in PM is add some basic info to the IPTC fields, create a directory based on data on the stationery pad, rename the files as they copy to include the photographer's name and the CF/SD card the image came from.

I know it's not much to go on and I'm hoping for a better explanation from the tech guys but I'm wondering if you have any thoughts on this? PhotoMechanic makes the ingest process so much easier than dueling Explorer windows and creating directories by hand...

Thanks,

Mike

Kirk Baker:
Mike,


--- Quote from: big0mike on September 20, 2018, 08:03:55 AM ---My main (second) job is covering marathons & Ironman races. I've just started using PhotoMechanic to ingest the 50k to 100k images my team shoots each weekend. I got this email from our tech guys in Germany:

"It turns out that there is actually a problem with PhotoMechanic: It alters photo on a binary level - they look the same, they have the same file size, but it turns out it does funny stuff with the embedded metadata. This means that PhotoMechanic defeats our duplication detection mechanism, that relies on the MD5 hash of the file. This means that it's possible to upload the same photo twice, which happened last weekend. This has some potential to cause real havoc."

What I do in PM is add some basic info to the IPTC fields, create a directory based on data on the stationery pad, rename the files as they copy to include the photographer's name and the CF/SD card the image came from.

--- End quote ---

Then you're modifying the files.  The metadata (IPTC) is stored in the images unless the image is a RAW file and you're using XMP sidecar files.  PM cannot add metadata to the files without modifying them.

Renaming shouldn't modify the file data (MD5 hash will be the same), but that's about all you can do to any file (change filesystem data like modification time, creation time, filename, permissions) without changing the file data itself.
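To make the rename-versus-edit distinction concrete, here is a minimal Python sketch. It is only an illustration: the file names are made up, the bytes are a stand-in for a real JPEG, and appending bytes is just a placeholder for "some program wrote metadata into the file", not how Photo Mechanic actually writes IPTC.

```python
import hashlib
import os
import tempfile


def md5_of_file(path):
    """Return the MD5 hex digest of the file's contents (not its name)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def demo_rename_vs_edit():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name and stand-in bytes, not a real image.
        original = os.path.join(tmp, "IMG_0001.jpg")
        with open(original, "wb") as f:
            f.write(b"\xff\xd8 stand-in image bytes \xff\xd9")
        before = md5_of_file(original)

        # Renaming only touches filesystem data, so the hash is unchanged.
        renamed = os.path.join(tmp, "smith_card01_IMG_0001.jpg")
        os.rename(original, renamed)
        after_rename = md5_of_file(renamed)

        # Writing anything into the file changes its bytes, so the hash changes.
        with open(renamed, "ab") as f:
            f.write(b"caption")
        after_edit = md5_of_file(renamed)
    return before, after_rename, after_edit


before, after_rename, after_edit = demo_rename_vs_edit()
print("same after rename:", before == after_rename)
print("same after edit:  ", before == after_edit)
```

The first comparison should come out equal and the second should not, which matches the point above: the name is not part of the hashed data, the contents are.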

-Kirk

big0mike:
Yeah, I realize I'm modifying them by adding IPTC fields but I don't know what the MD5 hash is or what that has to do with it...

So, what you are saying is that if I were to NOT add the IPTC fields and just use PM to create directories and copy files this MD5 hash should remain unchanged and whatever problem they are encountering would be solved?

I'll see if I can set up a test event to check.

Mike

Kirk Baker:
Mike,


--- Quote from: big0mike on September 20, 2018, 10:01:42 AM ---Yeah, I realize I'm modifying them by adding IPTC fields but I don't know what the MD5 hash is or what that has to do with it...

So, what you are saying is that if I were to NOT add the IPTC fields and just use PM to create directories and copy files this MD5 hash should remain unchanged and whatever problem they are encountering would be solved?
--- End quote ---

An MD5 hash is generated by a program that reads each byte of the file and performs the MD5 operation on them, accumulating a checksum.  The program outputs this data and your tech guys use it to detect changes to the file.  Changing one byte inside the file to a different value will cause the MD5 hash to change.
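As a rough sketch of what such a program does, the Python below reads a stream in chunks and feeds each chunk into an MD5 accumulator from the standard `hashlib` module. The function name, the chunk size, and the sample bytes are all just assumptions for illustration; your tech guys' tool may work differently.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=65536):
    """Read the stream chunk by chunk, accumulating the MD5 digest."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Two inputs that differ in a single byte (made-up sample data).
original = b"example file contents"
altered = b"example file contentz"

h1 = md5_of_stream(io.BytesIO(original))
h2 = md5_of_stream(io.BytesIO(altered))

print(h1)
print(h2)
print("hashes match:", h1 == h2)
```

Reading in chunks gives the same digest as hashing the whole file at once; it just avoids loading a large image into memory in one go.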

Basically you need to make your changes with PM, then generate the MD5 hash, and then never modify the file again with PM (or any other program) and then the MD5 should no longer change.  If your system is generating the MD5 hash before PM makes changes (or any other metadata-savvy app does) then the hash will change.

I'd really need to have a diagram of your system in order to help you solve this problem.

-Kirk

big0mike:

--- Quote from: Kirk Baker on September 20, 2018, 10:15:22 AM ---Basically you need to make your changes with PM, then generate the MD5 hash, and then never modify the file again with PM (or any other program) and then the MD5 should no longer change.  If your system is generating the MD5 hash before PM makes changes (or any other metadata-savvy app does) then the hash will change.

I'd really need to have a diagram of your system in order to help you solve this problem.
--- End quote ---
I THINK I'm making all the changes with PM. I use PM to ingest, create directories, rename files, and once they are all ingested they do not get touched. After all the files are uploaded I organize the directories, move them to the "import" directory on the laptop, and import them into their proprietary processing tool where they are scaled to a common size, compressed, and uploaded to their file servers.

I'm just guessing, but I think the duplicates may have come from me reingesting a card with a different set of IPTC data. That would mean a different directory, a different filename, and different IPTC data, which, I assume, would produce a different MD5 hash than the same image already ingested.
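Here is a small Python sketch of why that guess would explain the duplicates, assuming the detection works roughly as the email describes (skip any file whose whole-file MD5 has been seen before). The function name, the paths, and the fake "IPTC" prefix are all hypothetical; real IPTC data is embedded in the file differently.

```python
import hashlib


def find_new_uploads(files, seen_hashes):
    """Accept only files whose whole-file MD5 has not been seen yet."""
    accepted = []
    for name, data in files:
        file_hash = hashlib.md5(data).hexdigest()
        if file_hash in seen_hashes:
            continue  # byte-identical to an earlier upload, so skip it
        seen_hashes.add(file_hash)
        accepted.append(name)
    return accepted


# Stand-in bytes: the same "image" ingested twice with different metadata.
pixels = b"same image data"
first_ingest = b"IPTC:event A|" + pixels
second_ingest = b"IPTC:event B|" + pixels

seen = set()
accepted = find_new_uploads(
    [
        ("a/img1.jpg", first_ingest),
        ("b/img1_renamed.jpg", second_ingest),  # same photo, new metadata
        ("c/copy.jpg", first_ingest),           # byte-for-byte copy
    ],
    seen,
)
print(accepted)
```

The byte-for-byte copy is caught, but the reingested photo with different metadata gets through, because the check compares file bytes rather than the picture itself.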

I just explained to them what was happening, and if they have time to waste humoring this one small cog in the system I'll set up a test event and do it without IPTC data being added to the images and, just to be safe, without renaming the files.

Mike
