Topic: PhotoMechanic altering metadata/MD5 hash? (Read 9018 times)

big0mike

  • Newcomer
  • Posts: 25
Posted on: September 20, 2018, 08:03:55 AM
My main (second) job is covering marathons & Ironman races. I've just started using PhotoMechanic to ingest the 50k to 100k images my team shoots each weekend. I got this email from our tech guys in Germany:

"It turns out that there is actually a problem with PhotoMechanic: It alters photos at the binary level - they look the same, they have the same file size, but it turns out it does funny stuff with the embedded metadata. This means that PhotoMechanic defeats our duplicate-detection mechanism, which relies on the MD5 hash of the file. This means that it's possible to upload the same photo twice, which happened last weekend. This has some potential to cause real havoc."

What I do in PM is add some basic info to the IPTC fields, create a directory based on data on the stationery pad, rename the files as they copy to include the photographer's name and the CF/SD card the image came from.

I know it's not much to go on and I'm hoping for a better explanation from the tech guys but I'm wondering if you have any thoughts on this? PhotoMechanic makes the ingest process so much easier than dueling Explorer windows and creating directories by hand...

Thanks,

Mike

Kirk Baker

  • Senior Software Engineer
  • Camera Bits Staff
  • Superhero Member
  • Posts: 25020
  • Camera Bits, Inc.
Reply #1 on: September 20, 2018, 09:32:21 AM
Mike,

> What I do in PM is add some basic info to the IPTC fields, create a directory based on data on the stationery pad, rename the files as they copy to include the photographer's name and the CF/SD card the image came from.

Then you're modifying the files.  The metadata (IPTC) is stored in the images unless the image is a RAW file and you're using XMP sidecar files.  PM cannot add metadata to the files without modifying them.

Renaming shouldn't modify the file data (MD5 hash will be the same), but that's about all you can do to any file (change filesystem data like modification time, creation time, filename, permissions) without changing the file data itself.

-Kirk
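Kirk's distinction between filesystem data and file data is easy to check on your own machine. Below is a minimal Python sketch (standard library only). The filenames and file contents are hypothetical stand-ins, and overwriting a single byte only simulates a metadata write; a real IPTC write usually inserts or rewrites a whole metadata block inside the file.

```python
import hashlib
import os
import tempfile

def md5_of(path):
    """Return the hex MD5 digest of everything stored in the file."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def rename_vs_edit():
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "IMG_0001.JPG")  # hypothetical card filename
        with open(original, "wb") as f:
            f.write(b"\xff\xd8 stand-in image bytes \xff\xd9")
        before = md5_of(original)

        # Filesystem-only changes: a new name and new timestamps. The bytes inside are untouched.
        renamed = os.path.join(d, "Smith_CardA_IMG_0001.JPG")  # hypothetical renamed file
        os.rename(original, renamed)
        os.utime(renamed, (0, 0))  # set access/modification time to the epoch
        after_rename = md5_of(renamed)

        # A content change: overwrite one byte, standing in for a metadata write.
        with open(renamed, "r+b") as f:
            f.seek(2)
            f.write(b"X")
        after_edit = md5_of(renamed)
    return before, after_rename, after_edit

before, after_rename, after_edit = rename_vs_edit()
print(before == after_rename)  # True
print(before == after_edit)    # False
```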

big0mike

  • Newcomer
  • Posts: 25
Reply #2 on: September 20, 2018, 10:01:42 AM
Yeah, I realize I'm modifying them by adding IPTC fields, but I don't know what the MD5 hash is or what it has to do with this...

So, what you are saying is that if I were to NOT add the IPTC fields and just use PM to create directories and copy files this MD5 hash should remain unchanged and whatever problem they are encountering would be solved?

I'll see if I can set up a test event to check.

Mike

Kirk Baker

  • Senior Software Engineer
  • Camera Bits Staff
  • Superhero Member
  • Posts: 25020
  • Camera Bits, Inc.
Reply #3 on: September 20, 2018, 10:15:22 AM
Mike,

> Yeah, I realize I'm modifying them by adding IPTC fields, but I don't know what the MD5 hash is or what it has to do with this...

> So, what you are saying is that if I were to NOT add the IPTC fields and just use PM to create directories and copy files this MD5 hash should remain unchanged and whatever problem they are encountering would be solved?

An MD5 hash is generated by a program that reads every byte of the file and runs the MD5 algorithm over them, accumulating a checksum.  The program outputs this checksum (the hash), and your tech guys use it to detect changes to the file.  Changing even one byte inside the file to a different value will cause the MD5 hash to change.

Basically you need to make your changes with PM, then generate the MD5 hash, and then never modify the file again with PM (or any other program) and then the MD5 should no longer change.  If your system is generating the MD5 hash before PM makes changes (or any other metadata-savvy app does) then the hash will change.

I'd really need to have a diagram of your system in order to help you solve this problem.

-Kirk
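Kirk makes two points here: the hash depends on every byte, and it has to be taken after the last edit. Both can be sketched in Python with the standard `hashlib` module. `md5_file` streams the file in chunks (the 1 MiB default chunk size is an arbitrary choice). The byte strings in the second half are schematic stand-ins for a photo before and after a metadata write, not real JPEG structure.

```python
import hashlib
import os
import tempfile

def md5_file(path, chunk_size=1 << 20):
    """Feed the file through MD5 chunk by chunk; the digest covers every byte, in order."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"abc")
    abc_digest = md5_file(path, chunk_size=1)  # same result as hashing it all at once
    with open(path, "wb") as f:
        f.write(b"abd")  # one byte changed
    abd_digest = md5_file(path)

print(abc_digest)                # 900150983cd24fb0d6963f7d28e17f72
print(abc_digest == abd_digest)  # False

# Order of operations, using schematic stand-ins for a photo's bytes.
card_copy = b"image-data"
with_iptc = b"[IPTC: event, photographer]" + b"image-data"

hash_at_copy = hashlib.md5(card_copy).hexdigest()      # taken before the metadata write
hash_after_edits = hashlib.md5(with_iptc).hexdigest()  # taken after the last edit

print(hash_at_copy == hashlib.md5(with_iptc).hexdigest())      # False: stored hash is stale
print(hash_after_edits == hashlib.md5(with_iptc).hexdigest())  # True while nothing rewrites the file
```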

big0mike

  • Newcomer
  • Posts: 25
Reply #4 on: September 20, 2018, 10:32:02 AM
> Basically you need to make your changes with PM, then generate the MD5 hash, and then never modify the file again with PM (or any other program) and then the MD5 should no longer change.  If your system is generating the MD5 hash before PM makes changes (or any other metadata-savvy app does) then the hash will change.

> I'd really need to have a diagram of your system in order to help you solve this problem.

I THINK I'm making all the changes with PM. I use PM to ingest, create directories, rename files, and once they are all ingested they do not get touched. After all the files are uploaded I organize the directories, move them to the "import" directory on the laptop, and import them into their proprietary processing tool where they are scaled to a common size, compressed, and uploaded to their file servers.

I'm just guessing, but I think the duplicates may have come from me re-ingesting a card with a different set of IPTC data. That would mean a different directory, filename, and IPTC data, which, I assume, would produce a different MD5 hash than the same image already ingested.

I just explained to them what was happening, and if they have time to waste humoring this one small cog in the system, I'll set up a test event and run it without adding IPTC data to the images and, just to be safe, without renaming the files.

Mike
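Mike's guess is consistent with how a whole-file MD5 check behaves. The Python sketch below is a hypothetical stand-in for the kind of check the tech guys describe; their actual system is not known here. It rejects any upload whose MD5 it has already seen. The byte strings are schematic, with a bracketed tag standing in for the embedded IPTC block.

```python
import hashlib

class Md5DuplicateIndex:
    """Hypothetical duplicate check keyed on the MD5 of the whole file."""

    def __init__(self):
        self.seen = {}  # hex digest -> name of the first file uploaded with it

    def upload(self, name: str, data: bytes) -> bool:
        """Return True if the file is accepted, False if rejected as a duplicate."""
        digest = hashlib.md5(data).hexdigest()
        if digest in self.seen:
            return False
        self.seen[digest] = name
        return True

pixels = b"same-image-data"
index = Md5DuplicateIndex()
first = index.upload("Smith_CardA_0001.jpg", b"[IPTC: event=Berlin]" + pixels)
exact_repeat = index.upload("Smith_CardA_0001.jpg", b"[IPTC: event=Berlin]" + pixels)
retagged = index.upload("Smith_CardA_Berlin2_0001.jpg", b"[IPTC: event=Berlin 2]" + pixels)
print(first, exact_repeat, retagged)  # True False True
```

The byte-identical repeat is caught, but the re-ingested copy with different IPTC data is accepted even though the picture is the same.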

big0mike

  • Newcomer
  • Posts: 25
Reply #5 on: September 21, 2018, 02:09:10 PM
I'm trying to make it easy on the tech guys by ingesting a card of images several times so they can check the MD5 hash.
  • 1 is a straight copy from the card without PM
  • 2 is a copy with PM creating the directory but NOT adding IPTC data into the photo
  • 3 is a copy with PM creating the directory and renaming the file but NOT adding IPTC data
  • 4 is a copy with PM creating the directory, renaming file, adding IPTC data

My problem, as you can probably guess, seems to be that if I do NOT "Apply IPTC Stationery Pad to Photos" the fields that I have setup in the stationery pad are not used to create the directory name.

Is there a way for the ingest dialog to use that data for directory names without inserting the same data into the photo?

Mike
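For the four test copies, something like the following Python sketch could report which copies are still byte-identical to the card. It matches files by MD5 rather than by name, because copies 3 and 4 are renamed. The folder names in the demo are hypothetical. If Kirk's explanation holds, copies 1 to 3 should show no changed files and copy 4 should show every file as changed.

```python
import hashlib
import tempfile
from pathlib import Path

def md5_file(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def compare_to_card(card_dir, copy_dir):
    """Count files in copy_dir whose bytes match some file on the card (by hash, not name)."""
    card_hashes = {md5_file(p) for p in Path(card_dir).rglob("*") if p.is_file()}
    copies = [p for p in Path(copy_dir).rglob("*") if p.is_file()]
    unchanged = sum(1 for p in copies if md5_file(p) in card_hashes)
    return {"files": len(copies), "unchanged": unchanged, "changed": len(copies) - unchanged}

# Self-contained demo with throwaway folders; for the real test, pass the card
# folder and each copy's folder instead.
with tempfile.TemporaryDirectory() as t:
    card, copy3, copy4 = (Path(t, n) for n in ("card", "copy3", "copy4"))
    for folder in (card, copy3, copy4):
        folder.mkdir()
    (card / "IMG_0001.JPG").write_bytes(b"stand-in photo")
    (copy3 / "Smith_A_IMG_0001.JPG").write_bytes(b"stand-in photo")        # renamed only
    (copy4 / "Smith_A_IMG_0001.JPG").write_bytes(b"[IPTC]stand-in photo")  # metadata added
    result3 = compare_to_card(card, copy3)
    result4 = compare_to_card(card, copy4)

print(result3)  # {'files': 1, 'unchanged': 1, 'changed': 0}
print(result4)  # {'files': 1, 'unchanged': 0, 'changed': 1}
```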

Odd Skjaeveland

  • Full Member
  • Posts: 188
Reply #6 on: September 21, 2018, 11:23:26 PM
> I'm trying to make it easy on the tech guys by ingesting a card of images several times so they can check the MD5 hash.

At what point/step is the hash created in your end?

 
--
Odd S.

big0mike

  • Newcomer
  • Posts: 25
Reply #7 on: September 22, 2018, 06:47:24 AM
I have no idea. My assumption after speaking with Kirk is the hash is made when the file is written to my hard drive.

Odd Skjaeveland

  • Full Member
  • Posts: 188
Reply #8 on: September 23, 2018, 02:41:13 AM
> My assumption after speaking with Kirk is the hash is made when the file is written to my hard drive.

That would mean an application on your computer, right?

DNG files are known to have an embedded MD5 hash (a DNG may actually have more than one hash for different purposes).

I am not aware of PM embedding MD5, but I think Adobe's DNG converter does so.

A camera that saves DNG files to the memory card likely calculates and embeds an MD5 hash. But that happens outside your computer.

I used to believe the MD5 was calculated from the DNG "pixel data" not including the IPTC. I obviously need to read the DNG specification again  :)
--
Odd S.
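Odd's idea of a hash that covers the image data but not the IPTC can be sketched for JPEG files. This Python sketch is an illustration only; nothing in the thread says PhotoMechanic or the DAM works this way. It walks the JPEG marker segments, skips APP0 to APP15 and COM segments (where Exif, XMP, IPTC and comments are stored; IPTC sits in APP13), and hashes everything else, including the compressed scan data. It assumes a well-formed JPEG whose metadata segments all come before the first SOS marker. Any tool that re-encodes the pixels will still change this hash. The demo payloads are fake bytes, not real IPTC records.

```python
import hashlib

METADATA_MARKERS = set(range(0xE0, 0xF0)) | {0xFE}  # APP0-APP15 and COM

def jpeg_image_data_md5(data: bytes) -> str:
    """MD5 over a JPEG's bytes with APPn and COM segments skipped."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    h = hashlib.md5(b"\xff\xd8")
    i = 2
    while i < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        while i < len(data) and data[i] == 0xFF:  # skip the 0xFF and any fill bytes
            i += 1
        if i >= len(data):
            raise ValueError("file ends inside a marker")
        marker = data[i]
        i += 1
        if marker == 0xD9:  # EOI reached without a scan
            h.update(b"\xff\xd9")
            break
        if 0xD0 <= marker <= 0xD7 or marker == 0x01:  # standalone markers have no length
            h.update(bytes([0xFF, marker]))
            continue
        length = int.from_bytes(data[i:i + 2], "big")  # counts its own 2 bytes
        end = i + length
        if length < 2 or end > len(data):
            raise ValueError(f"bad segment length at offset {i}")
        if marker == 0xDA:  # SOS: hash the scan header and everything after it
            h.update(bytes([0xFF, marker]) + data[i:])
            break
        if marker not in METADATA_MARKERS:
            h.update(bytes([0xFF, marker]) + data[i:end])
        i = end
    return h.hexdigest()

def segment(marker: int, payload: bytes) -> bytes:
    """Build a length-prefixed segment for the demo below."""
    return bytes([0xFF, marker]) + (len(payload) + 2).to_bytes(2, "big") + payload

dqt = segment(0xDB, bytes(65))  # stand-in quantization table
scan = segment(0xDA, bytes([1, 1, 0, 0, 0x3F, 0])) + b"\x12\x34\x56" + b"\xff\xd9"
jpeg_a = b"\xff\xd8" + segment(0xED, b"Photoshop 3.0\x00event=Berlin") + dqt + scan
jpeg_b = b"\xff\xd8" + segment(0xED, b"Photoshop 3.0\x00event=Berlin 2") + dqt + scan

print(hashlib.md5(jpeg_a).digest() == hashlib.md5(jpeg_b).digest())  # False
print(jpeg_image_data_md5(jpeg_a) == jpeg_image_data_md5(jpeg_b))    # True
```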

uberfarben

  • Newcomer
  • Posts: 5
Reply #9 on: September 24, 2018, 03:57:26 AM
> I have no idea. My assumption after speaking with Kirk is the hash is made when the file is written to my hard drive.

I'm assuming the digital asset management system (DAM) used by the German office uses checksums to detect duplicates, and that the checksums are calculated internally on ingest. If you use PhotoMechanic to download and edit metadata before uploading the files to the DAM, then checksum-based duplicate detection will most likely fail to catch the same photo uploaded again with different metadata.

Some DAMs can be configured to detect duplication using filenames instead to get around this issue. But this method can also have failures. The simplest solution is to improve your workflow by making sure you don't upload the same file twice to the DAM.
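The trade-off described above shows up when the same uploads are run through a content-hash key and a filename key. This Python sketch uses hypothetical uploads: the bracketed tags stand in for embedded IPTC, and the camera-style default name is one that two photographers' cameras could easily share.

```python
import hashlib

def md5_key(name, data):
    return hashlib.md5(data).hexdigest()

def filename_key(name, data):
    return name.lower()

def flagged_as_duplicate(key_fn, uploads):
    """Return, for each upload in order, whether key_fn says it was seen before."""
    seen, flags = set(), []
    for name, data in uploads:
        key = key_fn(name, data)
        flags.append(key in seen)
        seen.add(key)
    return flags

uploads = [
    ("IMG_0001.JPG", b"[IPTC: A]photo-1"),  # first upload
    ("IMG_0001.JPG", b"[IPTC: A]photo-1"),  # same file again: a true duplicate
    ("IMG_0001.JPG", b"[IPTC: B]photo-1"),  # same photo re-ingested with new IPTC: a true duplicate
    ("IMG_0001.JPG", b"[IPTC: A]photo-2"),  # another camera's different photo: not a duplicate
]
print(flagged_as_duplicate(md5_key, uploads))       # [False, True, False, False]
print(flagged_as_duplicate(filename_key, uploads))  # [False, True, True, True]
```

The MD5 key misses the re-ingested copy; the filename key catches it but also rejects the other camera's unrelated photo.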