Author Topic: Using catalogue and checksumming to battle "bit rot" and maintain file integrity  (Read 6745 times)

Offline mbrakes

  • Newcomer
  • *
  • Posts: 38
I have discovered some damaged photos in my photo archive, googled around, and found that the cause is probably some variant of "bit rot". This scares me, so I have begun researching how to combat it. Simple backup strategies don't seem to be enough, as the "flipped bits" will easily get copied along with other file changes, thereby overwriting the good backup copy. The old rule about backing up to different media is no longer practical or cost-effective, as optical disks are dead, and tape drives are impractical and too expensive for most independent photographers like myself. I rely solely on RAID arrays and online backup services like Backblaze and CrashPlan. I have to make sure my first backup set stays intact, otherwise I'm in danger of having flipped bits transferred into my backups.

First, I was considering switching to the ZFS file system, but this seems overly complicated. I am still considering buying an Infortrend EONNAS NAS, but I already have two Drobos with a bunch of disks that I'd like to put to use, and scaling a ZFS NAS seems like a lot of hassle, so I'm hesitant. I keep thinking there must be ways to do this without having to go through the drastic steps of changing storage solutions and/or file systems.

I have found articles like this one, which suggests using checksums to verify and even repair corrupted files:

http://clusterbuffer.wordpress.com/file-system-tools/checksum/

The problem with the above is that it seems aimed at sysadmins comfortable using command-line tools. I'm not one of them, and I assume most photographers like me prefer to spend their time behind the camera rather than learning Terminal commands.

I also found this article on the Controlled Vocabulary site, which even has some useful software suggestions:
http://www.controlledvocabulary.com/imagedatabases/file-verification.html

The problem here is that the most promising software, Checksum+, no longer seems to be maintained, and I can't get it to work under Mac OS X 10.8. Also, it doesn't seem to have any options for restoring data using the checksum, or for deduping, which would also be nice to have:

http://www.controlledvocabulary.com/imagedatabases/de-dupe.html

So, it seems to me that checksumming, deduping and restoring files would be a killer feature for the upcoming catalogue version of PhotoMechanic.

Something similar has already been put in place for Adobe Lightroom 5, according to Peter Krogh:
http://thedambook.com/dng-verification-in-lightroom-5/

The problem with the above is, of course, that it only works with DNG files and with Lightroom.

Time is money, and I would rather not have to use DNGs and Lightroom, because both converting to DNG and using Lightroom are too slow for my liking, even on a maxed-out Mac Pro using SSDs. I also prefer to adjust my RAW files in CaptureOne, and I would like to keep my RAW files open to cross-software workflows and not be locked into Adobe's universe. Previous experimentation suggests that converted DNGs don't always play well with other software.

Another approach I have also tried is to use ImageVerifier, but this fell into the same category as Checksum+ - buggy under OS X 10.8. I could not make it work.

So, after a lot of tinkering, this is my plea:

Please make this a feature under the new PhotoMechanic Catalogue software!

I imagine that most photographers are very concerned about the integrity of their photo archives, so it seems like a natural feature to offer in a photo cataloguing program, as shown by the fact that the Adobe people have opted to include it in Lightroom.

It's just the way they have opted to link it to DNGs exclusively which seems wrong. I would much rather keep my RAW files intact, and simply store the checksums as sidecar files and/or in a separate database - and I would much rather use this feature in conjunction with my favorite photo browser/tagging application; that is of course PhotoMechanic :-)

It also seems like a half-baked solution without the option to restore corrupted files - what is the use of checking data integrity after all if you can't use the checksums to restore the affected files?

I understand that your top priority right now is to release the much-awaited catalogue software, but I hope you will add this to your to-do list as soon as v.1 is released :-)

Offline Luiz Muzzi

  • Hero Member
  • *****
  • Posts: 704
    • Luiz Muzzi Photography
Quote from mbrakes: "It's just the way they have opted to link it to DNGs exclusively which seems wrong. I would much rather keep my RAW files intact, and simply store the checksums as sidecar files and/or in a separate database - and I would much rather use this feature in conjunction with my favorite photo browser/tagging application; that is of course PhotoMechanic :-)"

Another vote for that, as I do not use DNGs either and like to keep my CR2 files intact.
Regards,

-Luiz Muzzi

Offline david_hill

  • Newcomer
  • *
  • Posts: 42
I ditto this request. This is a huge archival issue for which the current solutions leave something to be desired. Currently Lloyd Chambers offers an app to do the checksum piece, but it's extremely power user oriented, and integrating it into a workflow is another task, i.e. once you have your report of bad files found, how do you automate the replacement from a known good copy?
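
On the question of automating replacement from a known good copy, here is a minimal Python sketch, not a finished tool. It assumes you already have a manifest (a dict mapping each relative path to the MD5 recorded while the file was known to be good) and two folder trees, a primary and a backup. The names `restore_damaged`, `manifest`, `primary_root` and `backup_root` are made up for illustration. Note that a checksum by itself can only detect damage; the repair comes from copying a backup whose checksum still matches.

```python
import hashlib
import shutil
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to save memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_damaged(manifest, primary_root, backup_root):
    """Replace damaged or missing primary files with verified backup copies.

    manifest: dict of relative path -> MD5 recorded when the file was good
    (a hypothetical format chosen for this sketch).
    Returns two lists of relative paths: (restored, unrecoverable).
    """
    restored, unrecoverable = [], []
    for rel_path, expected in manifest.items():
        primary = Path(primary_root) / rel_path
        if primary.exists() and file_md5(primary) == expected:
            continue  # primary copy is intact, nothing to do
        backup = Path(backup_root) / rel_path
        # Only trust the backup if it still matches the recorded checksum.
        if backup.exists() and file_md5(backup) == expected:
            primary.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(backup, primary)
            restored.append(rel_path)
        else:
            unrecoverable.append(rel_path)
    return restored, unrecoverable
```

Anything that ends up in the `unrecoverable` list is damaged or missing in both places and would have to come from another backup set.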

Offline pdizzle

  • Newcomer
  • *
  • Posts: 25
This would be an amazing feature to add, especially for those of us who shoot in RAW exclusively.  RAW files typically don't get touched, so if Photo Mechanic were able to MD5 the files as they are imported and write that data to the sidecar file, that would be awesome.  I saw that DNGs have something like that built in, but then you have to convert all your files to DNG.  I think doing a checksum and storing that value in the sidecar would be faster than a conversion.
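
To make the import idea concrete, here is a rough Python sketch of hashing a file as it is copied off the card. It is only an illustration and has nothing to do with how Photo Mechanic works internally; the function name `import_with_checksum` is invented for the example. It hashes the source, copies it, hashes the copy, and refuses to continue if the two differ.

```python
import hashlib
import shutil
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to save memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def import_with_checksum(source, dest_dir):
    """Copy one file into dest_dir and confirm the copy matches the source.

    Returns the MD5 of the file so the caller can store it wherever it
    likes (sidecar, database, ...). Raises OSError on a mismatch.
    """
    source = Path(source)
    dest = Path(dest_dir) / source.name
    source_md5 = file_md5(source)
    shutil.copy2(source, dest)  # copies contents plus timestamps
    if file_md5(dest) != source_md5:
        raise OSError(f"copy of {source.name} does not match the original")
    return source_md5
```

One caveat: reading the copy back straight after writing it may be served from the operating system's cache rather than from the disk, so this catches copy errors more reliably than it catches a failing destination drive.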

Offline pdizzle

  • Newcomer
  • *
  • Posts: 25
I think I found a relatively easy way for PM to implement something along these lines.

1.) Add a checksum field to the XMP file

2.) Add a function in PM that, when you open a contact sheet of RAW files, can run MD5 checksums on all of them and write that data to the XMP sidecar.

2.5) It could even be done on import if the user chooses that option.  A side bonus of doing it on import is that you could verify that data is being copied off the memory card correctly.  I found another thread where someone was requesting file verification on import.

3.) Add a function in PM that, when you open a contact sheet of RAW files, can run MD5 checksums on all of them and verify them against the checksums stored in the XMP sidecars.  If any mismatches are found, it can notify the user in some way, maybe with a color label or by copying the files to a subfolder.

If corrupted files are found, it leaves the restoring of files to the end user.  Hopefully the end user has a good backup system.  But a feature like this is useless without a backup system.
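
Steps 2 and 3 above can be sketched outside PM in a few lines of Python. This is an illustration under assumptions, not a proposal for PM's actual code: instead of a real XMP field it uses a made-up plain-text sidecar named `<file name>.md5`, and the extension list in `RAW_EXTENSIONS` is just an example. MD5 is fine for spotting accidental corruption like this, though it is not meant to guard against deliberate tampering.

```python
import hashlib
from pathlib import Path

# Assumed list of RAW extensions; adjust for your own cameras.
RAW_EXTENSIONS = {".cr2", ".nef", ".arw", ".dng"}


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to save memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_for(raw_path):
    """Hypothetical sidecar name: IMG_0001.CR2 -> IMG_0001.CR2.md5."""
    return raw_path.with_name(raw_path.name + ".md5")


def raw_files(folder):
    """List the RAW files directly inside folder, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in RAW_EXTENSIONS
    )


def write_checksums(folder):
    """Step 2: record a checksum for every RAW file that has none yet."""
    for raw in raw_files(folder):
        sidecar = sidecar_for(raw)
        if not sidecar.exists():  # never overwrite a recorded checksum
            sidecar.write_text(file_md5(raw) + "\n")


def verify_checksums(folder):
    """Step 3: return names of RAW files that no longer match their sidecar."""
    mismatched = []
    for raw in raw_files(folder):
        sidecar = sidecar_for(raw)
        if sidecar.exists() and sidecar.read_text().strip() != file_md5(raw):
            mismatched.append(raw.name)
    return mismatched
```

The list returned by `verify_checksums` is where the notification from step 3 would hook in, for example by applying a color label or moving the files to a subfolder.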

Hopefully Kirk Baker or someone else from the PM team can chime in with their opinion.  This seems relatively simple to implement in my mind.