Author Topic: Vista 'Mangles' Metadata  (Read 18530 times)

Offline Giles

  • Newcomer
  • Posts: 24
Vista 'Mangles' Metadata
« on: February 09, 2007, 07:01:57 AM »
Kirk,

This might all be moot, but I thought I'd provide a heads-up.  I found this article concerning how Microsoft's Vista handles metadata:

"Metadata mangling in Windows Vista"
http://news.com.com/2061-10805_3-6157801.html?part=rss&tag=2547-1_3-0-5&subj=news

If I understand correctly, as long as users aren't using Vista to actually caption any photos, there should be no corruption for any third party programs.  I'm thinking that this shouldn't be a problem since most of us would strictly be using PM for any photo sorting.  That being said, if anyone complains about corrupted metadata, at least we know the possible culprit.

Regards,
Giles

Offline dennis

  • President
  • Camera Bits Staff
  • Sr. Member
  • Posts: 467
    • Camera Bits, Inc.
Re: Vista 'Mangles' Metadata
« Reply #1 on: February 09, 2007, 02:35:07 PM »
Hi Giles,

I've looked into this problem regarding Nikon's codec and Microsoft's Photo Info (MPI) add-on and here is the scoop from examining a "corrupted" RAW NEF photo.  I first need to say that I haven't fully tested all options (e.g. XP with MPI, Vista with MPI, Vista with MPI + Nikon codec).

Readers should first read Microsoft's FAQ to understand what is officially going on here:

http://www.microsoft.com/windowsxp/using/digitalphotography/prophoto/photoinfofaq.mspx

MPI uses a more "intrusive" method to add metadata than Photo Mechanic.  It essentially rewrites the RAW file, whereas Photo Mechanic "patches" the RAW file by creating a new TIFF table (with new tags for IPTC/XMP) and appending this to the end-of-file, along with the IPTC/XMP data.  PM then points to this new TIFF table by changing the "IFD" pointer at offset 4 in the file.  The advantage with PM's method is that (other than the 4 bytes for the IFD pointer) the existing contents of the file remain completely untouched.  When combined with some proprietary technology I invented, this lets PM "undo" any changes made by PM so that you can get back to the original RAW file (note: this applies only to TIFF-based RAW files which covers most RAW formats other than, for example, CRW and RAF).
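For readers who want to see the "patching" idea in concrete terms, here is a minimal Python sketch of it. It is not Photo Mechanic's actual code: the function names are made up for illustration, the new TIFF table is passed in as ready-made bytes rather than built, and the undo step assumes the caller has kept the original IFD offset and file length (PM's own undo mechanism is proprietary and not described in this thread).

```python
import struct

def read_tiff_header(data: bytes):
    """Return (struct byte-order prefix, offset of the first IFD)."""
    if data[:2] == b"II":
        prefix = "<"          # little endian (Intel)
    elif data[:2] == b"MM":
        prefix = ">"          # big endian (Motorola)
    else:
        raise ValueError("not a TIFF-based file")
    magic, ifd_offset = struct.unpack(prefix + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    return prefix, ifd_offset

def append_ifd_and_repoint(data: bytes, new_ifd: bytes):
    """Append new_ifd at end-of-file and point the header at it.

    Returns (patched bytes, original IFD offset). Apart from the
    4-byte pointer at offset 4, the existing bytes are left as they were.
    """
    prefix, old_offset = read_tiff_header(data)
    if len(data) % 2:         # keep the new table on an even offset
        data += b"\x00"
    new_offset = len(data)
    pointer = struct.pack(prefix + "I", new_offset)
    return data[:4] + pointer + data[8:] + new_ifd, old_offset

def undo_patch(patched: bytes, old_offset: int, original_length: int) -> bytes:
    """Hypothetical undo: restore the old pointer and drop the appended data."""
    prefix, _ = read_tiff_header(patched)
    pointer = struct.pack(prefix + "I", old_offset)
    return patched[:4] + pointer + patched[8:original_length]
```

In a real file the appended table would also have to carry over the original tags alongside the new IPTC/XMP tags; the sketch leaves that out and only shows why the change is small and reversible.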

Why is it important to get back to the original RAW file?  Well, because not all software is smart enough to properly parse TIFF-based RAW files that have been modified (e.g. Mac OSX and therefore Aperture), even if the modifications are relatively minor.  And theoretically, a RAW file is proprietary to the manufacturer and so there is no guarantee that modifications made to the RAW file will sit well with the manufacturer's own software.  So far this hasn't been much of a problem with PM's method and, for example, Nikon Capture (and Capture NX) or with Photoshop.  But other software that makes assumptions about the RAW format can get tripped-up by even the most basic changes made to a RAW file (e.g. assumptions like the TIFF table is still at the front of the file).

Unfortunately, by rewriting the RAW files as done by MPI, there is no guaranteed way to "undo" the addition of metadata if some software were to have problems reading the modified RAW file.  Assuming there are no bugs with MPI or a codec employed by MPI, the modified RAW file should still be properly formatted and any errors reading the modified RAW file are because of improper parsing or assumptions about the RAW file's formatting.  The exception here is the insidious private "maker note" used by manufacturers to include info about the photo that isn't expressible by Exif (or to hide info).  If a maker note isn't properly "self-contained" such that it only references data inside itself and relative to itself (not the start of file), then moving the maker note in the file can break the maker note.
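The point about self-contained maker notes can be shown with a toy example. The Python sketch below uses made-up bytes, not a real maker note, and the `resolve` helper is hypothetical; it only contrasts an offset stored relative to the maker note with one stored relative to the start of file.

```python
def resolve(file_bytes: bytes, note_start: int, stored_offset: int,
            length: int, relative: bool) -> bytes:
    """Fetch a value that a maker note entry points at.

    relative=True  : stored_offset counts from the start of the maker note
    relative=False : stored_offset counts from the start of the file
    """
    base = note_start if relative else 0
    return file_bytes[base + stored_offset: base + stored_offset + length]

# Toy maker note: a 4-byte header followed by a 4-byte value.
note = b"NOTE" + b"LENS"

# Original layout: the note starts at byte 10 of the file.
before = b"H" * 10 + note
assert resolve(before, 10, 4, 4, relative=True) == b"LENS"
assert resolve(before, 10, 14, 4, relative=False) == b"LENS"

# A rewrite inserts 10 bytes ahead of the note, so it now starts at byte 20.
after = b"H" * 10 + b"X" * 10 + note
moved_relative = resolve(after, 20, 4, 4, relative=True)    # still finds the value
moved_absolute = resolve(after, 20, 14, 4, relative=False)  # now lands on filler
```

After the move, the relative offset still reaches the value while the stale absolute offset reads unrelated bytes, which is the breakage described above.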

OK, now back to the RAW NEF file in question.  What apparently is happening is that Nikon's codec not only rewrites the file in order to insert XMP (not IPTC), but it changes the byte ordering (endian) from big endian (Motorola) to little endian (Intel).  I don't understand why it does this (since little endian sucks when it comes to examining a file in a hex editor), but technically there is nothing wrong with the byte order change.  However, the maker note is left unchanged as big endian.  Again, technically this is OK since Nikon's maker note is self-contained and internally specifies the byte ordering.

However, software that expects the endian of the maker note to be the same as the RAW TIFF wrapper (e.g. NEF) will obviously fail when trying to parse the maker note.  This is apparently the case with Photoshop and Mac OSX (and other software no doubt).  The file itself isn't corrupted in any way and is technically OK.  It is surprising that Photoshop has a problem with these files since the DNG specification allows for maker notes to have a different byte ordering (endian) than the DNG container.
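As a rough illustration of the parsing issue, a reader can take the byte order from the maker note itself instead of inheriting it from the container. The Python sketch below assumes the commonly documented layout of recent Nikon maker notes (the signature "Nikon" plus a NUL byte, a 2-byte version, 2 further bytes, then an embedded TIFF header); that layout is an assumption here, not something stated in this thread, and the function names are illustrative.

```python
def tiff_order(marker: bytes) -> str:
    """Map a TIFF byte-order marker to a struct prefix."""
    if marker == b"II":
        return "<"   # little endian (Intel)
    if marker == b"MM":
        return ">"   # big endian (Motorola)
    raise ValueError("unknown byte-order marker")

def makernote_order(note: bytes, container_order: str) -> str:
    """Pick the byte order to use when parsing a maker note.

    Assumed layout: b"Nikon\\x00", 2-byte version, 2 more bytes, then an
    embedded TIFF header whose first two bytes give the note's own byte
    order. If that signature is missing, fall back to the container's
    order, which is the guess that fails on the rewritten NEF files.
    """
    if note[:6] == b"Nikon\x00" and note[10:12] in (b"II", b"MM"):
        return tiff_order(note[10:12])
    return container_order

# A big-endian maker note inside a container rewritten as little endian.
sample_note = b"Nikon\x00\x02\x10\x00\x00" + b"MM\x00\x2a\x00\x00\x00\x08"
order = makernote_order(sample_note, container_order="<")
```

Here `order` comes out as big endian even though the container is little endian, so the tags inside the note would be decoded correctly.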

Photo Mechanic 4.4.3.3 itself has a small problem with this endian change, but it detects the problem and simply skips the parsing of the maker note.  This means that you lose certain metadata (e.g. serial number), but PM will still show the photo OK.  I have updated 4.5 to be able to handle this maker note endian change so that it can properly parse the maker note.

Given the problems with various RAW converters and operating systems, the safest solution is to leave the RAW file untouched and only create an XMP sidecar file.  Photo Mechanic can be set up to operate this way.  But some people who use software (e.g. Nikon Capture) that can properly handle embedded IPTC data prefer to configure PM to embed metadata into the RAW files.  When encountering software that fails to read the modified RAW file, PM users can revert to the original file.  Unfortunately this isn't an option with MPI which will always rewrite your RAW file.  My advice: don't use Photo Info or software that relies upon the internal Windows Imaging Component to caption your RAW files until you have tested the modified RAW files with various RAW converters or browsers you use for compatibility.
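To make the sidecar approach concrete, here is a small Python sketch that writes a caption into an XMP file next to the RAW file without ever opening the RAW file. It assumes the common convention of naming the sidecar after the RAW file with an .xmp extension and stores only a dc:description caption; the function names are illustrative, and this is not how Photo Mechanic itself writes sidecars.

```python
from pathlib import Path
from xml.sax.saxutils import escape

XMP_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:description>
    <rdf:Alt>
     <rdf:li xml:lang="x-default">{caption}</rdf:li>
    </rdf:Alt>
   </dc:description>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
"""

def sidecar_path(raw_path) -> Path:
    """Assumed naming convention: DSC_0001.NEF -> DSC_0001.xmp."""
    return Path(raw_path).with_suffix(".xmp")

def build_sidecar(caption: str) -> str:
    """Return a minimal XMP document holding only a caption."""
    return XMP_TEMPLATE.format(caption=escape(caption))

def write_sidecar(raw_path, caption: str) -> Path:
    """Write the sidecar next to the RAW file; the RAW file is never opened."""
    target = sidecar_path(raw_path)
    target.write_text(build_sidecar(caption), encoding="utf-8")
    return target
```

Calling `write_sidecar("DSC_0001.NEF", "Press conference")` would create DSC_0001.xmp alongside the untouched NEF; the file name and caption here are placeholders.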

It is really a shame that the state of RAW captioning continues to deteriorate rather than improve (outside of PM of course ;)).

--dennis

Offline roysmyth

  • Newcomer
  • Posts: 21
Re: Vista 'Mangles' Metadata
« Reply #2 on: February 09, 2007, 09:48:55 PM »
Dennis,

Thank you for the detailed explanation.

What I have been doing with my NEF files is to add metadata in PM and then make them read-only. My preferences in PM are set to embed IPTC (but not IPTC4XMP) and to always create an XMP sidecar. If I understand the way this works, the NEF file structure should be unchanged, but its embedded IPTC data should be updated. These NEF files should be compatible with anything that can read NEF. Spotlight on the Mac sees my keywords and captions when I do this. I don't use any Nikon software so I don't know how it works in the Nikon world.

I archive these read-only NEF files as my originals, then convert to DNG. The DNG files form my working image database.

There is a gap (or there will be when PM 4.5 is released) in that the additional information that the new IPTC XMP can hold will not be in the NEF file, but that's OK; I just want my caption and keywords to be preserved in the annotated NEF. The additional data will be in the sidecar file archived with the original NEF. The DNG will contain all the data and I can take the DNG  as a single file into Lightroom, Photoshop and IView without worrying about how they treat sidecars. (IView is not a treat for sidecars).

Have I got this right?

For me, the flexibility that PM offers in handling IPTC/XMP data is unique and keeps me out of trouble as I move my file through the many steps in the digital workflow we are forced to use today.

Offline dennis

  • President
  • Camera Bits Staff
  • Sr. Member
  • Posts: 467
    • Camera Bits, Inc.
Re: Vista 'Mangles' Metadata
« Reply #3 on: February 10, 2007, 12:39:11 PM »
Hi Roy,

Sounds like you have figured out a workflow that works for you, and that's what PM is all about.

NEF files straight out of the camera don't have IPTC embedded in them, so adding IPTC data to these in Photo Mechanic will change the NEF structure only as far as moving the TIFF table to the end-of-file (as I mentioned in the previous post).  The tools you mentioned should be able to handle embedded IPTC from Photo Mechanic.  But some other programs that use Nikon's "mini SDK" to get the "encrypted" white balance from some cameras (e.g. D2X) may fail to get the white balance correctly from NEF files that have been captioned by PM.  I know that Thomas Knoll was able to work around this problem for Camera Raw and I assume this "fix" carries forward into Lightroom (I recall at one point that Lightroom was NOT using this mini SDK so that means it wouldn't get tripped-up, but it also means it wouldn't read the "encrypted" white balance).

You are correct that some new fields in PM 4.5 are XMP only (e.g. contact info), so if you choose to only embed IPTC then any program that ignores the XMP sidecar file won't see this XMP-only info.  Actually, if you are using Adobe's products, it is best to NOT embed XMP into RAW files and force the use of a sidecar file (because Bridge for example doesn't update embedded XMP even though it reads this over a sidecar, causing some undesirable behavior).

When you convert to DNG with XMP-only data (e.g. contact info) only present in the sidecar file, it would be nice to be able to use the XMP sidecar over the embedded IPTC since the sidecar has everything.  I assume the conversion to DNG uses the embedded IPTC, right?

--dennis

Offline GaryVoth

  • Newcomer
  • Posts: 1
Re: Vista 'Mangles' Metadata
« Reply #4 on: February 12, 2007, 03:31:03 PM »

MPI uses a more "intrusive" method to add metadata than Photo Mechanic.  It essentially rewrites the RAW file, whereas Photo Mechanic "patches" the RAW file by creating a new TIFF table (with new tags for IPTC/XMP) and appending this to the end-of-file, along with the IPTC/XMP data.  PM then points to this new TIFF table by changing the "IFD" pointer at offset 4 in the file. 
--dennis


Hi Dennis, I hope things are well...

If you would permit me one clarification of your explanation, Photo Info actually has two different modes of operation when tagging NEF files, depending on the presence or absence of an installed codec. 

If the Nikon codec is installed we defer to the codec for metadata read/write, with the results as described (and as you state, this is a validly-formed NEF file). However, if the Nikon codec is not installed, Photo Info has internal code that follows the same strategy as Photo Mechanic, appending the IPTC information to the end of the file. (We did significant interoperability testing with your product, in fact.)  So in the immediate future while the codec issues are being worked out, customers can use Photo Info and PM together without installing the NEF codec, should they choose to do so. 

While the current issue with the NEF codec is unfortunate, it is not untypical of the "teething pains" that any new approach can have, and we expect it will be short term...  The good news is that Nikon (and other camera manufacturers) in supporting Windows codecs is making it possible for Windows applications to work with RAW images without requiring updates for each new camera model.

Best,

Gary Voth
Software Architect, Rich Media Group
Microsoft Corp.
« Last Edit: February 12, 2007, 09:53:21 PM by GaryVoth »

Offline dennis

  • President
  • Camera Bits Staff
  • Sr. Member
  • Posts: 467
    • Camera Bits, Inc.
Re: Vista 'Mangles' Metadata
« Reply #5 on: February 13, 2007, 02:07:35 PM »
If you would permit me one clarification of your explanation, Photo Info actually has two different modes of operation when tagging NEF files, depending on the presence or absence of an installed codec. 

If the Nikon codec is installed we defer to the codec for metadata read/write, with the results as described (and as you state, this is a validly-formed NEF file). However, if the Nikon codec is not installed, Photo Info has internal code that follows the same strategy as Photo Mechanic, appending the IPTC information to the end of the file. (We did significant interoperability testing with your product, in fact.)  So in the immediate future while the codec issues are being worked out, customers can use Photo Info and PM together without installing the NEF codec, should they choose to do so. 

Hi Gary,

It is clear that Nikon's codec is responsible for the file being rewritten in such a way that it breaks third party RAW converters.  Plus, I suspected that it was Nikon's codec since it was XMP data that was embedded not IPTC, and the FAQ says that XMP metadata isn't supported without a RAW Windows codec installed.  Is that correct?

I haven't (yet) examined a Photo Info modified RAW file other than the NEF file modified via the Nikon codec, so I cannot say how "intrusive" MPI really is on its own.  But from what I read in the FAQ it sounds like MPI will move things like the maker note if room needs to be made:

        The "Maker Note" tag is not deleted, but it may be relocated in the image stream.

Perhaps this is only for JPEGs?  It's not clear.

I guess the question is can one "undo" the MPI tagging of a TIFF-based RAW file (e.g. NEF)?  The answer appears to be "no" for the Nikon codec, but what about the built-in MPI codec for TIFF-based RAW files?  If metadata tagging cannot be undone or reverted, then MPI isn't using the same strategy as Photo Mechanic.  And are "regular" TIFF files treated differently than TIFF-based RAW files (sounds like it from the FAQ):

        If a compressed TIFF file must be reorganized to add metadata, the TIFF codec does not always preserve the original compression settings, causing the file size to increase.

That sounds pretty intrusive to me.

Do you know if there will be some type of utility created to "repair" files captioned by Nikon's codec (by "repair" I don't mean the file is bad, only that it could be rewritten again to be compatible with third-party RAW converters)?

While the current issue with the NEF codec is unfortunate, it is not untypical of the "teething pains" that any new approach can have, and we expect it will be short term...  The good news is that Nikon (and other camera manufacturers) in supporting Windows codecs is making it possible for Windows applications to work with RAW images without requiring updates for each new camera model.

I agree that this is a better approach than Mac OS X (especially when Apple has had so many problems parsing modified RAW files).  When a new camera ships with an updated codec (that hopefully supports all older models and behaves no worse than a previous codec), Windows applications using the Windows Imaging Component (WIC) will automatically gain access to the new files.  This should solve the RAW rendering problem, but the history of camera manufacturers embedding metadata has not been very promising (e.g. metadata bugs still exist today in the latest version of Capture NX).  One scary thought would be that Exif decides to embed IPTC and XMP within the Exif block (this would truly be messy because it would practically require that the Exif block be recreated when editing metadata).

Although I agree it is convenient to have all metadata embedded, it seems that WIC should be made to handle XMP sidecar files.  Plus, there are currently problems with Adobe's products when embedded XMP is present in a RAW file.  For example, Bridge only updates the XMP sidecar file and not any embedded XMP (even though the embedded XMP takes read precedence over the sidecar), causing problems with preferences "sticking" in Bridge.  It is for this reason that some prefer to only use XMP sidecar files.

One other question, does MPI support UTF-8 encoded IPTC?
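For context on what "UTF-8 encoded IPTC" means at the byte level: in the IPTC-IIM format, a writer announces UTF-8 by including dataset 1:90 (Coded Character Set) with the escape sequence ESC % G ahead of the text datasets. The Python sketch below builds such a block with a caption in dataset 2:120; the helper names are illustrative and say nothing about how MPI or Photo Mechanic handle this internally.

```python
import struct

UTF8_MARKER = b"\x1b%G"   # ESC % G, the escape sequence that selects UTF-8

def iim_dataset(record: int, dataset: int, payload: bytes) -> bytes:
    """Encode one IPTC-IIM dataset using the standard (short) length form."""
    if len(payload) > 0x7FFF:
        raise ValueError("payload too long for the standard dataset form")
    # Tag marker 0x1C, record number, dataset number, 2-byte big-endian length.
    return struct.pack(">BBBH", 0x1C, record, dataset, len(payload)) + payload

def utf8_caption_block(caption: str) -> bytes:
    """Declare UTF-8 in dataset 1:90, then store the caption in 2:120."""
    return (iim_dataset(1, 90, UTF8_MARKER)
            + iim_dataset(2, 120, caption.encode("utf-8")))

block = utf8_caption_block("Zoë at the finish line")
```

A reader that ignores dataset 1:90 will typically show the two UTF-8 bytes of the "ë" as two wrong characters, which is why support for the marker matters.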

--dennis