Author Topic: Closed: Unicode IPTC support not working properly  (Read 3979 times)

Offline Hayo Baan

  • Uber Member
  • ******
  • Posts: 2552
  • Professional Photographer & Software Developer
    • View Profile
    • Hayo Baan - Photography
Closed: Unicode IPTC support not working properly
« on: October 20, 2007, 09:25:58 AM »
There's some strange behaviour with respect to unicode characters in IPTC/XMP fields.

Windows XP, PM version 4.5.3 beta 1015
XMP/IPTC preferences:
  • Order: "Sidecar", "XMP", "IPTC"
  • Both XMP and IPTC are always added
  • Default encoding: "Windows Latin1+Euro", not written as Unicode

First of all, the CodedCharacterSet IPTC field always gets the "ESC%G" value, indicating Unicode encoding, even though I have not specified this.  This behaviour is new with the 4.5.3 beta I'm now using.  I'm quite sure this wasn't the case with 4.5.2 (and certainly not with 4.5.1), so this is a newly introduced bug.
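For anyone who wants to check this field outside PM: "ESC%G" is the three-byte ISO 2022 escape sequence (0x1B 0x25 0x47) that designates UTF-8, and CodedCharacterSet is IIM dataset 1:90. Below is a minimal sketch of building and detecting that dataset in a raw IIM byte stream. The function names `dataset` and `declares_utf8` are made up for illustration, this is not how Photo Mechanic does it, and it assumes the IIM block has already been extracted from the image file and uses only standard (non-extended) dataset lengths.

```python
ESC_PERCENT_G = b"\x1b%G"  # ISO 2022 escape sequence that designates UTF-8


def dataset(record: int, number: int, value: bytes) -> bytes:
    """Build one IIM dataset: tag marker 0x1C, record, dataset number,
    2-byte big-endian length, then the data itself."""
    return bytes([0x1C, record, number]) + len(value).to_bytes(2, "big") + value


def declares_utf8(iim: bytes) -> bool:
    """Walk a raw IIM byte stream and report whether dataset 1:90
    (CodedCharacterSet) is present and set to ESC % G."""
    pos = 0
    while pos + 5 <= len(iim) and iim[pos] == 0x1C:
        record, number = iim[pos + 1], iim[pos + 2]
        length = int.from_bytes(iim[pos + 3:pos + 5], "big")
        if length & 0x8000:
            # Extended-length dataset: not handled in this sketch.
            break
        data = iim[pos + 5:pos + 5 + length]
        if (record, number) == (1, 90):
            return data == ESC_PERCENT_G
        pos += 5 + length
    return False


# Hypothetical stream: the UTF-8 marker followed by a caption (dataset 2:120).
stream = dataset(1, 90, ESC_PERCENT_G) + dataset(2, 120, "Ħaġar Qim".encode("utf-8"))
print(declares_utf8(stream))                      # True
print(declares_utf8(dataset(2, 120, b"plain")))   # False
```

If a file written with a non-Unicode default encoding still reports True here, that matches the "sticky ESC%G" behaviour described above.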

The second problem is more complex, but basically it comes down to this: the IPTC information "forgets" that it has special characters.  Let me try to describe this in a reproducible way.
1. In the IPTC dialogue, enter some special Unicode characters like ĀāĂăĠġĦħĨĩ in the Caption field.
2. Save (OK on the dialogue)
3. Open the image information again and voila, your characters have magically changed into AaAaGgHhIi. :o
Ah, remember I hadn't specified Unicode encoding, so this may have been the cause (but then again, I have XMP embedded as well, which takes precedence and is always in Unicode).  ???

4. Re-enter the characters and tick the "Write as unicode" box.
Nope does not work either.   :(

5. Try again, now both specifying "Unicode" as the encoding and ticking the "Write as unicode" box.
Nope does not work either.  >:(
Note: depending on whether or not you first re-enter the unicode characters and then change the encoding to unicode, you get a warning dialogue stating that the IPTC info should be re-read (this means you have to re-enter the special characters).  This however has no influence on the result...

Right, time to change the default encoding and see what happens then.
6. While you can't change the default encoding to "Unicode", you can choose to write IPTC as Unicode by default, so that's what I did.
7. Open image info again and try to enter the unicode characters again.
This does NOT work either.  :'(

Some more findings:
  • some unicode characters do seem to stick; for instance ð works (I guess this has to do with them being present in the IPTC font encoding).
  • during my testing I sometimes got a message saying that the font encoding does not support all characters and proposing to write Unicode instead (IIRC I got this as an option).  I can't reproduce this message anymore, however, and I'm quite sure it didn't help either.
  • I also know that at some point I was able to get some of the special characters to stick in some of the fields (headline and location, if I recall correctly).  I can't reproduce this either, though, and it seemed erratic as well: the characters would stick in one field but not in another, then I tried again and suddenly they stuck in another field as well.  It looked as if PM checked whether or not a field had changed before updating its value in the file.  Changing an A back into an Ā did not seem to trigger this, but adding characters did.  But as I said, I can't reproduce this anymore...
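To see which characters can survive a non-Unicode IPTC encoding at all, here is a rough check. It assumes that "Windows Latin1+Euro" corresponds to Windows code page 1252 (`cp1252` in Python), which is a guess on my part and not something confirmed for PM. The helper name `unencodable` is made up for this example.

```python
def unencodable(text: str, encoding: str = "cp1252") -> list:
    """Return the characters of `text` that cannot be represented in `encoding`."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)  # strict mode: raises if there is no mapping
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


print(unencodable("ĀāĂăĠġĦħĨĩ"))  # all ten characters are reported
print(unencodable("ð"))            # [] because ð is byte 0xF0 in cp1252
print(unencodable("Ħaġar Qim"))    # ['Ħ', 'ġ']
```

Under that assumption, ð sticking while Āā... does not is consistent with the guess above: ð exists in the 8-bit encoding, the others only exist in Unicode. Python's strict codec refuses to encode the missing characters; it does not reproduce the silent replacement with plain A, a, G, and so on seen in PM.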

Am I missing something, or is something really broken?  (The CodedCharacterSet IPTC field sticking to "ESC%G" definitely is a bug)

Hope my story makes sense and allows you to reproduce (and fix) things.
« Last Edit: May 06, 2019, 11:21:37 PM by Hayo Baan »
Hayo Baan - Photography
Web: www.hayobaan.nl

Offline Kirk Baker

  • Senior Software Engineer
  • Camera Bits Staff
  • Superhero Member
  • *****
  • Posts: 24730
    • View Profile
    • Camera Bits, Inc.
Re: Unicode IPTC support not working properly
« Reply #1 on: October 20, 2007, 10:39:58 AM »
Hayo,

It would be helpful to know what image type you're working with.  Are you working with RAW+JPEG?

It would also be helpful to know what your exact IPTC/XMP preferences are.  Please post a screenshot.

Thanks,

-Kirk

Offline Hayo Baan

Re: Unicode IPTC support not working properly
« Reply #2 on: October 21, 2007, 02:57:31 AM »
It was a TIF file, but the same happens with NEF and JPG files.

For completeness I have attached a screenshot of the IPTC/XMP preferences.

[attachment deleted by admin]

Offline Hayo Baan

Re: Unicode IPTC support not working properly
« Reply #3 on: November 01, 2007, 01:56:46 AM »
Just to let everyone know.

We found the cause of the disappearing Unicode characters.  It turned out that the Windows version of the spell checker used by PM is not Unicode safe (this isn't a problem on the Mac).  So if you are on Windows and need to enter special Unicode characters that are not part of your character encoding (for example, the H-bar and G-dot in "Ħaġar Qim" are not part of the Windows Latin1+Euro encoding), turn off the spell checker. (boy, I'm glad I'm switching to Mac soon :D)

The sticky "ESC%G" turned out to be a minor bug and has been fixed in the latest beta.

Thank you Kirk and Dennis!