Author Topic: IPTC Character Set?  (Read 4530 times)

Offline mfryd

  • Newcomer
  • *
  • Posts: 28
IPTC Character Set?
« on: June 07, 2011, 10:40:33 AM »
Is there a way of examining a JPEG file to determine what character set Photo Mechanic used when it wrote the caption field?

Is there any way to have Photo Mechanic go through a set of images and re-write the captions in UTF-8?


Our photographers use Photo Mechanic to select and caption images.  We ask them to always use Unicode.  Sometimes they forget.

Our software reads the caption information and automatically builds web pages.  When the caption isn't in Unicode, special symbols (registered trademark, copyright, etc.) don't come out correctly.

I would like to update our software to handle other character sets, but I don't know how to tell what character set Photo Mechanic used when it wrote the caption.

Any help or guidance would be appreciated.

Offline Kirk Baker

  • Senior Software Engineer
  • Camera Bits Staff
  • Superhero Member
  • *****
  • Posts: 25020
    • Camera Bits, Inc.
Re: IPTC Character Set?
« Reply #1 on: June 07, 2011, 11:20:46 AM »
Quote from: mfryd
Is there a way of examining a JPEG file to determine what character set Photo Mechanic used when it wrote the caption field?

Unless it was written with UTF-8, the answer is no.

Quote from: mfryd
Is there any way to have Photo Mechanic go through a set of images and re-write the captions in UTF-8?

Yes.  Use the IPTC Stationery Pad to clear some unused field and apply it to all images.

Quote from: mfryd
Our photographers use Photo Mechanic to select and caption images.  We ask them to always use Unicode.  Sometimes they forget.

Our software reads the caption information and automatically builds web pages.  When the caption isn't in Unicode, special symbols (registered trademark, copyright, etc.) don't come out correctly.

I would like to update our software to handle other character sets, but I don't know how to tell what character set Photo Mechanic used when it wrote the caption.

It's just not possible to tell unless it was UTF-8, and then there is no issue for your workflow anyway.  The most common character sets in use are MacRoman and ISO-8859-1 (Western European/Latin-1).
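
To make the ambiguity concrete, here is a small Python sketch (an illustration only, not anything Photo Mechanic does; the `decode_with` helper and the sample byte are made up for the example). A single byte above 0x7F decodes without error in both MacRoman and ISO-8859-1 and simply comes out as a different character, so the bytes alone cannot say which set was intended.

```python
def decode_with(data: bytes, encodings=("mac_roman", "latin-1")) -> dict:
    """Decode the same bytes under each candidate legacy character set."""
    return {name: data.decode(name) for name in encodings}

# 0xA8 is the registered-trademark sign in MacRoman but a diaeresis in Latin-1.
results = decode_with(b"\xa8")
for name, text in results.items():
    print(f"{name}: U+{ord(text):04X} {text!r}")
```

Neither decode raises an error, which is why a reader has to be told (or has to guess) the character set for non-UTF-8 captions.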

Is it possible to have your web software use XMP instead?  It is always Unicode text.

-Kirk

Offline mfryd

Re: IPTC Character Set?
« Reply #2 on: June 07, 2011, 11:27:52 AM »
Kirk,

Thanks for the answer.  Unfortunately, we sometimes publish hundreds of images over a few-hour period.  My photographers would revolt if I made them upload XMP files along with the JPEG files.

Is there an easy way of telling if the captions are in Unicode?  At least I can detect the issue and take some sort of action.

If I can ask you one more question: if the file doesn't indicate what character set is being used, how does Photo Mechanic know what character set to use when displaying existing captions?


Offline Kirk Baker

Re: IPTC Character Set?
« Reply #3 on: June 07, 2011, 11:36:10 AM »
Quote from: mfryd
Thanks for the answer.  Unfortunately, we sometimes publish hundreds of images over a few-hour period.  My photographers would revolt if I made them upload XMP files along with the JPEG files.

XMP is embedded in JPEGs.  There are no sidecar XMP files with JPEGs.
It is likely that your JPEGs already have XMP in them unless you specifically tell PM not to write both IPTC and XMP.
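
For anyone who wants to check this programmatically, the following is a rough Python sketch of looking for an embedded XMP packet. It assumes the usual embedding, where the packet sits in an APP1 segment that begins with the `http://ns.adobe.com/xap/1.0/` signature followed by a NUL byte. The function name `find_xmp_packet` is made up for illustration, and the walker is simplified: it ignores fill bytes and extended XMP spread over several segments.

```python
from typing import Optional

XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"

def find_xmp_packet(jpeg: bytes) -> Optional[bytes]:
    """Return the embedded XMP packet bytes, or None if no XMP APP1 segment is found."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan / end of image: stop looking
            break
        # The 2-byte big-endian length counts itself but not the marker.
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(XMP_HEADER):
            return payload[len(XMP_HEADER):]
        pos += 2 + length
    return None
```

If a packet comes back, `packet.decode("utf-8")` should give XML text from which the caption can be read with an ordinary XML parser; that step is left out here.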

Quote from: mfryd
Is there an easy way of telling if the captions are in Unicode?  At least I can detect the issue and take some sort of action.

No, not without examining the file with some sort of utility.  Perhaps ExifTool will show that information.

Quote from: mfryd
If I can ask you one more question: if the file doesn't indicate what character set is being used, how does Photo Mechanic know what character set to use when displaying existing captions?

It uses the default character set you have chosen in the IPTC/XMP pane in the Photo Mechanic Preferences dialog.

-Kirk

Offline mfryd

Re: IPTC Character Set?
« Reply #4 on: June 07, 2011, 11:45:31 AM »
The photos are going out from my web site to newspapers, magazines, and other publications throughout the world.  I suspect we need to stick with IPTC to maximize compatibility.

I am writing the software to read the JPEG files. I'll see if I can figure out what flag indicates Unicode.
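
As a starting point for finding that flag: in the IPTC IIM format, UTF-8 is normally declared by dataset 1:90 (Coded Character Set) holding the three-byte escape sequence ESC % G. Whether a given Photo Mechanic file carries it is an assumption worth verifying with a tool such as ExifTool. The Python sketch below works on the raw IIM bytes only; pulling those bytes out of the JPEG's APP13 segment is not shown, and the function names are made up for illustration.

```python
import struct
from typing import Iterator, Tuple

UTF8_ESCAPE = b"\x1b%G"  # ESC % G, the ISO 2022 sequence that selects UTF-8

def iter_iim_datasets(iim: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (record, dataset, value) from a raw IPTC IIM byte stream."""
    pos = 0
    while pos + 5 <= len(iim) and iim[pos] == 0x1C:  # 0x1C is the tag marker
        record, dataset, size = struct.unpack_from(">BBH", iim, pos + 1)
        pos += 5
        if size & 0x8000:
            # Extended dataset: the low 15 bits say how many bytes hold the real length.
            n = size & 0x7FFF
            size = int.from_bytes(iim[pos:pos + n], "big")
            pos += n
        yield record, dataset, iim[pos:pos + size]
        pos += size

def declares_utf8(iim: bytes) -> bool:
    """True if dataset 1:90 (Coded Character Set) holds the UTF-8 escape."""
    return any(r == 1 and d == 90 and v == UTF8_ESCAPE
               for r, d, v in iter_iim_datasets(iim))

def make_dataset(record: int, dataset: int, value: bytes) -> bytes:
    """Build one standard (non-extended) dataset; used here only for the demo."""
    return bytes([0x1C, record, dataset]) + struct.pack(">H", len(value)) + value

# Demo: a stream that declares UTF-8 and carries a caption in dataset 2:120.
demo = make_dataset(1, 90, UTF8_ESCAPE) + make_dataset(2, 120, "© 2011".encode("utf-8"))
print(declares_utf8(demo))
```

The same iterator also gives access to the caption bytes themselves, since the caption lives in dataset 2:120.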

If the file is Unicode, I will be happy.  If the file is something else, I'll assume it's in the Mac default character set, and convert to Unicode (most of my photographers are using the Mac version of Photo Mechanic).
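
That plan could look roughly like the Python sketch below. The function name `caption_to_text` is hypothetical, and the MacRoman fallback is the assumption stated above rather than something the file guarantees. It tries strict UTF-8 first and only falls back when the bytes are not well-formed UTF-8.

```python
def caption_to_text(raw: bytes, fallback: str = "mac_roman") -> str:
    """Decode caption bytes: strict UTF-8 first, then the assumed legacy charset."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat it as the assumed default (MacRoman here).
        return raw.decode(fallback)

print(caption_to_text("Acme® ©".encode("utf-8")))   # caption written as UTF-8
print(caption_to_text(b"Acme\xa8 \xa9"))            # same caption written as MacRoman
```

Plain ASCII captions decode identically either way. The one blind spot is legacy text that happens to be valid UTF-8, which is rare for Western captions but not impossible. Captions from photographers on Windows would need `fallback="latin-1"` or similar.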

Thanks again for your help.