Topic: Polish characters for code replacement?

Offline dmwierz

  • Member
  • **
  • Posts: 50
    • View Profile
Polish characters for code replacement?
« on: June 22, 2007, 01:07:04 PM »
OK, so I'm covering an MLS game tonight between the Chicago Fire and Team Cracovia (Krakow) from Poland. I've screen-scraped the Cracovia roster from their Web site, generated a Word and Excel document, and formatted the text as Polish language, but whenever I save it as Text Tab and then try to use it, the Polish characters turn into "_". Any way to get around this?

When I re-open the TXT file, the "_" are there, too, so any suggestions? If I save as an RTF, the text retains the Polish characters, but PM doesn't seem to want to import the RTF file when I go to "Add" this as a Code Replacement file.

???


Offline Kirk Baker

  • Senior Software Engineer
  • Camera Bits Staff
  • Superhero Member
  • *****
  • Posts: 25503
    • View Profile
    • Camera Bits, Inc.
Re: Polish characters for code replacement?
« Reply #1 on: June 22, 2007, 03:36:13 PM »
Quote from: dmwierz
OK, so I'm covering an MLS game tonight between the Chicago Fire and Team Cracovia (Krakow) from Poland. I've screen-scraped the Cracovia roster from their Web site, generated a Word and Excel document, and formatted the text as Polish language, but whenever I save it as Text Tab and then try to use it, the Polish characters turn into "_". Any way to get around this?

When I re-open the TXT file, the "_" are there, too, so any suggestions? If I save as an RTF, the text retains the Polish characters, but PM doesn't seem to want to import the RTF file when I go to "Add" this as a Code Replacement file.

You need to encode the text in the file as UTF-8 Unicode which will then handle all possible characters.

There are many text editors that can save in the UTF-8 format.
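If a script is handier than an editor, the same re-encoding can be done in a few lines of Python. This is only a rough sketch: the function name `convert_to_utf8` and the file names in the usage note are made up for illustration, and it assumes the file was saved as UTF-16 text.

```python
from pathlib import Path


def convert_to_utf8(src, dst, src_encoding="utf-16"):
    """Read a text file in src_encoding and write it back out as UTF-8.

    src and dst are file paths. The default source encoding is an
    assumption; pass a different one if the file was saved another way.
    """
    # Decode the original bytes into text (the "utf-16" codec also
    # consumes the byte-order mark at the start of the file).
    text = Path(src).read_text(encoding=src_encoding)
    # Encode the same text as UTF-8 in the new file.
    Path(dst).write_text(text, encoding="utf-8")
    return text
```

For example, `convert_to_utf8("roster_utf16.txt", "roster_utf8.txt")` would leave the original file alone and write a UTF-8 copy next to it.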

-Kirk

Offline dmwierz

Re: Polish characters for code replacement?
« Reply #2 on: June 22, 2007, 10:10:31 PM »
Kirk,

Word and Excel offer UTF-16 encoding but not UTF-8, so I tried this and the file retained the characters within the file. However, when I imported the code replacement file, it failed to work.
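One way to see what encoding a saved file really ended up in is to look at its first few bytes. A rough sketch in Python follows; `guess_encoding` is a hypothetical helper, and it only checks for a byte-order mark and for strictly valid UTF-8, so it is a hint rather than a guarantee (it does not try to tell UTF-32 apart, for instance).

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Make a rough guess at the encoding of raw file bytes."""
    # A UTF-8 byte-order mark at the start is a strong hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # UTF-16 files normally start with one of two byte-order marks.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # No mark: see whether the bytes decode cleanly as UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

Reading the file in binary mode, e.g. `guess_encoding(open("roster.txt", "rb").read())` with a made-up file name, would show whether the export really produced UTF-8 or UTF-16.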

???


« Last Edit: June 22, 2007, 10:16:32 PM by dmwierz »

Offline Kirk Baker

Re: Polish characters for code replacement?
« Reply #3 on: June 23, 2007, 08:07:12 AM »
Quote from: dmwierz
Word and Excel offer UTF-16 encoding but not UTF-8, so I tried this and the file retained the characters within the file. However, when I imported the code replacement file, it failed to work.

You'll need to use a simple text editor that supports UTF-8 encoding.  On Windows I recommend Notepad2:

http://www.flos-freeware.ch/notepad2.html

On Mac OS X, I recommend TextWrangler2:

http://www.barebones.com/products/textwrangler/

Both of these editors are easy to use and do a fine job of handling UTF-8 encoding.

-Kirk

Offline dmwierz

Re: Polish characters for code replacement?
« Reply #4 on: June 23, 2007, 08:29:48 AM »
Kirk,

Thanks for the tip. Downloaded Text Wrangler, then imported the UTF-8 file into PM, and still, it doesn't work. When I try to use the "\xyz\" convention, nothing happens. Perhaps I can send you the file I made? Code replacement works with all the other files I've made in Word or Excel, but not this one.

The point will soon be moot as I'm captioning the images right now, and may end up just cutting and pasting the names from the UTF-8 file. Will sure be a pain, though.

Ideas?

Offline Kirk Baker

Re: Polish characters for code replacement?
« Reply #5 on: June 23, 2007, 08:42:04 AM »
Quote from: dmwierz
Thanks for the tip. Downloaded Text Wrangler, then imported the UTF-8 file into PM, and still, it doesn't work. When I try to use the "\xyz\" convention, nothing happens. Perhaps I can send you the file I made? Code replacement works with all the other files I've made in Word or Excel, but not this one.

The point will soon be moot as I'm captioning the images right now, and may end up just cutting and pasting the names from the UTF-8 file. Will sure be a pain, though.

Send me the file.  Contact me directly by clicking on my name to the left of this message.  Send me a personal message.  I will give you an email address where you can send your file as an attachment.

-Kirk

Offline dmwierz

Re: Polish characters for code replacement?
« Reply #6 on: June 23, 2007, 08:48:13 AM »
Kirk,

Got it sorted out. No idea why it wasn't working, but it is now. However, whenever I save a caption, I'm presented with a dialogue asking me if I want to save it in Unicode UTF-8. Any way to set this to "Always Yes", or some such?

Thanks for the great support.

Offline Kirk Baker

Re: Polish characters for code replacement?
« Reply #7 on: June 23, 2007, 09:01:14 AM »
Quote from: dmwierz
Got it sorted out. No idea why it wasn't working, but it is now. However, whenever I save a caption, I'm presented with a dialogue asking me if I want to save it in Unicode UTF-8. Any way to set this to "Always Yes", or some such?

Yes, in the IPTC/XMP tab of the Preferences dialog.  Please note that if you're doing IPTC only (not doing XMP), then some applications, including Photoshop, cannot properly handle UTF-8 encoded IPTC records.  Photoshop does handle UTF-8 XMP metadata, however, so the safest thing is to apply both IPTC and IPTC4XMP when captioning.

-Kirk

Offline dmwierz

Re: Polish characters for code replacement?
« Reply #8 on: June 23, 2007, 09:04:54 AM »
>>Photoshop cannot properly handle UTF-8 encoded IPTC records.  Photoshop does handle UTF-8 XMP metadata however, so the safest thing is to apply both IPTC and IPTC4XMP when captioning.<<

English? I have no idea what you are talking about...sorry. Does this mean that if I edit the files in Photoshop, I'll lose the data? And what does "apply both IPTC and IPTC4XMP when captioning" mean?

I see a checkbox called "Write IPTC as Unicode". Is this what you mean? I also see a pulldown that defaults to "Add both embedded IPTC and IPTC4XMP" Is this the other thing you mentioned?
« Last Edit: June 23, 2007, 09:08:22 AM by dmwierz »

Offline dmwierz

Re: Polish characters for code replacement?
« Reply #9 on: June 23, 2007, 09:20:31 AM »
OK, checking the checkbox didn't work...I still get the attached dialogue box when saving the caption. Not a big deal, but kind of a pain.

I would like to understand the admonition from your earlier post, though.

Dennis

« Last Edit: June 23, 2007, 09:40:36 AM by dmwierz »

Offline Kirk Baker

Re: Polish characters for code replacement?
« Reply #10 on: June 23, 2007, 09:49:32 AM »
Quote from: dmwierz
>>Photoshop cannot properly handle UTF-8 encoded IPTC records.  Photoshop does handle UTF-8 XMP metadata however, so the safest thing is to apply both IPTC and IPTC4XMP when captioning.<<

English? I have no idea what you are talking about...sorry. Does this mean that if I edit the files in Photoshop, I'll lose the data? And what does "apply both IPTC and IPTC4XMP when captioning" mean?

If you re-save an image with a Unicode-encoded IPTC record, Photoshop will indeed change the IPTC data to non-Unicode and will replace the characters with ones that do not match the originals.

Quote from: dmwierz
I see a checkbox called "Write IPTC as Unicode". Is this what you mean?

Yes.

Quote from: dmwierz
I also see a pulldown that defaults to "Add both embedded IPTC and IPTC4XMP" Is this the other thing you mentioned?

Yes.

-Kirk

Offline Kirk Baker

Re: Polish characters for code replacement?
« Reply #11 on: June 23, 2007, 09:57:29 AM »
Dennis,

Quote from: dmwierz
OK, checking the checkbox didn't work...I still get the attached dialogue box when saving the caption. Not a big deal, but kind of a pain.

I get the same undesired behavior.  I will look into fixing it.  You can also check the "Write as Unicode" checkbox in the IPTC Info dialog, but it will be unchecked the next time you reopen the IPTC Info dialog.

Quote from: dmwierz
I would like to understand the admonition from your earlier post, though.

Admonition?  Here is what happens when UTF-8 data is interpreted as if it were in a default 8-bit character set: characters that are just standard 7-bit ASCII look fine.  But accented characters, or characters that cannot be represented in an 8-bit character set, are rendered as a series of incorrect characters.  When this incorrectly interpreted data is then written out, the text is changed from its original state.
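The effect is easy to reproduce outside of any photo application. Below is a small Python sketch. The name is made up, `misread_utf8` is a hypothetical helper, and Latin-1 is only a stand-in for "a default 8-bit character set"; which character set a given application actually falls back to is not something this sketch knows.

```python
def misread_utf8(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then decode the bytes with the wrong 8-bit encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


name = "Łukasz Wiśniewski"  # made-up player name for illustration
garbled = misread_utf8(name)

# Each accented letter takes two bytes in UTF-8, so it comes back as
# two wrong characters; repr() makes the unprintable ones visible.
print(repr(garbled))

# Plain 7-bit ASCII text comes through unchanged.
print(misread_utf8("Chicago Fire"))

# Once the misread text is written back out, the bytes no longer
# match the original UTF-8 data.
resaved = garbled.encode("utf-8")
print(resaved == name.encode("utf-8"))
```

As long as nothing has been re-saved, the damage is reversible (`garbled.encode("latin-1").decode("utf-8")` gives the name back); after a re-save by an application that guessed wrong, the stored bytes themselves have changed.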

We have contacted Adobe more than once over this issue and have gotten no response.  They obviously can handle UTF-8 encoding, since they deal with it in XMP, so we figured it would be an easy fix, but I guess our concern was not recognized by them.  CS3 still has this problem with UTF-8 encoded IPTC records.

If all of your apps can handle XMP only, then you can go XMP only with PM and the encoding issue just disappears completely.

-Kirk

Offline dmwierz

Re: Polish characters for code replacement?
« Reply #12 on: June 23, 2007, 10:03:23 AM »
>>If all of your apps can handle XMP only, then you can go XMP only with PM and the encoding issue just disappears completely.<<

Thanks for hanging in there with me, but this is gibberish. How am I to know if "all my apps can handle XMP only", whatever that means? I'm a photographer, not an IT guy, so be gentle  ;)

Simply: If I caption in PM after all PS editing is finished, then there will be no problem?

Offline Kirk Baker

Re: Polish characters for code replacement?
« Reply #13 on: June 23, 2007, 10:42:31 AM »
Quote from: dmwierz
>>If all of your apps can handle XMP only, then you can go XMP only with PM and the encoding issue just disappears completely.<<

Thanks for hanging in there with me, but this is gibberish. How am I to know if "all my apps can handle XMP only", whatever that means? I'm a photographer, not an IT guy, so be gentle  ;)

Simply: If I caption in PM after all PS editing is finished, then there will be no problem?

Yes, and no.  Since you don't have any control over what apps other people use when viewing the photos you submit to them, the answer can be "no."

-Kirk

Offline dmwierz

Re: Polish characters for code replacement?
« Reply #14 on: June 23, 2007, 12:25:48 PM »
OK, well it seems the only way to keep control of the text in the captions is to Anglicize the names, and eliminate the non-standard characters, so this is what I'll do.
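For anyone going the same route, the Anglicizing does not have to be done by hand. Here is a minimal Python sketch using the standard `unicodedata` module; `anglicize` and the `SPECIAL` table are names invented for this example, and the names in the usage note are made up. The one Polish wrinkle is that "ł" and "Ł" have no accent-stripped decomposition in Unicode, so they get their own mapping.

```python
import unicodedata

# Letters with no Unicode decomposition into base letter + accent.
SPECIAL = str.maketrans({"ł": "l", "Ł": "L"})


def anglicize(name):
    """Return name with accents removed, leaving plain ASCII letters."""
    # Handle the letters that normalization cannot split.
    name = name.translate(SPECIAL)
    # Split accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With this, `anglicize("Łukasz Wiśniewski")` gives `"Lukasz Wisniewski"`. Running every replacement column of the roster through it before saving the tab-delimited file would keep the captions in plain ASCII.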

Makes me wonder what the rest of the world (non-English speakers) does when they caption or use PM?