Topic: Unwanted bonus hits when searching on words containing Danish letters

FrankNT

  • Newcomer
  • Posts: 3
In the Organizer Search tab, searching on "tåge" (tåge = fog in English) returns, as expected, images whose keywords contain "tåge".
BUT it also hits images where "tage" is part of a keyword, in this example "Klitplantage". Apparently "å" is also interpreted as "a".

2 example photos (jpg/raf/xmp) can be found here: https://www.dropbox.com/sh/rzt5yxubkge3jt2/AADGA0taXPC3lFct-LSblNxda?dl=0

Metadata is written as Unicode for all photos.
On Windows:
Lower case å = ALT/134
Upper case Å = ALT/143

PM Plus Version 6.0 build 6097 (85d1687)
Win 10 version 20H2 (OS Build 19042.1288)

Please advise....
BR Frank
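For reference, ALT codes typed without a leading zero (such as the ALT/134 and ALT/143 above) are typically looked up in the Windows OEM code page, not in Unicode directly. The sketch below assumes CP437; Danish systems often use CP850 or CP865, which place å and Å at the same byte values. It decodes each ALT code and shows the Unicode code point and UTF-8 bytes the character corresponds to. `describe_alt_code` is a hypothetical helper, not part of any Photo Mechanic tooling.

```python
import unicodedata


def describe_alt_code(alt_code, oem_codepage="cp437"):
    """Return (character, code point, UTF-8 bytes as hex) for a Windows ALT+nnn code.

    Assumption: the code is interpreted in the OEM code page (CP437 by default).
    """
    char = bytes([alt_code]).decode(oem_codepage)
    return char, f"U+{ord(char):04X}", char.encode("utf-8").hex(" ")


for code in (134, 143):
    char, code_point, utf8 = describe_alt_code(code)
    print(f"ALT+{code} -> {char} {code_point} ({unicodedata.name(char)}), UTF-8: {utf8}")
```

Either way, the keyword ends up as the single code point U+00E5 (å) or U+00C5 (Å) once it is stored as Unicode.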





Kirk Baker

  • Senior Software Engineer
  • Camera Bits Staff
  • Superhero Member
  • Posts: 24767
  • Camera Bits, Inc.
Re: Unwanted bonus hits when searching on words containing Danish letters
« Reply #1 on: November 09, 2021, 03:25:24 PM »
Frank,

Thanks for the sample files. I'm able to reproduce the problem when I omit the double quotes around the word tåge, but when I include them and search for "tåge", I see only the one file (2020.0916-075447-1483.RAF+JPG).

Try adding double quotes around your search terms.

-Kirk
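Kirk's suggestion relies on the search treating a double-quoted run as one phrase. A minimal sketch of how a query string could be split into quoted phrases and bare terms, assuming that rule and nothing else about Photo Mechanic's parser (`parse_query` is a hypothetical name):

```python
import re

# A double-quoted run becomes one phrase; anything else is split on
# whitespace into separate terms. Hypothetical, not Photo Mechanic's parser.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def parse_query(query):
    """Return a list of (text, is_phrase) pairs in query order."""
    tokens = []
    for phrase, term in _TOKEN.findall(query):
        if phrase:  # the quoted alternative matched
            tokens.append((phrase, True))
        elif term:  # a bare, whitespace-delimited term
            tokens.append((term, False))
    return tokens


print(parse_query('Thorup "Vester Thorup"'))  # [('Thorup', False), ('Vester Thorup', True)]
```

An unbalanced quote simply falls through to the bare-term branch, which keeps the sketch forgiving of typos.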

FrankNT

  • Newcomer
  • Posts: 3
Re: Unwanted bonus hits when searching on words containing Danish letters
« Reply #2 on: November 10, 2021, 01:28:36 PM »

Thanks Kirk, that workaround makes some sense, but...

As stated in the Search Examples: "Terms enclosed in double quotes are searched as a single phrase".
From my search examples below, it looks like double quotes make the search hit whole words only. A single phrase consists of whole words, I guess.

No matter what, 'a' and 'å' seem to be interchangeable...?


The image ..1483 has the keyword "Tåge"
The image ..6154 has the keyword "Vester Thorup Klitplantage"

Search term             1483    6154
===========             ====    ====
tåge                    hit     hit
"tåge"                  hit     -
tage                    hit     hit
"tage"                  hit     -
Klitplantage            -       hit
"Klitplantage"          -       hit
Klitplantåge            -       hit     note: 'Klitplantåge' is a non-existent word in Danish
"Klitplantåge"          -       hit
Thorup Klitplantåge     -       hit
"Thorup Klitplantåge"   -       hit
rup Klitplantåge        -       hit
"rup Klitplantåge"      -       -

BR Frank
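The table above can be reproduced with a small hypothetical model: fold both query and keyword (decompose, drop combining marks, casefold), then use substring matching for unquoted words and whole-word phrase matching for quoted queries. This is a sketch of the observed behavior under those assumptions, not Camera Bits' implementation; `fold` and `matches` are illustrative names.

```python
import re
import unicodedata


def fold(text):
    """Hypothetical folding: NFD, drop combining marks (category Mn), casefold.

    'Tåge' -> 'tage'. Letters with no canonical decomposition, such as
    'ø' and 'æ', pass through unchanged apart from case.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return stripped.casefold()


def matches(query, keyword):
    """Hypothetical model of the table's behavior (not Camera Bits' code).

    Quoted query:   the folded phrase must occur as whole words.
    Unquoted query: every folded word must occur somewhere, even mid-word.
    """
    target = fold(keyword)
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        phrase = fold(query[1:-1])
        return re.search(r"\b" + re.escape(phrase) + r"\b", target) is not None
    return all(word in target for word in fold(query).split())


KEYWORDS = {"1483": "Tåge", "6154": "Vester Thorup Klitplantage"}
QUERIES = [
    "tåge", '"tåge"', "tage", '"tage"',
    "Klitplantage", '"Klitplantage"', "Klitplantåge", '"Klitplantåge"',
    "Thorup Klitplantåge", '"Thorup Klitplantåge"',
    "rup Klitplantåge", '"rup Klitplantåge"',
]

for q in QUERIES:
    cells = ["hit" if matches(q, kw) else "-" for kw in KEYWORDS.values()]
    print(f"{q:<24}{cells[0]:<8}{cells[1]}")
```

Every row of the table comes out the same, which is consistent with (but does not prove) quoting switching from substring to whole-word matching while the a/å folding applies in both modes. Whether Photo Mechanic folds ø and æ is not stated in the thread.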

Kirk Baker

  • Senior Software Engineer
  • Camera Bits Staff
  • Superhero Member
  • Posts: 24767
  • Camera Bits, Inc.
Re: Unwanted bonus hits when searching on words containing Danish letters
« Reply #3 on: November 10, 2021, 01:44:29 PM »
Quote from: FrankNT on November 10, 2021, 01:28:36 PM
As stated in the Search Examples: "Terms enclosed in double quotes are searched as a single phrase".
From my search examples below, it looks like double quotes make the search hit whole words only.

Correct.

Quote from: FrankNT
No matter what, 'a' and 'å' seem to be interchangeable...?

Due to Unicode normalization, yes. We found early on that there are often several ways to represent the same non-ASCII character in Unicode, so exact matching isn't possible. That would frustrate users, so we normalize text to make searching less frustrating for most users.

-Kirk
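The "several ways to represent" point can be seen with Python's standard `unicodedata` module: 'å' may be stored as one code point (U+00E5) or as 'a' followed by a combining ring above (U+030A). In this minimal sketch, which is not Photo Mechanic's code, NFC normalization makes the two spellings compare equal. An additional step that decomposes and drops combining marks (a common accent-folding technique, assumed here) is what would make 'å' match plain 'a'. `strip_marks` is a hypothetical helper name.

```python
import unicodedata

precomposed = "t\u00e5ge"  # 'å' as a single code point, U+00E5
decomposed = "ta\u030age"  # 'a' + COMBINING RING ABOVE, U+030A

# Identical on screen, but different code point sequences.
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 4 5

# Normalizing both to the same form (NFC here) makes them compare equal.
print(unicodedata.normalize("NFC", precomposed)
      == unicodedata.normalize("NFC", decomposed))  # True


def strip_marks(text):
    """Decompose (NFD) and drop combining marks: the step that turns 'å' into 'a'."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(c for c in nfd if not unicodedata.combining(c))


print(strip_marks(precomposed), strip_marks(decomposed))  # tage tage
```

NFC or NFD on their own keep 'å' distinct from 'a'; the interchangeability Frank observed points to a folding step like `strip_marks` on top of normalization.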