Jacob Lauritzen created LUCENE-9939:
---------------------------------------

             Summary: Proper ASCII folding of Danish/Norwegian characters Ø, Å
                 Key: LUCENE-9939
                 URL: https://issues.apache.org/jira/browse/LUCENE-9939
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: Jacob Lauritzen


The current version of ASCIIFoldingFilter folds Å, å to A, a and Ø, ø to O, o, 
which I believe is incorrect.

Å was introduced as a replacement for Aa (which is mapped to aa in 
ASCIIFoldingFilter) by Norway in 1917 and by Denmark in 1948. Aa is still used 
in many names (for example, the second-largest city in Denmark was originally 
named Aarhus, was renamed to Århus in 1948, and was changed back to Aarhus in 
2010 for internationalization purposes).

The story of Ø is similar. It is equivalent to Œ (which is mapped to oe), not ö 
(which is mapped to o), and is generally written as oe in ASCII text.

The third Danish character Æ is already properly mapped to AE.
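
As a rough illustration of the folding proposed here, the sketch below maps the 
three letters to their digraphs in plain Java. It is not the ASCIIFoldingFilter 
code; the class name DanishFoldingSketch and the method fold are hypothetical, 
and the uppercase targets (AA, OE) are an assumption that follows the existing 
Æ to AE convention.

```java
import java.util.Map;

/** Hypothetical sketch, not part of Lucene: folds Danish/Norwegian letters to ASCII digraphs. */
final class DanishFoldingSketch {

    // Assumed target mappings. Unicode escapes keep this source file pure ASCII.
    private static final Map<Character, String> FOLDS = Map.of(
        '\u00C5', "AA",  // capital A with ring above
        '\u00E5', "aa",  // small a with ring above
        '\u00D8', "OE",  // capital O with stroke
        '\u00F8', "oe",  // small o with stroke
        '\u00C6', "AE",  // capital AE ligature (already folded this way today)
        '\u00E6', "ae"); // small ae ligature

    /** Replaces each mapped letter with its digraph and copies every other char unchanged. */
    static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length() + 4);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = FOLDS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("\u00C5rhus"));                // prints "AArhus"
        System.out.println(fold("bl\u00E5b\u00E6rgr\u00F8d")); // prints "blaabaergroed"
    }
}
```

Until the filter itself changes, a similar effect can probably be had by 
rewriting these characters before ASCIIFoldingFilter sees them, for example 
with a MappingCharFilter placed in front of the tokenizer.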



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

