[ https://issues.apache.org/jira/browse/LUCENE-9939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacob Lauritzen updated LUCENE-9939:
------------------------------------
    Status: Patch Available  (was: Open)

> Proper ASCII folding of Danish/Norwegian characters Ø, Å
> --------------------------------------------------------
>
>                 Key: LUCENE-9939
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9939
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Jacob Lauritzen
>            Priority: Minor
>              Labels: easyfix
>
> The current version of the ASCIIFoldingFilter maps Å, å to A, a and Ø, ø to
> O, o, which I believe is incorrect.
> Å was adopted by Norway in 1917 and by Denmark in 1948 as a replacement for
> Aa (which is mapped to aa in the ASCIIFoldingFilter). Aa is still used in a
> lot of names: for example, Denmark's second-largest city was originally
> named Aarhus, renamed Århus in 1948, and renamed back to Aarhus in 2010 for
> internationalization purposes.
> The story of Ø is similar. It is equivalent to Œ (which is mapped to oe), not
> ö (which is mapped to o), and is generally written as oe in ASCII text.
> The third Danish character, Æ, is already properly mapped to AE.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
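A minimal, standalone Java sketch of the mapping the issue proposes: Å/å fold to AA/aa and Ø/ø to OE/oe, alongside the existing Æ/æ to AE/ae. The class `DanishNorwegianFolding` and its `fold` method are hypothetical illustrations, not Lucene's API, and they cover only these six characters rather than the rest of ASCIIFoldingFilter's table. The demo assumes lowercasing runs before folding, as in a typical analyzer chain.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch, NOT Lucene's ASCIIFoldingFilter: folds the Danish/Norwegian
 * letters A-ring, O-slash and AE as proposed in the issue. Other characters pass through.
 * Unicode escapes keep the source file pure ASCII.
 */
public class DanishNorwegianFolding {

    // Proposed replacements, keyed by the original character.
    private static final Map<Character, String> FOLDS = Map.of(
            '\u00C5', "AA", '\u00E5', "aa",   // A-ring (upper/lower)
            '\u00D8', "OE", '\u00F8', "oe",   // O-slash (upper/lower)
            '\u00C6', "AE", '\u00E6', "ae");  // AE ligature, already folded this way today

    /** Replaces each mapped character with its ASCII sequence; copies everything else. */
    public static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = FOLDS.get(c);
            out.append(replacement != null ? replacement : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Lowercase first (assumed analyzer order), then fold: both spellings meet.
        System.out.println(fold("\u00C5rhus".toLowerCase(Locale.ROOT))); // aarhus
        System.out.println(fold("Aarhus".toLowerCase(Locale.ROOT)));     // aarhus
        System.out.println(fold("\u00D8"));                              // OE
    }
}
```

Under this mapping the old and current spellings of the city fold to the same token, "aarhus". With the current Å to A mapping, "århus" would instead fold to "arhus" and fail to match "aarhus".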