thomasschuerger opened a new issue, #11733: URL: https://github.com/apache/lucene/issues/11733
### Description

The GermanNormalizationFilter includes the following mappings: ä/ae -> a, ö/oe -> o, ü/ue -> u and ß -> ss (plus some simple rules for cases where "ue" should not be converted to "u"). This mapping is very uncommon in German.

In German, it is common to treat ä and ae, ö and oe, ü and ue, as well as ß and ss as equivalent. The ASCII versions are used where the non-ASCII characters are unavailable, e.g. when typing on an English keyboard or when a system does not allow these characters. With the current mapping, searching for "Uber" (the company) finds the frequent word "über", which is unexpected because "u" and "ü" are normally not treated as equivalent.

Therefore I would like to see a filter that normalizes German by mapping ä -> ae, ö -> oe, ü -> ue and ß -> ss. This could be done either through an additional parameter for GermanNormalizationFilter that switches to that mapping (the existing mapping should of course remain the default), or through a separate filter (GermanNormalizationFilter2?) with that mapping.

Using a charfilter is not the same, as it runs before the whole filter chain. The new filter should be a drop-in replacement for GermanNormalizationFilter in any position in the filter chain.
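
To make the requested mapping concrete, here is a minimal sketch in plain Java. It is not Lucene code and not a proposed implementation: the class name `GermanUmlautExpansion` and the method `normalize` are hypothetical, and the sketch assumes the term has already been lowercased earlier in the chain. A real filter would presumably extend Lucene's `TokenFilter` and rewrite the term buffer in place rather than build a new `String`.

```java
// Hypothetical sketch of the requested mapping; not part of Lucene.
// Assumes input terms are already lowercased.
final class GermanUmlautExpansion {
    private GermanUmlautExpansion() {}

    /** Expands lowercase German umlauts and sharp s to ASCII digraphs. */
    static String normalize(String term) {
        StringBuilder out = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            switch (c) {
                case '\u00e4': out.append("ae"); break; // a-umlaut -> ae
                case '\u00f6': out.append("oe"); break; // o-umlaut -> oe
                case '\u00fc': out.append("ue"); break; // u-umlaut -> ue
                case '\u00df': out.append("ss"); break; // sharp s  -> ss
                default:       out.append(c);           // everything else unchanged
            }
        }
        return out.toString();
    }
}
```

Under this mapping "über" and "ueber" both normalize to "ueber", while "uber" stays "uber", so a query for "Uber" no longer matches "über" once both sides are lowercased.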