rmuir commented on issue #14659: URL: https://github.com/apache/lucene/issues/14659#issuecomment-2900761266
> It's like conflating "rn" and "m" to merge burn/bum and corn/com. It could happen when reading quickly or with poor handwriting, but it is not something that should happen for search indexing.

If you read the referenced documents, these mappings are specifically for this exact purpose. It solves technical issues of graphical vs logical order with fonts.

It sounds like you don't want this: if you have perfect Unicode text from Wikipedia that doesn't suffer from such damage, don't use this filter, as you will find more mappings you don't like. The problems dealt with by the filter happen most often with text written in legacy fonts, extracted from PDF, etc. In such cases, the foldings are essential: the improvements can be seen (and measured) in FIRE IR benchmarks.