rmuir commented on issue #14659:
URL: https://github.com/apache/lucene/issues/14659#issuecomment-2900761266

   > It's like conflating "rn" and "m" to merge burn/bum and corn/com. It could
   > happen when reading quickly or with poor handwriting, but it is not something
   > that should happen for search indexing.
   
   If you read the referenced documents, these mappings exist for exactly this
purpose. They solve technical issues of graphical vs. logical order with fonts.
It sounds like you don't want this: if you have clean Unicode text from
Wikipedia that doesn't suffer from such damage, don't use this filter, as you
will find more mappings you don't like.
   
   The problems this filter deals with occur most often in text written in
legacy fonts, extracted from PDF, and so on. In such cases, the foldings are
essential: the improvements can be seen (and measured) in FIRE IR benchmarks.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


