[PR] Add CaseFolding.fold(), inverse of expand(), move to UnicodeUtil, add filter [lucene]

via GitHub Sat, 05 Apr 2025 10:24:20 -0700


rmuir opened a new pull request, #14389:
URL: https://github.com/apache/lucene/pull/14389


   Regexp can erase case differences at query time (the slow way), but 
there's no corresponding way to do it the fast way: at index time.
   
   There's LowerCaseFilter, but LowerCaseFilter normalizes text for display 
purposes. That is different from case folding, which eliminates case 
differences and is appropriate for search.
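
   To illustrate the distinction, here is a minimal sketch that uses only JDK 
calls. It is not the code in this PR: `CaseFoldSketch`, `lower`, and 
`simpleFold` are hypothetical names, and upper-casing then lower-casing each 
code point is only a rough approximation of Unicode simple case folding. 
Greek final sigma shows the gap: per-code-point lowercasing keeps "ς" and "σ" 
distinct, while the folding approximation maps both to "σ".

```java
public class CaseFoldSketch {
    // Per-code-point lowercasing (no context-sensitive rules).
    static String lower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().map(Character::toLowerCase).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Assumption: approximates simple case folding by upper-casing and then
    // lower-casing each code point. Real folding uses Unicode CaseFolding data.
    static String simpleFold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
            .map(cp -> Character.toLowerCase(Character.toUpperCase(cp)))
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String mixed = "\u03BF\u03B4\u03CC\u03C2"; // "οδός", ends in final sigma U+03C2
        String upper = "\u039F\u0394\u038C\u03A3"; // "ΟΔΌΣ", ends in capital sigma U+03A3

        // Lowercasing leaves U+03C2 alone but maps U+03A3 to U+03C3, so the forms differ.
        System.out.println(lower(mixed).equals(lower(upper)));           // false
        // Folding sends both sigmas to U+03C3, so the forms match.
        System.out.println(simpleFold(mixed).equals(simpleFold(upper))); // true
    }
}
```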
   
   Generate fold() data in a similar way to expand() data. Expose it via 
UnicodeUtil and tableize Basic Latin for performance. Add CaseFoldingFilter.
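
   The "tableize Basic Latin" idea can be sketched as follows. This is an 
assumption about the general technique, not the PR's actual code: 
`LatinTableSketch`, `BASIC_LATIN`, and this `fold` signature are hypothetical, 
and the non-ASCII branch uses a JDK approximation as a stand-in for the 
generated fold data.

```java
public class LatinTableSketch {
    // Hypothetical lookup table: folded form of each code point in U+0000..U+007F.
    private static final char[] BASIC_LATIN = new char[128];

    static {
        for (int i = 0; i < BASIC_LATIN.length; i++) {
            // In Basic Latin, folding only maps 'A'..'Z' to 'a'..'z'.
            BASIC_LATIN[i] = (char) ((i >= 'A' && i <= 'Z') ? i + ('a' - 'A') : i);
        }
    }

    static int fold(int codePoint) {
        if (codePoint < BASIC_LATIN.length) {
            return BASIC_LATIN[codePoint]; // fast path: a single array read
        }
        // Stand-in for the slower generated-data path (an approximation only).
        return Character.toLowerCase(Character.toUpperCase(codePoint));
    }

    public static void main(String[] args) {
        System.out.println((char) fold('Q')); // q
        System.out.println((char) fold('7')); // 7
    }
}
```

   The fast path matters because most tokens in many corpora are ASCII, so the 
common case avoids any branching over the larger generated mapping.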
   
   No Analyzer chains have been modified yet, but we should be able to improve 
Unicode support by swapping out LowerCaseFilter as a follow-up. Some filters, 
such as GreekLowerCaseFilter, can probably be eliminated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


