rmuir opened a new pull request, #14389: URL: https://github.com/apache/lucene/pull/14389
Regexp can erase case differences at query time (the slow way), but there is no corresponding way to do it the fast way: at index time. There is LowerCaseFilter, but it normalizes text for display purposes. That is different from case folding, which eliminates case differences and is appropriate for search.

Generate fold() data in a similar way to expand() data. Expose it via UnicodeUtil and tableize Basic Latin for performance. Add CaseFoldingFilter.

No Analyzer chains have been modified yet, but we should be able to improve Unicode support by swapping out LowerCaseFilter as a follow-up. Some filters, such as GreekLowerCaseFilter, can probably be eliminated.
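To illustrate the difference between lowercasing and case folding, here is a minimal JDK-only sketch. It does not use the fold() data or CaseFoldingFilter from this PR; `SimpleFoldSketch`, `approxFold`, and `lower` are hypothetical names, and the upper-then-lower round trip is only a rough stand-in for Unicode simple case folding that does not cover every mapping. The example uses the Greek final sigma (U+03C2), which per-code-point lowercasing leaves distinct from the regular sigma (U+03C3).

```java
public final class SimpleFoldSketch {
    private SimpleFoldSketch() {}

    /** Per-code-point lowercasing, with no contextual (final sigma) handling. */
    static String lower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .map(Character::toLowerCase)
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    /**
     * Hypothetical helper: approximates simple case folding by mapping each
     * code point to uppercase and then back to lowercase. This is NOT the
     * fold() table generated in the PR, only an approximation for illustration.
     */
    static String approxFold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .map(cp -> Character.toLowerCase(Character.toUpperCase(cp)))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String upper = "\u039F\u0394\u039F\u03A3";      // uppercase, ends with capital sigma
        String finalSigma = "\u03BF\u03B4\u03BF\u03C2"; // lowercase, ends with final sigma

        // Lowercasing keeps final sigma and regular sigma apart.
        System.out.println(lower(upper).equals(lower(finalSigma)));

        // Folding maps both sigma forms to the same code point.
        System.out.println(approxFold(upper).equals(approxFold(finalSigma)));
    }
}
```

With lowercasing alone, a term indexed with a final sigma would not match the same word typed in uppercase; folding both sides to a common form removes that mismatch.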