Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

via GitHub Tue, 11 Feb 2025 09:10:18 -0800


john-wagster commented on PR #14192:
URL: https://github.com/apache/lucene/pull/14192#issuecomment-2651490994


   @rmuir I made another pass based on your feedback, and I agree with keeping 
this simple for a first pass.  To that end, I've done the following: 
   
   * CaseFolding is no longer externally exposed (can revisit this later)
   * CaseFolding now has a single method that returns the set of alternates.  
It was renamed from `fold`, which was an inaccurate name.  The idea is to 
introduce a real `fold` method in the future if it's valuable.  
   * CaseFolding only includes the set of edge cases that are not handled by 
`Character.toLowerCase` and `Character.toUpperCase`, and is now a switch/case 
statement with hex codepoints and comments on every character
       * this is similar to ASCIIFoldingFilter
       * it was built using 
https://www.unicode.org/Public/16.0.0/ucd/CaseFolding.txt and filtered by the 
set of things not already covered by `Character.toLowerCase` and 
`Character.toUpperCase`
   * The utility class CaseFoldingUtil for generating the switch / case 
statement code in the CaseFolding class has been removed.
   * ASCII_CASE_INSENSITIVE flag has been deprecated and has the same behavior 
as the CASE_INSENSITIVE flag
   * Otherwise tried to minimize changes, with less concern for performance 
during compilation of the regex
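
To illustrate the switch/case approach described in the list above, here is a 
rough sketch. It is not the code in the PR: the class name 
`CaseFoldingSketch`, the method names `lookupAlternates` and `allAlternates`, 
and the three groups of code points are assumptions chosen for illustration. 
Each case lists only the alternates that the simple `Character.toLowerCase` 
and `Character.toUpperCase` mappings do not reach from that code point.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; names and the selection of code points are
// illustrative assumptions, not the actual CaseFolding class from the PR.
final class CaseFoldingSketch {

  // Returns the extra case-insensitive alternates that are NOT reachable via
  // Character.toLowerCase / Character.toUpperCase from the given code point,
  // or null when the simple mappings already cover everything.
  static int[] lookupAlternates(int codepoint) {
    switch (codepoint) {
      case 0x004B: // LATIN CAPITAL LETTER K
      case 0x006B: // LATIN SMALL LETTER K
        return new int[] {0x212A}; // KELVIN SIGN
      case 0x212A: // KELVIN SIGN (lowercases to k, never maps to capital K)
        return new int[] {0x004B};
      case 0x0053: // LATIN CAPITAL LETTER S
      case 0x0073: // LATIN SMALL LETTER S
        return new int[] {0x017F}; // LATIN SMALL LETTER LONG S
      case 0x017F: // LATIN SMALL LETTER LONG S (uppercases to S only)
        return new int[] {0x0073};
      case 0x03A3: // GREEK CAPITAL LETTER SIGMA
      case 0x03C3: // GREEK SMALL LETTER SIGMA
        return new int[] {0x03C2}; // GREEK SMALL LETTER FINAL SIGMA
      case 0x03C2: // GREEK SMALL LETTER FINAL SIGMA (uppercases to SIGMA only)
        return new int[] {0x03C3};
      default:
        return null;
    }
  }

  // Combines the simple JDK mappings with the edge-case table above.
  static Set<Integer> allAlternates(int codepoint) {
    Set<Integer> result = new TreeSet<>();
    result.add(codepoint);
    result.add(Character.toLowerCase(codepoint));
    result.add(Character.toUpperCase(codepoint));
    int[] extra = lookupAlternates(codepoint);
    if (extra != null) {
      for (int cp : extra) {
        result.add(cp);
      }
    }
    return result;
  }
}
```

With this sketch, `allAlternates('s')` yields `s`, `S`, and the long s 
(U+017F), while a code point such as `a` needs no table entry because the JDK 
mappings already cover it.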
   
   Thoughts at this point, and on whether I've adequately incorporated your 
feedback, would be much appreciated. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


