willdickerson opened a new pull request, #14349:
URL: https://github.com/apache/lucene/pull/14349

   ## Overview
     This PR introduces a proof of concept for a case-insensitive variant of 
`TermInSetQuery`. The implementation matches a set of terms regardless of case 
without having to enumerate every case variation of each term.
   
     ## Implementation Details
     - Extends `MultiTermQuery` to leverage existing infrastructure
     - Uses regular expressions with case-insensitive flag for pattern matching
     - Implements filtering through a `ByteRunAutomaton` for efficient matching
     - Ensures proper handling of deterministic automata
     - Follows Lucene's visitor pattern for queries
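
As a rough illustration of the matching idea in the list above (not the code in this PR), the sketch below builds one case-insensitive pattern from a set of terms and tests candidate terms against it. It deliberately swaps in `java.util.regex` for Lucene's regular-expression and `ByteRunAutomaton` machinery so that it runs on the JDK alone; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical stand-in: java.util.regex replaces Lucene's automaton classes here.
class CaseInsensitiveTermSetSketch {

    // Quote each term so regex metacharacters are treated literally, then join
    // the terms into a single alternation compiled with case-insensitive flags.
    static Pattern buildPattern(List<String> terms) {
        String alternation = terms.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // A candidate (for example, an indexed term) is accepted only if the
    // whole string matches one of the query terms, ignoring case.
    static boolean accepts(Pattern pattern, String candidate) {
        return pattern.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        Pattern pattern = buildPattern(List.of("Lucene", "solr"));
        System.out.println(accepts(pattern, "LUCENE"));  // true
        System.out.println(accepts(pattern, "SoLr"));    // true
        System.out.println(accepts(pattern, "lucenex")); // false: not a whole-term match
    }
}
```

In this sketch the pattern is compiled once and reused as a filter over candidates, which mirrors the idea of filtering enumerated terms through a single precompiled matcher rather than expanding the query into every case variant.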
   
     ## Limitations
     - Uses Java's standard case folding, which may not handle all Unicode 
special cases
     - For full locale-aware case folding, apply an analyzer at index time 
instead
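
The kind of special case meant here can be seen with plain JDK string methods. The sketch below is only an illustration under that assumption and does not exercise the query in this PR; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical demo of case-folding edge cases using only JDK string methods.
class CaseFoldingLimitsSketch {

    // Character-by-character comparison: cannot equate strings whose
    // case mappings change length.
    static boolean simpleIgnoreCaseEquals(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    // Locale-independent upper-casing; applies multi-character mappings.
    static String upperRoot(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // Turkish lower-casing maps 'I' to dotless 'ı' (U+0131).
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        String strasse = "stra\u00dfe"; // "straße"

        // 'ß' upper-cases to "SS", so the two spellings differ in length
        // and a per-character comparison reports no match.
        System.out.println(simpleIgnoreCaseEquals(strasse, "STRASSE")); // false
        System.out.println(upperRoot(strasse).equals("STRASSE"));       // true

        // Under Turkish rules "TITLE" does not lower-case to "title".
        System.out.println(lowerTurkish("TITLE").equals("title"));      // false
    }
}
```

Cases like these are why normalizing text with a suitable analyzer at index time is the more robust route when locale-specific behavior matters.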
   
     ## Testing
     The implementation includes comprehensive test cases covering:
     - Basic case-insensitive matching
     - Multiple term behavior
     - Unicode character testing with notes on special cases
     - Comparison with standard `TermInSetQuery`
     - Visitor pattern
     - Contract tests (equals/hashCode)
     - Randomized testing with Unicode characters
   
     This POC is intended to gather feedback from the community.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
