Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]

via GitHub Thu, 13 Mar 2025 10:29:13 -0700


msfroh commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2722105723


   My thinking is that a query that uses this should lowercase, dedupe, and 
sort the input before feeding it into `StringsToAutomaton`. That would handle 
@dweiss's example: such input would either be treated as "invalid" or, at 
least, exact duplicates wouldn't add any new states (I think), since the full 
string already exists as a prior prefix.
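
   A minimal sketch of that preprocessing step, in plain Java with no Lucene 
dependency. The class and method names are hypothetical, and it assumes that 
`String.toLowerCase(Locale.ROOT)` is the desired normalization. It sorts by 
Unicode code point, which matches unsigned UTF-8 byte order (how `BytesRef` 
values compare), rather than by `String.compareTo`, which compares UTF-16 code 
units and orders supplementary characters differently. Whether 
`StringsToAutomaton` needs exactly this order is an assumption here; 
converting the result to `BytesRef` and building the automaton is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper: lowercase, dedupe, and sort terms before building an automaton.
final class CaseInsensitiveInput {
  private CaseInsensitiveInput() {}

  // Compares strings code point by code point; this matches unsigned UTF-8 byte order.
  static int compareByCodePoint(String a, String b) {
    int i = 0;
    int j = 0;
    while (i < a.length() && j < b.length()) {
      int ca = a.codePointAt(i);
      int cb = b.codePointAt(j);
      if (ca != cb) {
        return Integer.compare(ca, cb);
      }
      i += Character.charCount(ca);
      j += Character.charCount(cb);
    }
    // One string is a prefix of the other (or they are equal): the shorter sorts first.
    return Integer.compare(a.length() - i, b.length() - j);
  }

  // Lowercases each term with Locale.ROOT; the TreeSet removes duplicates and sorts.
  static List<String> prepare(Iterable<String> terms) {
    TreeSet<String> sorted = new TreeSet<>(CaseInsensitiveInput::compareByCodePoint);
    for (String term : terms) {
      sorted.add(term.toLowerCase(Locale.ROOT));
    }
    return new ArrayList<>(sorted);
  }
}
```

For example, `prepare(List.of("Foo", "foo", "BAR"))` yields `[bar, foo]`.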
   
   For the case-insensitive mode, would checking each input code point with 
`Character.isLowerCase()` be sufficient to reject that kind of input across 
all valid Unicode strings?
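
   A minimal sketch of the proposed per-code-point check (class and method 
names are hypothetical). One thing it makes visible: `Character.isLowerCase` 
returns `false` for caseless code points such as digits, so `"foo1"` would be 
rejected as written. The variant instead accepts any code point that 
`Character.toLowerCase` leaves unchanged. Whether either check agrees, for 
every Unicode string, with `String.toLowerCase(Locale.ROOT)` (which can apply 
context-sensitive and one-to-many mappings) is not established here.

```java
// Hypothetical per-code-point checks for "is this input already lowercased?".
final class LowercaseCheck {
  private LowercaseCheck() {}

  // The check as proposed: every code point must satisfy Character.isLowerCase(int).
  // Caseless code points (digits, punctuation, CJK, ...) fail this test.
  static boolean allIsLowerCase(String s) {
    return s.codePoints().allMatch(Character::isLowerCase);
  }

  // Variant: every code point must be unchanged by Character.toLowerCase(int),
  // so caseless code points are accepted.
  static boolean unchangedByToLowerCase(String s) {
    return s.codePoints().allMatch(cp -> Character.toLowerCase(cp) == cp);
  }
}
```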


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

