Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]

via GitHub Thu, 13 Mar 2025 16:19:37 -0700


rmuir commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2722901888


   > Would a check with `Character.isLowerCase()` on each input codepoint for 
the case-insensitive case be sufficient to reject that kind of input across all 
valid Unicode strings?
   
   I don't think so, at least not for Greek. I would step back from that and 
try to get matching working with simple Character.toLowerCase/toUpperCase 
first. If the user provides data with a certain order/casing as you suggest, 
will they always get a DFA?
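
A minimal sketch of the Greek concern, using only standard `java.lang.Character` methods. The class name `GreekSigmaDemo` and its helper methods are hypothetical and not part of the PR. It assumes "simple" case matching means folding each code point with `Character.toUpperCase` followed by `Character.toLowerCase`, which may differ from what the PR ends up doing.

```java
public class GreekSigmaDemo {
    // True if every code point in s is reported as lowercase by the JDK.
    static boolean allLowerCase(String s) {
        return s.codePoints().allMatch(Character::isLowerCase);
    }

    // Hypothetical "simple" per-code-point comparison: fold through
    // toUpperCase and then toLowerCase before comparing.
    static boolean simpleCaseEquals(int a, int b) {
        int foldedA = Character.toLowerCase(Character.toUpperCase(a));
        int foldedB = Character.toLowerCase(Character.toUpperCase(b));
        return foldedA == foldedB;
    }

    public static void main(String[] args) {
        int sigma = 0x03C3;      // GREEK SMALL LETTER SIGMA
        int finalSigma = 0x03C2; // GREEK SMALL LETTER FINAL SIGMA

        // Both code points pass an isLowerCase() check...
        System.out.println(Character.isLowerCase(sigma));
        System.out.println(Character.isLowerCase(finalSigma));

        // ...yet they are distinct code points that fold to the same letter,
        // so two all-lowercase inputs can still collide case-insensitively.
        System.out.println(sigma != finalSigma);
        System.out.println(simpleCaseEquals(sigma, finalSigma));
    }
}
```

Both sigma forms are lowercase and both map to U+03A3 under `Character.toUpperCase`, so an input that is entirely lowercase can still contain two distinct strings that match each other case-insensitively.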
   
   I'm less concerned about it being minimal; let's start with deterministic. I 
don't think we should do this if it will explode (e.g. into an NFA). A union 
of strings really wants to do that if handled the naive way, which is why 
there is a special algorithm for it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

