rmuir commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2722901888

> Would a check with `Character.isLowerCase()` on each input codepoint for the case-insensitive case be sufficient to reject that kind of input across all valid Unicode strings?

I don't think so for Greek. I would step back from that and try to get matching working with simple `Character.toLowerCase`/`Character.toUpperCase` first?

If the user provides data with a certain order/casing as you suggest, will they always get a DFA? I'm less concerned about it being minimal; let's start with deterministic. I don't think we should do this if it will explode (e.g. NFA). And a union of strings really wants to do that if handled the naive way, which is why there is a special algorithm for it.
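
One possible illustration of the Greek concern, as a minimal sketch that is not taken from the PR: the class `GreekCaseDemo` and its helpers are hypothetical names, and only standard `java.lang.Character` methods are used. Both σ (U+03C3) and final ς (U+03C2) are lowercase, so a per-codepoint `Character.isLowerCase()` check accepts each of them, yet both map to Σ (U+03A3) under simple uppercasing, so they are still distinct inputs that a case-insensitive match would treat as equal.

```java
import java.util.Arrays;

public class GreekCaseDemo {
    // Hypothetical helper: fold a code point using only the simple 1:1 mappings
    // (uppercase first, then lowercase), with no locale or special-casing rules.
    public static int simpleFold(int cp) {
        return Character.toLowerCase(Character.toUpperCase(cp));
    }

    // The check proposed in the quoted question: every code point is lowercase.
    public static boolean allLowerCase(String s) {
        return s.codePoints().allMatch(Character::isLowerCase);
    }

    // Compare two strings code point by code point after simple folding.
    public static boolean equalsIgnoringSimpleCase(String a, String b) {
        int[] x = a.codePoints().map(GreekCaseDemo::simpleFold).toArray();
        int[] y = b.codePoints().map(GreekCaseDemo::simpleFold).toArray();
        return Arrays.equals(x, y);
    }

    public static void main(String[] args) {
        String medialSigma = "\u03C3"; // σ
        String finalSigma = "\u03C2";  // ς

        // Both inputs pass the lowercase-only check...
        System.out.println(allLowerCase(medialSigma) && allLowerCase(finalSigma));
        // ...and they are different strings...
        System.out.println(medialSigma.equals(finalSigma));
        // ...but they collide once simple case folding is applied.
        System.out.println(equalsIgnoringSimpleCase(medialSigma, finalSigma));
    }
}
```

Under these assumptions, a lowercase-only filter would not stop two such terms from overlapping in a case-insensitive automaton, which is the kind of overlap that bears on whether the result stays deterministic.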