msfroh commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2722105723
My thinking is that a query that uses this should lowercase, dedupe, and sort the input before feeding it into `StringsToAutomaton`. That would handle @dweiss's example: either that input is treated as "invalid", or at least exact duplicates wouldn't add any new states, I think, since the full string already exists as a prior prefix. For the case-insensitive case, would checking each input codepoint with `Character.isLowerCase()` be sufficient to reject that kind of input across all valid Unicode strings?
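To make the proposal concrete, here is a minimal sketch of the normalization step and of two candidate validity checks, using only the JDK. The class and method names (`TermNormalizer`, `normalize`, `allLowerCaseLetters`, `unchangedByLowercasing`) are hypothetical and not part of Lucene or of this PR. The sketch assumes that per-codepoint lowercasing is the intended folding and that sorting by unsigned UTF-8 bytes is the required order; neither is confirmed here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: normalizes terms before automaton construction.
final class TermNormalizer {

  // Lowercase one codepoint at a time (assumption: no locale-specific or multi-char mappings wanted).
  static String lowerCase(String s) {
    return s.codePoints()
        .map(Character::toLowerCase)
        .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
        .toString();
  }

  // Lowercase, dedupe, and sort. The TreeSet drops exact duplicates; the comparator
  // orders by unsigned UTF-8 bytes, which matches Unicode codepoint order.
  static List<String> normalize(List<String> terms) {
    TreeSet<String> sorted = new TreeSet<>((a, b) -> Arrays.compareUnsigned(
        a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8)));
    for (String term : terms) {
      sorted.add(lowerCase(term));
    }
    return new ArrayList<>(sorted);
  }

  // Candidate check 1: every codepoint must itself be a lowercase character.
  // Digits and punctuation are not lowercase, so this also rejects "abc1".
  static boolean allLowerCaseLetters(String s) {
    return s.codePoints().allMatch(Character::isLowerCase);
  }

  // Candidate check 2: no codepoint changes when lowercased.
  // This accepts digits and punctuation but still rejects "Abc".
  static boolean unchangedByLowercasing(String s) {
    return s.codePoints().allMatch(cp -> Character.toLowerCase(cp) == cp);
  }

  public static void main(String[] args) {
    System.out.println(normalize(List.of("Foo", "foo", "BAR", "abc1"))); // [abc1, bar, foo]
    System.out.println(allLowerCaseLetters("abc1"));    // false
    System.out.println(unchangedByLowercasing("abc1")); // true
  }
}
```

One thing the sketch highlights: `Character.isLowerCase()` returns false for codepoints with no case at all, such as digits, so using it as the sole check would reject inputs like "abc1". Comparing each codepoint against `Character.toLowerCase()` may be closer to the intent, though whether it covers every Unicode edge case is still an open question.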