rmuir commented on issue #14378: URL: https://github.com/apache/lucene/issues/14378#issuecomment-2741911493
We are better armed to begin looking at this one after recent changes, see https://github.com/apache/lucene/pull/14193#issuecomment-2638840849

Now that there is an actual single parsing node that corresponds to a single automaton state for a character class, we can support caseless ranges using the "ICU algorithm" (iterating over every char in the range and adding its variants) without an explosion of objects. But the Unicode space is big, so this will use up some CPU time during parsing. This is why I think it needs an opt-in flag.

We also have a potential problem: `Automaton.finishState()` will collapse a huge `int[]` into efficient ranges, but it doesn't seem to ever free the memory. E.g. if I add 50,000 transitions to it and it then reduces them to one simple range, I think the underlying array is still sized for 50,000 transitions. I will attempt to test and address this first if possible; let's avoid memory issues.
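A minimal sketch of the range expansion described above, to make the cost concrete. It is not Lucene's `RegExp` code and not ICU's implementation: the class name `CaselessRangeSketch` and the method `expand` are hypothetical, and it assumes the JDK's simple one-to-one case mappings (`Character.toLowerCase`, `toUpperCase`, `toTitleCase`), which cover fewer variants than full Unicode case folding. The loop visits every code point in the range, which is why a large range costs CPU time at parse time; the merge step at the end plays the role of collapsing many single transitions into a few ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Illustrative sketch only; names here are hypothetical, not Lucene or ICU API. */
final class CaselessRangeSketch {

  /**
   * Expands the inclusive code point range [from, to] with simple JDK case
   * variants and returns sorted, merged {start, end} ranges.
   */
  static List<int[]> expand(int from, int to) {
    TreeSet<Integer> points = new TreeSet<>();
    // Visit every code point in the range: cost is linear in the range size.
    for (int cp = from; cp <= to; cp++) {
      points.add(cp);
      points.add(Character.toLowerCase(cp));
      points.add(Character.toUpperCase(cp));
      points.add(Character.toTitleCase(cp));
    }
    // Merge adjacent code points into contiguous ranges.
    List<int[]> ranges = new ArrayList<>();
    int[] current = null;
    for (int cp : points) {
      if (current != null && cp == current[1] + 1) {
        current[1] = cp; // extend the current range by one code point
      } else {
        current = new int[] {cp, cp}; // start a new range
        ranges.add(current);
      }
    }
    return ranges;
  }

  public static void main(String[] args) {
    for (int[] r : expand('a', 'z')) {
      System.out.printf("U+%04X..U+%04X%n", r[0], r[1]);
    }
  }
}
```

For `[a-z]` this yields two ranges, `A-Z` and `a-z`. A range spanning most of Unicode would run the same loop over roughly a million code points, which is the parse-time cost an opt-in flag would guard.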