Re: [I] Case insensitive regex query with character range [lucene]

via GitHub Fri, 04 Apr 2025 12:03:44 -0700


rmuir commented on issue #14378:
URL: https://github.com/apache/lucene/issues/14378#issuecomment-2741911493


   We are better armed to begin looking at this one after recent changes; see 
https://github.com/apache/lucene/pull/14193#issuecomment-2638840849
   
   Now that a single parsing node corresponds to a single automaton state 
for a character class, we can support caseless ranges using the "ICU 
algorithm" (iterating over every char in the range and adding its case 
variants) without an explosion of objects. But the Unicode space is big, so 
this will use up some CPU time during parsing, which is why I think it needs 
an opt-in flag.
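
   As a rough illustration of the per-character expansion described above, 
here is a minimal sketch. It is not Lucene or ICU code: the class and method 
names are hypothetical, and it assumes the simple case mappings in 
java.lang.Character are an acceptable stand-in for a full case closure.

```java
import java.util.TreeSet;

// Hypothetical sketch, not part of Lucene: expands a code point range with case variants.
final class CaselessRangeSketch {

  /** Returns every code point in [start, end] plus its simple case variants, sorted. */
  static TreeSet<Integer> expand(int start, int end) {
    TreeSet<Integer> out = new TreeSet<>();
    // One pass over the whole range: cost grows linearly with the range size,
    // which is why a large Unicode range costs CPU time at parse time.
    for (int cp = start; cp <= end; cp++) {
      out.add(cp);
      out.add(Character.toLowerCase(cp));
      out.add(Character.toUpperCase(cp));
      out.add(Character.toTitleCase(cp));
    }
    return out;
  }

  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder();
    for (int cp : expand('a', 'c')) {
      sb.appendCodePoint(cp);
    }
    System.out.println(sb); // prints "ABCabc"
  }
}
```

   The sorted set keeps the result ready to be collapsed into ranges 
afterwards; a real implementation would likely avoid boxing every code point.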
   
   There is also a potential problem: Automaton.finishState() will collapse 
a huge `int[]` into efficient ranges, but it doesn't seem to ever free the 
memory. E.g. if I add 50,000 transitions to it and it then reduces them to 
one simple range, I think the underlying array is still sized for 50,000. I 
will attempt to test and address this first if possible, to avoid memory 
issues.
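
   To make the concern concrete, here is a standalone sketch of collapsing 
many adjacent values into ranges and then trimming the backing array. It does 
not use or reproduce Lucene's Automaton internals; the class and method names 
are hypothetical, and whether finishState() really keeps the oversized array 
is exactly what still needs testing.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene's Automaton: collapse then trim.
final class RangeCollapseSketch {

  /** Collapses sorted, distinct code points into [min, max] pairs, trimmed to size. */
  static int[] collapse(int[] sorted) {
    // Worst case: no two points are adjacent, so each needs its own pair.
    int[] ranges = new int[sorted.length * 2];
    int used = 0;
    int i = 0;
    while (i < sorted.length) {
      int min = sorted[i];
      int max = min;
      i++;
      // Extend the current range while the next point is adjacent.
      while (i < sorted.length && sorted[i] == max + 1) {
        max = sorted[i];
        i++;
      }
      ranges[used++] = min;
      ranges[used++] = max;
    }
    // Without this copy the caller would keep the full worst-case array alive
    // even when only a couple of slots are in use.
    return Arrays.copyOf(ranges, used);
  }

  public static void main(String[] args) {
    int[] points = new int[50_000];
    for (int k = 0; k < points.length; k++) {
      points[k] = 1_000 + k;
    }
    int[] ranges = collapse(points);
    System.out.println(ranges.length); // prints 2
  }
}
```

   The trade-off is one extra array copy per collapse in exchange for not 
holding on to the worst-case allocation.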


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

