Re: [PR] RegExp: add CASE_INSENSITIVE_RANGE support [lucene]

2025-04-05 Thread via GitHub
rmuir commented on code in PR #14381: URL: https://github.com/apache/lucene/pull/14381#discussion_r2007499003 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -778,6 +786,53 @@ private int[] toCaseInsensitiveChar(int codepoint) { } } + /** +

Re: [PR] RegExp: add CASE_INSENSITIVE_RANGE support [lucene]

2025-04-05 Thread via GitHub
rmuir commented on PR #14381: URL: https://github.com/apache/lucene/pull/14381#issuecomment-2743822277 @dweiss thanks for the suggestion there, gazillions of array creations avoided. so now this thing will only spike cpu during parsing at worst. I honestly forget you can pass functions to f
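The diff hunks in this thread show a helper with the signature `private int[] toCaseInsensitiveChar(int codepoint)`, which allocates a new array for every codepoint. The comment above is truncated, so the exact shape of the suggestion is not visible here. One plausible reading, which is an assumption and not the merged code, is to hand each case variant to a callback instead of returning an array. The sketch below uses a hypothetical `forEachCaseVariant` helper and only `java.lang.Character`'s simple case mappings:

```java
import java.util.BitSet;
import java.util.function.IntConsumer;

public class CaseAlternativesSketch {
  // Hypothetical helper (not Lucene's API): streams each simple case variant
  // of a codepoint into a callback instead of returning a new int[].
  static void forEachCaseVariant(int codepoint, IntConsumer sink) {
    sink.accept(codepoint);
    int upper = Character.toUpperCase(codepoint);
    int lower = Character.toLowerCase(codepoint);
    if (upper != codepoint) {
      sink.accept(upper);
    }
    if (lower != codepoint && lower != upper) {
      sink.accept(lower);
    }
  }

  public static void main(String[] args) {
    BitSet seen = new BitSet();
    IntConsumer sink = seen::set; // created once, reused for every codepoint
    for (int cp = 'a'; cp <= 'z'; cp++) {
      forEachCaseVariant(cp, sink);
    }
    System.out.println(seen.cardinality()); // 52: a-z plus A-Z
  }
}
```

With a callback, iterating a large range only touches the caller's own data structure, so per-codepoint garbage disappears and the remaining cost is the CPU spent during parsing.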

Re: [PR] RegExp: add CASE_INSENSITIVE_RANGE support [lucene]

2025-03-21 Thread via GitHub
rmuir merged PR #14381: URL: https://github.com/apache/lucene/pull/14381

Re: [PR] RegExp: add CASE_INSENSITIVE_RANGE support [lucene]

2025-03-21 Thread via GitHub
rmuir commented on PR #14381: URL: https://github.com/apache/lucene/pull/14381#issuecomment-2743798573 after fixing the turkish here's the (correct) automaton for `/[a-z]/`: the only special cases are long-s and kelvin sign as you expect: [image: graphviz (6)]
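The comment says the only non-ASCII members of a case-insensitive `/[a-z]/` are long-s (U+017F) and the Kelvin sign (U+212A). The "turkish" fix presumably concerns the dotted capital I (U+0130) and dotless small i (U+0131); that reading is an inference, not something the snippet states. The standalone check below is not from the PR. It prints the simple case mappings that `java.lang.Character` reports for these four codepoints, which shows why they show up whenever a range includes ASCII letters:

```java
public class AsciiCaseSpecials {
  public static void main(String[] args) {
    // Non-ASCII codepoints whose simple (single-codepoint) case mappings in
    // java.lang.Character land on an ASCII letter.
    int[] specials = {0x017F, 0x212A, 0x0130, 0x0131};
    for (int cp : specials) {
      System.out.printf(
          "U+%04X %-40s upper=U+%04X lower=U+%04X%n",
          cp, Character.getName(cp), Character.toUpperCase(cp), Character.toLowerCase(cp));
    }
  }
}
```

Long-s uppercases to `S` and the Kelvin sign lowercases to `k`; Unicode simple case folding also treats both as equivalent to ASCII letters. The two Turkish letters map to `i`/`I` only under Java's simple mappings. Unicode's CaseFolding.txt lists them only as full or Turkic-specific foldings, which is why a range built from the default (non-Turkic) folding leaves them out.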

Re: [PR] RegExp: add CASE_INSENSITIVE_RANGE support [lucene]

2025-03-21 Thread via GitHub
rmuir commented on code in PR #14381: URL: https://github.com/apache/lucene/pull/14381#discussion_r2007006500 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -778,6 +786,53 @@ private int[] toCaseInsensitiveChar(int codepoint) { } } + /** +

[PR] RegExp: add CASE_INSENSITIVE_RANGE support [lucene]

2025-03-21 Thread via GitHub
rmuir opened a new pull request, #14381: URL: https://github.com/apache/lucene/pull/14381 Add optional flag to support case-insensitive ranges. A minimal DFA is always created. This works with Unicode but may have a performance cost. Each codepoint in the range must be iterated, and a
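The description says each codepoint in the range must be iterated before a minimal DFA is built. The sketch below is not Lucene's `RegExp` code. It is a hypothetical illustration of that idea under stated assumptions. `fold` approximates Unicode simple case folding as `toLowerCase(toUpperCase(cp))`. U+0130 and U+0131 are excluded by hand so that `[a-z]` yields exactly the set described in the `/[a-z]/` comment in this thread. `expand` and `toIntervals` are invented names. The resulting sorted intervals are what would become transitions from a start state to an accept state:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class CaseInsensitiveRangeSketch {

  /** Approximates simple case folding using java.lang.Character mappings. */
  static int fold(int cp) {
    // Assumption: keep the Turkic dotted/dotless i out of the 'i' class,
    // since Java's simple mappings would otherwise fold both onto 'i'.
    if (cp == 0x0130 || cp == 0x0131) {
      return cp;
    }
    return Character.toLowerCase(Character.toUpperCase(cp));
  }

  /** Returns every codepoint that is case-equivalent to some member of [lo, hi]. */
  static BitSet expand(int lo, int hi) {
    // Pass 1: iterate each codepoint in the range and record its folded form.
    BitSet folded = new BitSet();
    for (int cp = lo; cp <= hi; cp++) {
      folded.set(fold(cp));
    }
    // Pass 2: scan all of Unicode once; keep codepoints whose fold was recorded.
    // This catches reverse mappings such as U+017F -> 's' that a forward-only
    // walk over the range would miss.
    BitSet result = new BitSet();
    for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
      if (folded.get(fold(cp))) {
        result.set(cp);
      }
    }
    return result;
  }

  /** Collapses the set into sorted, non-overlapping [start, end] intervals. */
  static List<int[]> toIntervals(BitSet set) {
    List<int[]> out = new ArrayList<>();
    int start = set.nextSetBit(0);
    while (start >= 0) {
      int end = set.nextClearBit(start) - 1;
      out.add(new int[] {start, end});
      start = set.nextSetBit(end + 1);
    }
    return out;
  }

  public static void main(String[] args) {
    for (int[] iv : toIntervals(expand('a', 'z'))) {
      System.out.printf("U+%04X-U+%04X%n", iv[0], iv[1]);
    }
  }
}
```

The full Unicode scan in pass 2 is one way to model the performance cost the description warns about. A real implementation would more likely precompute the reverse mappings once than rescan on every range.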

Re: [PR] RegExp: add CASE_INSENSITIVE_RANGE support [lucene]

2025-03-20 Thread via GitHub
dweiss commented on code in PR #14381: URL: https://github.com/apache/lucene/pull/14381#discussion_r2006930601 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -778,6 +786,53 @@ private int[] toCaseInsensitiveChar(int codepoint) { } } + /**