Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-06 Thread via GitHub
dweiss commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2640330745 > Finally! The concatenate() issue was an easy fix, it neglected to clean up its dead states. All of its partners in crime do this, but the fact we neglect it for concatenate messes up to

2025-02-06 Thread via GitHub
jpountz commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1944819744 ## lucene/core/src/test/org/apache/lucene/util/automaton/TestAutomaton.java: ## @@ -667,11 +667,14 @@ public void testConcatenatePreservesDet() throws Exception {

2025-02-06 Thread via GitHub
rmuir merged PR #14193: URL: https://github.com/apache/lucene/pull/14193 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apach

2025-02-06 Thread via GitHub
rmuir commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1944626865 ## lucene/core/src/test/org/apache/lucene/util/automaton/TestAutomaton.java: ## @@ -667,11 +667,14 @@ public void testConcatenatePreservesDet() throws Exception { }

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2638840849 I'm feeling good about this one now, with the change, a lot of regexps now come out minimal from the start, which is a good thing. We also eliminate overhead of tons of nodes, which

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2638827849 Finally! The concatenate() issue was an easy fix, it neglected to clean up its dead states. All of its partners in crime do this, but the fact we neglect it for concatenate messes up too m

2025-02-05 Thread via GitHub
rmuir commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1944084497 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -648,13 +645,16 @@ private Automaton toAutomaton( break; case REGEXP_CHAR:

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2638323042 OK I tried out a List-based API as alternative to array-based API. It isn't fully correct, which is part of my issue, but see it here: https://github.com/apache/lucene/commit/8b535a1c2fb4

2025-02-05 Thread via GitHub
rmuir commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1943654993 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -1195,60 +1215,132 @@ final RegExp parseCharClassExp() throws IllegalArgumentException {

2025-02-05 Thread via GitHub
rmuir commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1943653718 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -1050,14 +1059,25 @@ static RegExp makeDeprecatedComplement(int flags, RegExp exp) { }

2025-02-05 Thread via GitHub
rmuir commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1943652340 ## lucene/core/src/java/org/apache/lucene/util/automaton/Automata.java: ## @@ -140,6 +141,32 @@ public static Automaton makeCharRange(int min, int max) { return a;

2025-02-05 Thread via GitHub
rmuir commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1943651732 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -648,13 +645,16 @@ private Automaton toAutomaton( break; case REGEXP_CHAR:

2025-02-05 Thread via GitHub
mikemccand commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1943595546 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -648,13 +645,16 @@ private Automaton toAutomaton( break; case REGEXP_CHA

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2637630803 I opened https://github.com/apache/lucene/issues/14200 for the error-prone situation

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2637016936 I tried to update the error prone to fix its bugs, it is angry about the way we do gradle. I will YOLO my way thru this stuff. > The default --should-stop=ifError policy (INIT) is n

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2637219251 @dweiss the `List` is a good idea. I did the arrays only because @john-wagster has arrays over on #14192, but in the parser lists are used. The only useful thing about arrays are convenien

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2637079875 I would propose replacing this checker with ast-grep rules for whatever we need: it is not a good one. the use of internal java APIs is too crazy.

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2637077428 I fixed the error-prone

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2637021676 Like i literally have no idea what this tool is trying to tell me there. But I think errorprone is broken, it depends on too many internals of the java compiler apis.

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2636033017 a few more notes: * maybe we should deprecate `union(Automaton, Automaton)` and only leave `union(List)`. I see the former approach has proven trappy, let's guide developers to do it th

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2636006731 I will figure out what angers the error-prone tomorrow. I am rusty with java, so this PR needs assistance LOL. but all the tests pass.

2025-02-04 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2635997545 My example for this one, if you have something like `[^a-gklM-O\s]`, with the case-insensitive flag maybe, it just calls the new `makeCharClass(int[],int[])` method and you get minimal aut
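
[Editor's illustration, not part of the thread] A character class such as `[^a-gklM-O]` can be reduced to a short list of sorted, disjoint code point ranges before any automaton is built, which is one plausible reason a single-call builder can yield a small automaton where repeated unions would not. The sketch below shows only that normalization step. The class `CharClassSketch` and its methods are hypothetical names invented for this example; it does not use or reproduce Lucene's `makeCharClass(int[],int[])`, and the meaning of the two arrays as inclusive range starts and ends is an assumption. The `\s` part of the quoted example is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper (not Lucene code): normalizes code point ranges of a character class. */
final class CharClassSketch {
  private CharClassSketch() {}

  /** Merges overlapping or adjacent inclusive ranges into sorted, disjoint ranges. */
  static List<int[]> mergeRanges(int[] starts, int[] ends) {
    if (starts.length != ends.length) {
      throw new IllegalArgumentException("starts and ends must have the same length");
    }
    List<int[]> ranges = new ArrayList<>();
    for (int i = 0; i < starts.length; i++) {
      if (starts[i] < Character.MIN_CODE_POINT
          || ends[i] > Character.MAX_CODE_POINT
          || starts[i] > ends[i]) {
        throw new IllegalArgumentException("invalid range at index " + i);
      }
      ranges.add(new int[] {starts[i], ends[i]});
    }
    // Sort by range start so that one left-to-right pass can merge neighbours.
    ranges.sort(Comparator.comparingInt((int[] r) -> r[0]));

    List<int[]> merged = new ArrayList<>();
    for (int[] r : ranges) {
      int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      if (last != null && r[0] <= last[1] + 1) {
        // Overlapping or directly adjacent: extend the previous range.
        last[1] = Math.max(last[1], r[1]);
      } else {
        merged.add(new int[] {r[0], r[1]});
      }
    }
    return merged;
  }

  /** Complements sorted, disjoint ranges over the full Unicode code point space. */
  static List<int[]> complement(List<int[]> merged) {
    List<int[]> out = new ArrayList<>();
    int next = Character.MIN_CODE_POINT; // first code point not yet covered
    for (int[] r : merged) {
      if (r[0] > next) {
        out.add(new int[] {next, r[0] - 1});
      }
      next = r[1] + 1;
    }
    if (next <= Character.MAX_CODE_POINT) {
      out.add(new int[] {next, Character.MAX_CODE_POINT});
    }
    return out;
  }
}
```

For `[^a-gklM-O]`, the inputs `a-g`, `k`, `l`, `M-O` merge into three ranges (`M-O`, `a-g`, `k-l`), and the complement is four ranges, so under these assumptions a single state with four outgoing range transitions to an accept state would be enough to represent the class.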

2025-02-04 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2635939281 anyway, I think this is the right path, rather than fight with union(), let's just get it out of our way. with this change union() is only used for union operator (`|`) and not internally.

2025-02-04 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2635936648 That's error-prone that's broke trying to do some null analysis :)

2025-02-04 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2635933607 I generalized this to `makeCharClass(int[],int[])`, added a "character class" node to use it instead of unioning many nodes, replaced the pre-built class functionality with it too.