[PR] [WIP] Introduce bpv24 vectorized decoding for DocIdsWriter [lucene]

2025-02-05 Thread via GitHub
gf2121 opened a new pull request, #14203: URL: https://github.com/apache/lucene/pull/14203 ### Description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

[I] Deprecate Operations.concatenate(a1, a2) and Operations.union(a1, a2) [lucene]

2025-02-05 Thread via GitHub
rmuir opened a new issue, #14202: URL: https://github.com/apache/lucene/issues/14202 ### Description These automata operations have two forms today, using concatenate() as an example: ```java /** * Returns an automaton that accepts the concatenation of the languages

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2638840849 I'm feeling good about this one now, with the change, a lot of regexps now come out minimal from the start, which is a good thing. We also eliminate overhead of tons of nodes, which

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2638827849 Finally! The concatenate() issue was an easy fix, it neglected to clean up its dead states. All of its partners in crime do this, but the fact we neglect it for concatenate messes up too m

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1944084497 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -648,13 +645,16 @@ private Automaton toAutomaton( break; case REGEXP_CHAR:

Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2025-02-05 Thread via GitHub
houserjohn commented on code in PR #13914: URL: https://github.com/apache/lucene/pull/13914#discussion_r1944050856 ## lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java: ## @@ -202,66 +208,83 @@ public SegmentOutput(int hitsLength) { * is used to c

Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2025-02-05 Thread via GitHub
houserjohn commented on PR #13914: URL: https://github.com/apache/lucene/pull/13914#issuecomment-2638750375 This is a great improvement for Dynamic Ranges @HoustonPutman! After looking into some more test cases, I believe there may be a bug for *some* unsorted value lists. Consider this uni

Re: [PR] Remove mmap isLoaded check before madvise [lucene]

2025-02-05 Thread via GitHub
github-actions[bot] commented on PR #14156: URL: https://github.com/apache/lucene/pull/14156#issuecomment-2638330081 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2638323042 OK I tried out a List-based API as alternative to array-based API. It isn't fully correct, which is part of my issue, but see it here: https://github.com/apache/lucene/commit/8b535a1c2fb4

Re: [PR] SortedSet DV Multi Range query [lucene]

2025-02-05 Thread via GitHub
gsmiller commented on PR #13974: URL: https://github.com/apache/lucene/pull/13974#issuecomment-2638201802 @mkhludnev took one more look and I think this is in good shape to merge. Would you mind adding a CHANGES entry?

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1943654993 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -1195,60 +1215,132 @@ final RegExp parseCharClassExp() throws IllegalArgumentException {

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1943653718 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -1050,14 +1059,25 @@ static RegExp makeDeprecatedComplement(int flags, RegExp exp) { }

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1943652340 ## lucene/core/src/java/org/apache/lucene/util/automaton/Automata.java: ## @@ -140,6 +141,32 @@ public static Automaton makeCharRange(int min, int max) { return a;

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1943651732 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -648,13 +645,16 @@ private Automaton toAutomaton( break; case REGEXP_CHAR:

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
mikemccand commented on code in PR #14193: URL: https://github.com/apache/lucene/pull/14193#discussion_r1943595546 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -648,13 +645,16 @@ private Automaton toAutomaton( break; case REGEXP_CHA

Re: [PR] Error prone back from the dead [lucene]

2025-02-05 Thread via GitHub
rmuir merged PR #14201: URL: https://github.com/apache/lucene/pull/14201

Re: [I] improve the error prone situation [lucene]

2025-02-05 Thread via GitHub
rmuir closed issue #14200: improve the error prone situation URL: https://github.com/apache/lucene/issues/14200

Re: [PR] Error prone back from the dead [lucene]

2025-02-05 Thread via GitHub
rmuir commented on PR #14201: URL: https://github.com/apache/lucene/pull/14201#issuecomment-2637932527 @risdenk thanks for looking. Will get this thing turned on again with hopefully less trouble than the previous version.

Re: [PR] Error prone back from the dead [lucene]

2025-02-05 Thread via GitHub
risdenk commented on PR #14201: URL: https://github.com/apache/lucene/pull/14201#issuecomment-2637881979 This looks fine. I do think there are new rules that could be disabled like [DefaultLocale](https://errorprone.info/bugpattern/DefaultLocale) We were keeping a list

Re: [PR] Error prone back from the dead [lucene]

2025-02-05 Thread via GitHub
rmuir commented on PR #14201: URL: https://github.com/apache/lucene/pull/14201#issuecomment-2637895695 @risdenk yes that would absolutely be awesome and help us keep it all sorted

[PR] Error prone back from the dead [lucene]

2025-02-05 Thread via GitHub
rmuir opened a new pull request, #14201: URL: https://github.com/apache/lucene/pull/14201 Error prone is failing with NoSuchMethodError on JDK21 at least, due to their use of internal compiler apis. Upgrading to 2.33.0 fixes the issue. Upgrading to 2.34+ is not possible as you will t

Re: [PR] supports force merge based on specified segments. [lucene]

2025-02-05 Thread via GitHub
mikemccand commented on PR #14163: URL: https://github.com/apache/lucene/pull/14163#issuecomment-2637866412 > > If you are able to turn on `InfoStream` for the ES shard that won't merge segments with so many deletions, and post a chunk here, I can have a look and see if there are clues.

Re: [I] improve the error prone situation [lucene]

2025-02-05 Thread via GitHub
rmuir commented on issue #14200: URL: https://github.com/apache/lucene/issues/14200#issuecomment-2637844948 2.33.0 is better. I will make a PR.

Re: [I] improve the error prone situation [lucene]

2025-02-05 Thread via GitHub
rmuir commented on issue #14200: URL: https://github.com/apache/lucene/issues/14200#issuecomment-2637862845 Should be ready shortly. there are new violations, so I'm having to do some fixing. iterating with the error-prone on my dual-core 2018 computer is... painful :)

Re: [I] improve the error prone situation [lucene]

2025-02-05 Thread via GitHub
rmuir commented on issue #14200: URL: https://github.com/apache/lucene/issues/14200#issuecomment-2637831373 Here is the different error that I see on the latest version: I don't understand it: ``` 2: Task failed with an exception. --- * What went wrong: Execution faile

Re: [PR] Add UnwrappingReuseStrategy for AnalyzerWrapper [lucene]

2025-02-05 Thread via GitHub
mayya-sharipova merged PR #14154: URL: https://github.com/apache/lucene/pull/14154

Re: [I] improve the error prone situation [lucene]

2025-02-05 Thread via GitHub
risdenk commented on issue #14200: URL: https://github.com/apache/lucene/issues/14200#issuecomment-2637742036 I'm not disagreeing with any of your statements about problems with errorprone, but for completeness there are much newer versions of errorprone (2.36.0 is latest https://github.com

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2637630803 I opened https://github.com/apache/lucene/issues/14200 for the error-prone situation

[I] improve the error prone situation [lucene]

2025-02-05 Thread via GitHub
rmuir opened a new issue, #14200: URL: https://github.com/apache/lucene/issues/14200 ### Description Currently this tool struggles to work with JDK21. maybe we can find a version that works or some workaround? It is also slow, so slow that we only run it in the CI, which isn't grea

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2637016936 I tried to update the error prone to fix its bugs, it is angry about the way we do gradle. I will YOLO my way thru this stuff. > The default --should-stop=ifError policy (INIT) is n

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2637219251 @dweiss the `List` is a good idea. I did the arrays only because @john-wagster has arrays over on #14192, but in the parser lists are used. The only useful thing about arrays are convenien

Re: [PR] Add UnwrappingReuseStrategy for AnalyzerWrapper [lucene]

2025-02-05 Thread via GitHub
benwtrent commented on code in PR #14154: URL: https://github.com/apache/lucene/pull/14154#discussion_r1943127032 ## lucene/core/src/java/org/apache/lucene/analysis/AnalyzerWrapper.java: ## @@ -151,4 +156,62 @@ protected final Reader initReaderForNormalization(String fieldName,

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2637079875 I would propose replacing this checker with ast-grep rules for whatever we need: it is not a good one. The use of internal Java APIs is too crazy.

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2637077428 I fixed the error-prone

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2637021676 Like I literally have no idea what this tool is trying to tell me there. But I think errorprone is broken, it depends on too many internals of the Java compiler APIs.

Re: [I] TestSeededKnn[Byte|Float]VectorQuery.testWithTimeout failure [lucene]

2025-02-05 Thread via GitHub
benwtrent closed issue #14195: TestSeededKnn[Byte|Float]VectorQuery.testWithTimeout failure URL: https://github.com/apache/lucene/issues/14195

Re: [I] TestSeededKnnFloatVectorQuery.testSeedWithTimeout fails reproducibly [lucene]

2025-02-05 Thread via GitHub
benwtrent closed issue #14196: TestSeededKnnFloatVectorQuery.testSeedWithTimeout fails reproducibly URL: https://github.com/apache/lucene/issues/14196

Re: [PR] Correct bug with seeded vector queries with incorrect entrypoint ids [lucene]

2025-02-05 Thread via GitHub
benwtrent merged PR #14197: URL: https://github.com/apache/lucene/pull/14197

Re: [PR] Use github wf to add module labels for PR based on file changes [lucene]

2025-02-05 Thread via GitHub
stefanvodita commented on PR #14101: URL: https://github.com/apache/lucene/pull/14101#issuecomment-2636455151 I've seen new PRs get labels assigned. I like how it's working!

Re: [PR] Update package-info.java [lucene]

2025-02-05 Thread via GitHub
dweiss merged PR #14199: URL: https://github.com/apache/lucene/pull/14199

Re: [PR] Update package-info.java [lucene]

2025-02-05 Thread via GitHub
dweiss commented on PR #14199: URL: https://github.com/apache/lucene/pull/14199#issuecomment-2636332925 Thank you.

Re: [PR] Introduce bpv24 vectorized decoding for DocIdsWriter [lucene]

2025-02-05 Thread via GitHub
gf2121 commented on PR #14176: URL: https://github.com/apache/lucene/pull/14176#issuecomment-2636202013 **Some new progress** > Luceneutil now can load 3 implementors of IntersectVisitor: RangeQuery Visitor, RangeQuery InverseVisitor and DynamicPruning Visitor. Here is the result on

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2636033017 a few more notes: * maybe we should deprecate `union(Automaton, Automaton)` and only leave `union(List)`. I see the former approach has proven trappy, let's guide developers to do it th

Re: [PR] Introduce bpv24 vectorized decoding for DocIdsWriter [lucene]

2025-02-05 Thread via GitHub
gf2121 commented on PR #14176: URL: https://github.com/apache/lucene/pull/14176#issuecomment-2636001759 Comparison of current commit(candidate) and the vectorized decoding commit(baseline). ``` TaskQPS baseline StdDevQPS my_modified_version St

Re: [PR] Add Automata.makeCharSet/makeCharClass to optimize regexp [lucene]

2025-02-05 Thread via GitHub
rmuir commented on PR #14193: URL: https://github.com/apache/lucene/pull/14193#issuecomment-2636006731 I will figure out what angers the error-prone tomorrow. I am rusty with java, so this PR needs assistance LOL. but all the tests pass.