Re: [PR] Binary vector format for flat and hnsw vectors [lucene]

2025-02-07 Thread via GitHub
gaoj0017 commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2644556568 After Elastic’s last round of replies, the Elastic team reached out to us for clarification on the issues via Zoom meetings. In the meetings, they promised to fix the misattribution, so we sus

Re: [PR] [WIP] Introduce bpv24 vectorized decoding for DocIdsWriter [lucene]

2025-02-07 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2644515231 On an AVX-512 Linux X86 machine: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value

Re: [PR] Upgrade errorprone to 2.36.0 [lucene]

2025-02-07 Thread via GitHub
rmuir commented on code in PR #14216: URL: https://github.com/apache/lucene/pull/14216#discussion_r1947462840 ## gradle/validation/error-prone.gradle: ## @@ -288,8 +341,10 @@ allprojects { prj -> // '-Xep:ChainedAssertionLosesContext:OFF', // we don't use truth

Re: [PR] Error prone back from the dead [lucene]

2025-02-07 Thread via GitHub
risdenk commented on PR #14201: URL: https://github.com/apache/lucene/pull/14201#issuecomment-2644461358 Took a little longer than expected but the followup is here - https://github.com/apache/lucene/pull/14216 -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] errorprone-improvements [lucene]

2025-02-07 Thread via GitHub
risdenk commented on code in PR #14216: URL: https://github.com/apache/lucene/pull/14216#discussion_r1947419244 ## gradle/validation/error-prone.gradle: ## @@ -429,6 +508,8 @@ allprojects { prj -> '-Xep:OverrideThrowableToString:WARN', '-Xep:Overrides:W

Re: [PR] errorprone-improvements [lucene]

2025-02-07 Thread via GitHub
risdenk commented on code in PR #14216: URL: https://github.com/apache/lucene/pull/14216#discussion_r1947419192 ## gradle/validation/error-prone.gradle: ## @@ -381,43 +445,58 @@ allprojects { prj -> // '-Xep:JodaPlusMinusLong:OFF', // we don't use joda-time

Re: [PR] errorprone-improvements [lucene]

2025-02-07 Thread via GitHub
risdenk commented on code in PR #14216: URL: https://github.com/apache/lucene/pull/14216#discussion_r1947419002 ## gradle/validation/error-prone.gradle: ## @@ -65,6 +65,50 @@ allprojects { prj -> return } +// Exclude certain files (generated ones, m

Re: [PR] errorprone-improvements [lucene]

2025-02-07 Thread via GitHub
risdenk commented on code in PR #14216: URL: https://github.com/apache/lucene/pull/14216#discussion_r1947418244 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseSegmentInfoFormatTestCase.java: ## @@ -391,11 +391,7 @@ public void testSort() throws IOException {

Re: [PR] supports force merge based on specified segments. [lucene]

2025-02-07 Thread via GitHub
cgejian commented on PR #14163: URL: https://github.com/apache/lucene/pull/14163#issuecomment-2644414098 > > I don't think we should merge this change, but it's good that you were able to use it to confirm that merging would reclaim these deleted docs. > > Can you add your data about this

[PR] errorprone-improvements [lucene]

2025-02-07 Thread via GitHub
risdenk opened a new pull request, #14216: URL: https://github.com/apache/lucene/pull/14216 ### Description Builds on top of https://github.com/apache/lucene/pull/14201

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2025-02-07 Thread via GitHub
jpountz commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2644276979 > The resistance to it then and still now surprises me because (at least in my mind) there's a simple selector mechanism. I agree with the value of routing to different segmen

Re: [PR] Bugfix/fix hnsw search termination check [lucene]

2025-02-07 Thread via GitHub
jimczi commented on code in PR #14215: URL: https://github.com/apache/lucene/pull/14215#discussion_r1947267621 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -216,7 +216,7 @@ void searchLevel( while (candidates.size() > 0 && results.earlyT

Re: [PR] Add histogram facet capabilities. [lucene]

2025-02-07 Thread via GitHub
jpountz commented on code in PR #14204: URL: https://github.com/apache/lucene/pull/14204#discussion_r1947250254 ## lucene/facet/src/java/org/apache/lucene/facet/histogram/HistogramCollector.java: ## @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Bugfix/fix hnsw search termination check [lucene]

2025-02-07 Thread via GitHub
benwtrent commented on code in PR #14215: URL: https://github.com/apache/lucene/pull/14215#discussion_r1947228387 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -216,7 +216,7 @@ void searchLevel( while (candidates.size() > 0 && results.ear

Re: [PR] Bugfix/fix hnsw search termination check [lucene]

2025-02-07 Thread via GitHub
shubhamvishu commented on code in PR #14215: URL: https://github.com/apache/lucene/pull/14215#discussion_r1947219947 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -216,7 +216,7 @@ void searchLevel( while (candidates.size() > 0 && results.

Re: [PR] Add histogram facet capabilities. [lucene]

2025-02-07 Thread via GitHub
jpountz commented on code in PR #14204: URL: https://github.com/apache/lucene/pull/14204#discussion_r1947210099 ## lucene/facet/src/java/org/apache/lucene/facet/histogram/HistogramCollector.java: ## @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Add histogram facet capabilities. [lucene]

2025-02-07 Thread via GitHub
jpountz commented on code in PR #14204: URL: https://github.com/apache/lucene/pull/14204#discussion_r1947204762 ## lucene/facet/src/java/org/apache/lucene/facet/histogram/HistogramCollector.java: ## @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Bugfix/fix hnsw search termination check [lucene]

2025-02-07 Thread via GitHub
jpountz commented on code in PR #14215: URL: https://github.com/apache/lucene/pull/14215#discussion_r1947201249 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -216,7 +216,7 @@ void searchLevel( while (candidates.size() > 0 && results.early

Re: [I] migrate OpenNLP 'ant train-test-models' to Gradle [lucene]

2025-02-07 Thread via GitHub
dweiss closed issue #13002: migrate OpenNLP 'ant train-test-models' to Gradle URL: https://github.com/apache/lucene/issues/13002

Re: [PR] Migrate OpenNLP 'ant train-test-models' to Gradle [lucene]

2025-02-07 Thread via GitHub
dweiss merged PR #14198: URL: https://github.com/apache/lucene/pull/14198

Re: [PR] Migrate OpenNLP 'ant train-test-models' to Gradle [lucene]

2025-02-07 Thread via GitHub
dweiss commented on PR #14198: URL: https://github.com/apache/lucene/pull/14198#issuecomment-2644041455 So these generated model files are zip files and inside is a property file, which contains a timestamp... that's why they're different (this, plus zip compression may not be predictably t

Re: [PR] Bugfix/fix hnsw search termination check [lucene]

2025-02-07 Thread via GitHub
benwtrent commented on PR #14215: URL: https://github.com/apache/lucene/pull/14215#issuecomment-2644018089 @iverase is the one who found it :D

Re: [PR] SortedSet DV Multi Range query [lucene]

2025-02-07 Thread via GitHub
gsmiller commented on PR #13974: URL: https://github.com/apache/lucene/pull/13974#issuecomment-2643826042 Thanks @mkhludnev . 1. I'm not as worried about getting the API perfect with this initial commit. One benefit of doing this in `sandbox` is that we can iterate on this in breaki

Re: [PR] Add new Acorn-esque filtered HNSW search heuristic [lucene]

2025-02-07 Thread via GitHub
benwtrent commented on PR #14160: URL: https://github.com/apache/lucene/pull/14160#issuecomment-2643720205 OK, I checked the current search, and it seems to have the same issue (increasing `k` doesn't monotonically increase recall).

[PR] Bugfix/fix hnsw search termination check [lucene]

2025-02-07 Thread via GitHub
benwtrent opened a new pull request, #14215: URL: https://github.com/apache/lucene/pull/14215 previously related PR: https://github.com/apache/lucene/pull/12770 While my original change helped move us towards a saner HNSW search behavior, it will still actually explore a candidate

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-02-07 Thread via GitHub
benwtrent commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2643506891 I think bumping main only for each non-LTS release would be cool. Then we keep it at the next LTS (Java 25)? Or, if it's just as long from Lucene 10 -> 11, more likely the next LT

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-02-07 Thread via GitHub
rmuir commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2643455016 > P.S. I'd like to bite into the apple and make Java 22 minimum requirement. At least for main branch what is the harm? could we do 23 or 24? I'd really like https://openjdk.org/jeps

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-02-07 Thread via GitHub
benwtrent commented on code in PR #14131: URL: https://github.com/apache/lucene/pull/14131#discussion_r1946795551 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/vectorsearch/CuVSKnnFloatVectorQuery.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-07 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1946735208 ## lucene/core/src/java/org/apache/lucene/util/automaton/CaseFoldingUtil.java: ## @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-07 Thread via GitHub
john-wagster commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1946628495 ## lucene/core/src/java/org/apache/lucene/util/automaton/CaseFoldingUtil.java: ## @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Move synonym map off-heap for SynonymGraphFilter [lucene]

2025-02-07 Thread via GitHub
mikemccand commented on code in PR #13054: URL: https://github.com/apache/lucene/pull/13054#discussion_r1946542491 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymMap.java: ## @@ -218,12 +231,26 @@ public void add(CharsRef input, CharsRef output, boo

Re: [PR] Deprecate Operations.concat(a1, a2) and Operations.union(a1, a2) [lucene]

2025-02-07 Thread via GitHub
rmuir merged PR #14209: URL: https://github.com/apache/lucene/pull/14209

Re: [I] Deprecate Operations.concatenate(a1, a2) and Operations.union(a1, a2) [lucene]

2025-02-07 Thread via GitHub
rmuir closed issue #14202: Deprecate Operations.concatenate(a1, a2) and Operations.union(a1, a2) URL: https://github.com/apache/lucene/issues/14202

Re: [PR] Move synonym map off-heap for SynonymGraphFilter [lucene]

2025-02-07 Thread via GitHub
mikemccand commented on PR #13054: URL: https://github.com/apache/lucene/pull/13054#issuecomment-2642948906 Hi @msfroh -- thank you for the ping! Sorry for the slow reply ... I'll try to review again soon, and we might be able to test impact in our Amazon product search `SynonymGraphFilter

Re: [I] HNSW connect components can take an inordinate amount of time [lucene]

2025-02-07 Thread via GitHub
tteofili commented on issue #14214: URL: https://github.com/apache/lucene/issues/14214#issuecomment-2642779168 this can be reproduced with either of the following tests ```java public void testSameVectorIndexedMultipleTimes() throws IOException { try (Directory d = newDirecto

[I] HNSW connect components can take an inordinate amount of time [lucene]

2025-02-07 Thread via GitHub
benwtrent opened a new issue, #14214: URL: https://github.com/apache/lucene/issues/14214 ### Description Connect components on Flush or merge, while good for graphs that are "almost OK" but need to be better connected, can just destroy performance if the vector distribution is poor.

Re: [PR] Migrate OpenNLP 'ant train-test-models' to Gradle [lucene]

2025-02-07 Thread via GitHub
dweiss commented on code in PR #14198: URL: https://github.com/apache/lucene/pull/14198#discussion_r1946266735 ## lucene/analysis/opennlp/build.gradle: ## @@ -26,3 +26,33 @@ dependencies { moduleTestImplementation project(':lucene:test-framework') } + +ext { + testModelDa

Re: [PR] [WIP] Introduce bpv24 vectorized decoding for DocIdsWriter [lucene]

2025-02-07 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2642377091 E2E result is disappointing: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value