Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-12 Thread via GitHub
dungba88 commented on code in PR #14009: URL: https://github.com/apache/lucene/pull/14009#discussion_r2144335229 ## lucene/core/src/java/org/apache/lucene/search/RescoreTopNQuery.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mor

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-12 Thread via GitHub
dungba88 commented on code in PR #14009: URL: https://github.com/apache/lucene/pull/14009#discussion_r2144333019 ## lucene/CHANGES.txt: ## @@ -2453,7 +2454,7 @@ New Features * LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery to speed up comp

Re: [PR] Add a rescorer that uses DoubleValuesSource values to re-score first pass hits [lucene]

2025-06-12 Thread via GitHub
dungba88 commented on code in PR #14776: URL: https://github.com/apache/lucene/pull/14776#discussion_r2144150610 ## lucene/core/src/java/org/apache/lucene/search/DoubleValuesSourceRescorer.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[I] Only valid `collectors` should be taken into account when computing the `scoreMode` in `MultiCollector` [lucene]

2025-06-12 Thread via GitHub
kkewwei opened a new issue, #14778: URL: https://github.com/apache/lucene/issues/14778 ### Description During the retrieval of collectors via `collector.getLeafCollector(context)`, some invocations may throw a `CollectionTerminatedException`, indicating this leaf collector does not need

Re: [I] Only valid `collectors` should be taken into account when computing the `scoreMode` in `MultiCollector` [lucene]

2025-06-12 Thread via GitHub
kkewwei closed issue #14778: Only valid `collectors` should be taken into account when computing the `scoreMode` in `MultiCollector` URL: https://github.com/apache/lucene/issues/14778 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] Add a rescorer that uses DoubleValuesSource values to re-score first pass hits [lucene]

2025-06-12 Thread via GitHub
vigyasharma commented on code in PR #14776: URL: https://github.com/apache/lucene/pull/14776#discussion_r2144284445 ## lucene/core/src/java/org/apache/lucene/search/DoubleValuesSourceRescorer.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Add a rescorer that uses DoubleValuesSource values to re-score first pass hits [lucene]

2025-06-12 Thread via GitHub
vigyasharma commented on code in PR #14776: URL: https://github.com/apache/lucene/pull/14776#discussion_r2144287000 ## lucene/core/src/java/org/apache/lucene/search/DoubleValuesSourceRescorer.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] No ruff violation [lucene]

2025-06-12 Thread via GitHub
dweiss merged PR #14725: URL: https://github.com/apache/lucene/pull/14725 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apac

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-12 Thread via GitHub
vigyasharma commented on code in PR #14009: URL: https://github.com/apache/lucene/pull/14009#discussion_r2144293225 ## lucene/CHANGES.txt: ## @@ -2453,7 +2454,7 @@ New Features * LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery to speed up c

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-12 Thread via GitHub
vigyasharma commented on code in PR #14009: URL: https://github.com/apache/lucene/pull/14009#discussion_r2144296041 ## lucene/core/src/java/org/apache/lucene/search/RescoreTopNQuery.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Use IntArrayList/IntHashSet to replace usages of List/Set of Integer [lucene]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #14774: URL: https://github.com/apache/lucene/pull/14774#issuecomment-2966694414 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] Add prefetching for terms dict in doc values [lucene]

2025-06-12 Thread via GitHub
easyice opened a new pull request, #14773: URL: https://github.com/apache/lucene/pull/14773 This follows a similar approach as doc values and only prefetches the first page of data. Perhaps these were missed at the time? -- This is an automated message from the Apache Git Service. To res

[PR] Use IntArrayList/IntHashSet to replace usages of List/Set of Integer [lucene]

2025-06-12 Thread via GitHub
easyice opened a new pull request, #14774: URL: https://github.com/apache/lucene/pull/14774 No functional changes; only an optimization to reduce auto-boxing. However, this involves a public API change in `UpdateGraphsUtils#computeJoinSet`. Not sure whether we should touch this public API?

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
breskeby commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2142088147 ## build-tools/build-infra/src/main/groovy/lucene.datasets.external-datasets.gradle: ## @@ -17,41 +15,40 @@ import java.nio.file.Files * limitations under the Licen

Re: [PR] .editorconfig [lucene]

2025-06-12 Thread via GitHub
dsmiley commented on PR #14740: URL: https://github.com/apache/lucene/pull/14740#issuecomment-2964784639 Are there any concerns with merging this in its current state? It's trimmed down from the original. -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [PR] Implement `ConstantScoreScorer#nextDocsAndScores` [lucene]

2025-06-12 Thread via GitHub
HUSTERGS commented on PR #14772: URL: https://github.com/apache/lucene/pull/14772#issuecomment-2966404955 Luceneutil benchmark result with taskCountPerCat=5: ``` TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value Interva

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
breskeby commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2142099569 ## build-tools/build-infra/src/main/groovy/lucene.documentation.check-broken-links.gradle: ## @@ -15,20 +15,22 @@ * limitations under the License. */ -def resou

Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #13914: URL: https://github.com/apache/lucene/pull/13914#issuecomment-2967771248 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] .editorconfig [lucene]

2025-06-12 Thread via GitHub
dsmiley commented on PR #14740: URL: https://github.com/apache/lucene/pull/14740#issuecomment-2967669282 Yes; _most_ lines are IntelliJ specific. Thanks for pointing to your recent build work... I'll want to adjust this configuration to align with the Groovy style from there. -- Th

Re: [I] Integrate a JVector codec for KNN searches [lucene]

2025-06-12 Thread via GitHub
sam-herman commented on issue #14681: URL: https://github.com/apache/lucene/issues/14681#issuecomment-2967720821 The biggest roadblock to integrating properly with Lucene is that jVector throughout relies on a `RandomWriter` that can seek backwards. This is not compatible with Lucene's appe

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
dsmiley commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2143115906 ## build-tools/build-infra/src/main/groovy/lucene.java-projects.conventions.gradle: ## @@ -15,15 +15,20 @@ * limitations under the License. */ -// Configure rele

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
dsmiley commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2143110804 ## build-tools/build-infra/src/main/groovy/lucene.datasets.external-datasets.gradle: ## @@ -162,29 +167,31 @@ configure(project(":lucene:benchmark")) { logger

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
dweiss commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2143551451 ## build-tools/build-infra/src/main/groovy/lucene.datasets.external-datasets.gradle: ## @@ -17,41 +15,40 @@ import java.nio.file.Files * limitations under the License

[PR] Add the ability to inverse a Sort [lucene]

2025-06-12 Thread via GitHub
HoustonPutman opened a new pull request, #14775: URL: https://github.com/apache/lucene/pull/14775 ### Description Currently there is no easy way to reverse a Sort unless you know how that sort was created. Adding an `inverse` option gives users an easy way to reverse a given sort.

Re: [PR] Add the ability to inverse a Sort [lucene]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #14775: URL: https://github.com/apache/lucene/pull/14775#issuecomment-2968021725 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
dsmiley commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2143107306 ## build-tools/build-infra/src/main/groovy/lucene.datasets.external-datasets.gradle: ## @@ -17,41 +15,40 @@ import java.nio.file.Files * limitations under the Licens

[PR] Add a rescorer that uses DoubleValuesSource values to re-score first pass hits [lucene]

2025-06-12 Thread via GitHub
vigyasharma opened a new pull request, #14776: URL: https://github.com/apache/lucene/pull/14776 A `Rescorer` that uses values from provided `DoubleValuesSource` to re-score top N hits of a query. Enables us to rescore hits from ANN vector queries using full-precision or late-interact

Re: [PR] Add a rescorer that uses DoubleValuesSource values to re-score first pass hits [lucene]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #14776: URL: https://github.com/apache/lucene/pull/14776#issuecomment-2968083964 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Support for Re-Ranking Queries using Late Interaction Model Multi-Vectors. [lucene]

2025-06-12 Thread via GitHub
vigyasharma commented on PR #14729: URL: https://github.com/apache/lucene/pull/14729#issuecomment-2968102601 > The advantage of a `Rescorer` is that it is explicitly only run over the hits in a `TopDocs` instance, whereas `FunctionScoreQuery` will run over the entire docid space if you let

Re: [PR] No ruff violation [lucene]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #14725: URL: https://github.com/apache/lucene/pull/14725#issuecomment-2968593315 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Optimize filtered kNN search with FixedBitSet intersections [lucene]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #14771: URL: https://github.com/apache/lucene/pull/14771#issuecomment-2968645559 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2968664265 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2968670768 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Add a DoubleValuesSource for scoring full precision vector similarity [lucene]

2025-06-12 Thread via GitHub
dungba88 commented on code in PR #14708: URL: https://github.com/apache/lucene/pull/14708#discussion_r2144021994 ## lucene/core/src/test/org/apache/lucene/search/TestQuantizedVectorSimilarityValueSource.java: ## @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-12 Thread via GitHub
dungba88 commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2968701047 I published another revision to support a custom `DoubleValuesSource` for re-ranking, and we can reuse the `FloatVectorSimilarityValuesSource` in case we want to rescore with another field,

[PR] Add list initial capacity in FirstPassGroupingCollector#getTopGroups. [lucene]

2025-06-12 Thread via GitHub
vsop-479 opened a new pull request, #14777: URL: https://github.com/apache/lucene/pull/14777 ### Description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-12 Thread via GitHub
dungba88 commented on code in PR #14009: URL: https://github.com/apache/lucene/pull/14009#discussion_r2144042904 ## lucene/CHANGES.txt: ## @@ -857,7 +858,7 @@ Improvements * GITHUB#13285: Early terminate graph searches of AbstractVectorSimilarityQuery to follow timeout set f

Re: [PR] Add list initial capacity in FirstPassGroupingCollector#getTopGroups. [lucene]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #14777: URL: https://github.com/apache/lucene/pull/14777#issuecomment-2968726519 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [I] Why doesn't RRF handle tied scores with equal ranking instead of using positional ranking? [lucene]

2025-06-12 Thread via GitHub
hellosunil commented on issue #14769: URL: https://github.com/apache/lucene/issues/14769#issuecomment-2964700843 > I'm happy to see this API being used as it was only added in the last minor release. The change that you suggested makes sense to me. I'd like to then use doc IDs as a tie-

Re: [PR] .editorconfig [lucene]

2025-06-12 Thread via GitHub
dweiss commented on PR #14740: URL: https://github.com/apache/lucene/pull/14740#issuecomment-2965457564 Most of those settings apply to intellij, right? They are not some "standard" .editorconfig things. I'm fine with adding this - if something clashes with other settings, we can always twe

Re: [PR] Improve hnsw on heap ram est [lucene]

2025-06-12 Thread via GitHub
weizijun commented on PR #14765: URL: https://github.com/apache/lucene/pull/14765#issuecomment-2965581181 @benwtrent Thanks for helping improve `OnHeapGraph#graphRamBytesUsed`! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] Add prefetching for terms dict in doc values [lucene]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #14773: URL: https://github.com/apache/lucene/pull/14773#issuecomment-2966493936 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
breskeby commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2142109839 ## build-tools/build-infra/src/main/groovy/lucene.ide.intellij-idea.gradle: ## @@ -37,12 +37,12 @@ if (isIdea) { } } -if (isIdeaSync) { +if (rootProject.ext.isI

Re: [PR] Optimize filtered kNN search with FixedBitSet intersections [lucene]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #14771: URL: https://github.com/apache/lucene/pull/14771#issuecomment-2965448645 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
dweiss commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2142114811 ## build-tools/build-infra/src/main/groovy/lucene.datasets.external-datasets.gradle: ## @@ -17,41 +15,40 @@ import java.nio.file.Files * limitations under the License

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
dweiss commented on PR #14764: URL: https://github.com/apache/lucene/pull/14764#issuecomment-2965807030 > Hope you don't mind me chiming in as this PR caught my attention. I'm Rene and maintain the elasticsearch gradle build and have been involved with Gradle directly since 0.4. I added a

Re: [PR] Improve hnsw on heap ram est [lucene]

2025-06-12 Thread via GitHub
dweiss commented on code in PR #14765: URL: https://github.com/apache/lucene/pull/14765#discussion_r2142163719 ## lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java: ## @@ -786,6 +786,7 @@ public void testHnswGraphBuilderInvalid() throws IOException {

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
breskeby commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2142075633 ## build-tools/build-infra-shadow/build.gradle: ## @@ -0,0 +1,79 @@ +import com.diffplug.gradle.spotless.SpotlessTask + +/* + * Licensed to the Apache Software Founda

Re: [PR] Implement `ConstantScoreScorer#nextDocsAndScores` [lucene]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #14772: URL: https://github.com/apache/lucene/pull/14772#issuecomment-2966391841 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
breskeby commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2142114663 ## build-tools/build-infra/src/main/groovy/lucene.java-projects.conventions.gradle: ## @@ -15,15 +15,20 @@ * limitations under the License. */ -// Configure rel

[PR] Implement `ConstantScoreScorer#nextDocsAndScores` [lucene]

2025-06-12 Thread via GitHub
HUSTERGS opened a new pull request, #14772: URL: https://github.com/apache/lucene/pull/14772 ### Description This tries to implement `ConstantScoreScorer#nextDocsAndScores` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
breskeby commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2142074783 ## build-tools/build-infra-shadow/build.gradle: ## @@ -0,0 +1,79 @@ +import com.diffplug.gradle.spotless.SpotlessTask + +/* + * Licensed to the Apache Software Founda

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
breskeby commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2142094953 ## build-tools/build-infra/src/main/groovy/lucene.datasets.external-datasets.gradle: ## @@ -162,29 +167,31 @@ configure(project(":lucene:benchmark")) { logge

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
dweiss commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2142111750 ## build-tools/build-infra-shadow/build.gradle: ## @@ -0,0 +1,79 @@ +import com.diffplug.gradle.spotless.SpotlessTask + +/* + * Licensed to the Apache Software Foundati

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-12 Thread via GitHub
dungba88 commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2965751382 I'm thinking that instead of using `vectorValues(...)` we can probably use a scorer from another (vector) field for re-ranking. This could help with the use case to, e.g., use 1-bit quantiza

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
dweiss commented on code in PR #14764: URL: https://github.com/apache/lucene/pull/14764#discussion_r2142130077 ## build-tools/build-infra/src/main/groovy/lucene.datasets.external-datasets.gradle: ## @@ -162,29 +167,31 @@ configure(project(":lucene:benchmark")) { logger.

[PR] Optimize filtered kNN search with FixedBitSet intersections [lucene]

2025-06-12 Thread via GitHub
mrkm4ntr opened a new pull request, #14771: URL: https://github.com/apache/lucene/pull/14771 ### Description If a filter query in kNN is in query cache, it can be saved as a `BitSetIterator` wrapping a `FixedBitSet`. But we are creating new `FixedBitSet` combined with liveDocs for every

Re: [PR] Build refactoring and cleanups (moving from build scripts to convention plugins) [lucene]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #14764: URL: https://github.com/apache/lucene/pull/14764#issuecomment-2965485569 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop