Re: [PR] Binary vector format for flat and hnsw vectors [lucene]

2025-01-30 Thread via GitHub
github-actions[bot] commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2626006980 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] Allow skip_factor to be set dynamically within QueryCache [lucene]

2025-01-30 Thread via GitHub
sgup432 commented on issue #14183: URL: https://github.com/apache/lucene/issues/14183#issuecomment-2625954720 @jpountz If you think it is feasible via above approach, I can quickly raise a PR on this with some UTs. -- This is an automated message from the Apache Git Service. To respond t

Re: [I] Allow skip_factor to be set dynamically within QueryCache [lucene]

2025-01-30 Thread via GitHub
sgup432 commented on issue #14183: URL: https://github.com/apache/lucene/issues/14183#issuecomment-2625661420 @jpountz Yeah, I meant we could use something like AtomicReference to set it dynamically in a thread-safe way. I meant harmless in the sense that it should be easy to implement via
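
The AtomicReference idea mentioned in this comment could look roughly like the sketch below. This is only an illustration, not Lucene's actual code: the class name `DynamicSkipFactor`, its methods, and the "must be at least 1" validation rule are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical holder for a skip factor that can be changed at runtime.
 * Not part of Lucene; names and validation rule are assumptions.
 */
final class DynamicSkipFactor {
  // AtomicReference gives safe publication: a reader thread sees either
  // the old value or the new one, never a partially written state.
  private final AtomicReference<Float> skipFactor;

  DynamicSkipFactor(float initial) {
    this.skipFactor = new AtomicReference<>(validate(initial));
  }

  // Assumed rule: the factor must be >= 1. Written as a negation so that
  // NaN is rejected as well.
  private static Float validate(float value) {
    if (!(value >= 1f)) {
      throw new IllegalArgumentException("skip factor must be >= 1, got " + value);
    }
    return value;
  }

  /** Called by an administrative thread to change the factor. */
  void set(float newValue) {
    skipFactor.set(validate(newValue));
  }

  /** Called by query threads on each caching decision. */
  float get() {
    return skipFactor.get();
  }
}
```

Since the value is a single primitive, a `volatile float` field would likely give the same visibility guarantee with less overhead; the AtomicReference form is shown only because it is the one named in the comment.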

Re: [PR] Disable the query cache by default. [lucene]

2025-01-30 Thread via GitHub
msokolov commented on PR #14187: URL: https://github.com/apache/lucene/pull/14187#issuecomment-2625649749 I agree with the judgment, but maybe this just indicates we need to improve the cache! Still, until we can figure out how to do so, +1 to disable by default

Re: [I] Allow skip_factor to be set dynamically within QueryCache [lucene]

2025-01-30 Thread via GitHub
jpountz commented on issue #14183: URL: https://github.com/apache/lucene/issues/14183#issuecomment-2625648861 I am not entirely sure about the "harmless" part: this class is shared by multiple threads so we would need to make sure that the value is updated in a thread-safe way.

[PR] Disable the query cache by default. [lucene]

2025-01-30 Thread via GitHub
jpountz opened a new pull request, #14187: URL: https://github.com/apache/lucene/pull/14187 The query cache trades heap for faster queries. Given all the progress that has been made on making uncached queries faster (`IndexOrDocValuesQuery`, bitset encoding of blocks of postings, etc.), it'

Re: [I] Add an optional bandwidth cap to `TieredMergePolicy`? [lucene]

2025-01-30 Thread via GitHub
jpountz commented on issue #14148: URL: https://github.com/apache/lucene/issues/14148#issuecomment-2625595731 > That's the only thing that prevents MergePolicy from e.g. simply picking that merge again. I wonder if we need to prevent it from picking the same merge again. Could we wai

Re: [I] Add an optional bandwidth cap to `TieredMergePolicy`? [lucene]

2025-01-30 Thread via GitHub
mikemccand commented on issue #14148: URL: https://github.com/apache/lucene/issues/14148#issuecomment-2625515650 Oh, that's a neat idea (`MergeScheduler` being able to "cancel" merge choices from `MergePolicy`). I think we would have to register the merge, immediately, on getting it f

Re: [PR] Filtered disjunctions may miss some top hits. [lucene]

2025-01-30 Thread via GitHub
jpountz commented on PR #14186: URL: https://github.com/apache/lucene/pull/14186#issuecomment-2625505749 Interestingly, this makes queries run faster: ``` Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value

Re: [PR] Filtered disjunctions may miss some top hits. [lucene]

2025-01-30 Thread via GitHub
jpountz commented on PR #14186: URL: https://github.com/apache/lucene/pull/14186#issuecomment-2625445431 I am annoyed that I'm not able to create a proper test for this. It seems to require very specific conditions; I found it while playing with score quantization.

[PR] Filtered disjunctions may miss some top hits. [lucene]

2025-01-30 Thread via GitHub
jpountz opened a new pull request, #14186: URL: https://github.com/apache/lucene/pull/14186 This is a rare bug (for instance none of the queries in nightly benchmarks return different top hits with the fix, and I haven't been able to create a proper test) but still a bug.

Re: [PR] Add updateable random scorer interface for vector index building [lucene]

2025-01-30 Thread via GitHub
benwtrent commented on code in PR #14181: URL: https://github.com/apache/lucene/pull/14181#discussion_r1936117214 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -296,6 +282,30 @@ to the newly introduced levels (repeating step 2,3 for new levels

Re: [PR] Add updateable random scorer interface for vector index building [lucene]

2025-01-30 Thread via GitHub
benwtrent commented on code in PR #14181: URL: https://github.com/apache/lucene/pull/14181#discussion_r1936115985 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -190,12 +191,25 @@ private int getStartPos(int maxOrd) { } }

Re: [PR] Add updateable random scorer interface for vector index building [lucene]

2025-01-30 Thread via GitHub
benwtrent commented on code in PR #14181: URL: https://github.com/apache/lucene/pull/14181#discussion_r1936115373 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -190,12 +191,25 @@ private int getStartPos(int maxOrd) { } }

Re: [PR] Add updateable random scorer interface for vector index building [lucene]

2025-01-30 Thread via GitHub
benwtrent commented on code in PR #14181: URL: https://github.com/apache/lucene/pull/14181#discussion_r1936114735 ## lucene/core/src/java/org/apache/lucene/codecs/hnsw/DefaultFlatVectorScorer.java: ## @@ -90,23 +91,29 @@ public String toString() { private static final class B

Re: [PR] Add updateable random scorer interface for vector index building [lucene]

2025-01-30 Thread via GitHub
msokolov commented on code in PR #14181: URL: https://github.com/apache/lucene/pull/14181#discussion_r1935932869 ## lucene/core/src/java/org/apache/lucene/codecs/hnsw/DefaultFlatVectorScorer.java: ## @@ -90,23 +91,29 @@ public String toString() { private static final class By

Re: [PR] Fix refill logic in nextDoc(). [lucene]

2025-01-30 Thread via GitHub
jpountz commented on PR #14185: URL: https://github.com/apache/lucene/pull/14185#issuecomment-2625283827 Benchmarks suggest that the additional overhead in nextDoc() is fine: ``` Task  QPS baseline  StdDev  QPS my_modified_version  StdDev

[PR] Fix refill logic in nextDoc(). [lucene]

2025-01-30 Thread via GitHub
jpountz opened a new pull request, #14185: URL: https://github.com/apache/lucene/pull/14185 The recent optimization from #14164 interfered in a bad way with a prior optimization.

Re: [PR] Not maintain docBufferUpTo when only docs needed [lucene]

2025-01-30 Thread via GitHub
jpountz commented on PR #14164: URL: https://github.com/apache/lucene/pull/14164#issuecomment-2625125069 This seems to have introduced a bug: https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-main/11964/. I think it's related to how advanceShallow() sometimes refills the buffer.

Re: [PR] Add new Acorn-esque filtered HNSW search heuristic [lucene]

2025-01-30 Thread via GitHub
benwtrent commented on PR #14160: URL: https://github.com/apache/lucene/pull/14160#issuecomment-2624958162 I ran this over the "nightly" dataset (8M 768 dim vectors). No force merging. I think this is the nightly behavior. I ran over various filter criteria (I think nightly is 5%).

Re: [PR] [Draft] Support Multi-Vector HNSW Search via Flat Vector Storage [lucene]

2025-01-30 Thread via GitHub
benwtrent commented on PR #14173: URL: https://github.com/apache/lucene/pull/14173#issuecomment-2624759420 I like where this PR is going. > Note: This change does not include dependent multi-valued vectors like ColBERT, where the multiple vectors must be used together to compute similari

Re: [PR] Use github wf to add module labels for PR based on file changes [lucene]

2025-01-30 Thread via GitHub
mikemccand commented on PR #14101: URL: https://github.com/apache/lucene/pull/14101#issuecomment-2624467638 Wow, this looks really awesome! I wish we could retroactively apply it to PRs missing their module labels ... but let's start with this change (so new PRs are labeled).

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-01-30 Thread via GitHub
kaivalnp commented on code in PR #14178: URL: https://github.com/apache/lucene/pull/14178#discussion_r1935570579 ## lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java: ## @@ -0,0 +1,268 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Use github wf to add module labels for PR based on file changes [lucene]

2025-01-30 Thread via GitHub
pseudo-nymous commented on code in PR #14101: URL: https://github.com/apache/lucene/pull/14101#discussion_r1935416351 ## .github/labeler.yml: ## @@ -0,0 +1,134 @@ +# This file defines module label mappings for the Lucene project. +# Each module is associated with a set of file g

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-01-30 Thread via GitHub
kaivalnp commented on code in PR #14178: URL: https://github.com/apache/lucene/pull/14178#discussion_r1935407529 ## lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java: ## @@ -0,0 +1,268 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-01-30 Thread via GitHub
kaivalnp commented on code in PR #14178: URL: https://github.com/apache/lucene/pull/14178#discussion_r1935304089 ## lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java: ## @@ -0,0 +1,268 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-01-30 Thread via GitHub
kaivalnp commented on code in PR #14178: URL: https://github.com/apache/lucene/pull/14178#discussion_r1935303469 ## lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java: ## @@ -0,0 +1,204 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] Use github wf to add module labels for PR based on file changes [lucene]

2025-01-30 Thread via GitHub
pseudo-nymous commented on code in PR #14101: URL: https://github.com/apache/lucene/pull/14101#discussion_r1935253167 ## .github/labeler.yml: ## @@ -0,0 +1,134 @@ +# This file defines module label mappings for the Lucene project. +# Each module is associated with a set of file g

Re: [PR] Use github wf to add module labels for PR based on file changes [lucene]

2025-01-30 Thread via GitHub
stefanvodita commented on code in PR #14101: URL: https://github.com/apache/lucene/pull/14101#discussion_r1935220020 ## .github/labeler.yml: ## @@ -0,0 +1,134 @@ +# This file defines module label mappings for the Lucene project. +# Each module is associated with a set of file gl

Re: [PR] Use github wf to add module labels for PR based on file changes [lucene]

2025-01-30 Thread via GitHub
pseudo-nymous commented on PR #14101: URL: https://github.com/apache/lucene/pull/14101#issuecomment-2623844525 > This is great @pseudo-nymous! I left minor comments that I would like to see addressed, but this is very good already. I retract my concern about the action budget. This is runni

Re: [PR] Use github wf to add module labels for PR based on file changes [lucene]

2025-01-30 Thread via GitHub
pseudo-nymous commented on PR #14101: URL: https://github.com/apache/lucene/pull/14101#issuecomment-2623829618 Did further testing with the new sync flag; the change is working as expected. Ref: https://github.com/pseudo-nymous/lucene/pull/3