Re: [PR] Binary vector format for flat and hnsw vectors [lucene]

2025-03-10 Thread via GitHub
mikemccand commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2710482360 > @lpld quantizing is done per segment, at flush and merge time. So it takes into account live vectors in the segment during flush and merge. > > I don't see why adding/updating

Re: [PR] Make Lucene better at skipping long runs of matches. [lucene]

2025-03-10 Thread via GitHub
jpountz commented on code in PR #14312: URL: https://github.com/apache/lucene/pull/14312#discussion_r1987222711 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -117,6 +107,65 @@ private static int advance(FixedBitSet set, int i) { }

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-10 Thread via GitHub
jpountz commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r1987259134 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -857,123 +768,126 @@ public SeekStatus seekCeil(BytesRef target) throw

Re: [PR] Break the loop when segment is fully deleted by prior delTerms or delQueries [lucene]

2025-03-10 Thread via GitHub
vsop-479 commented on PR #13398: URL: https://github.com/apache/lucene/pull/13398#issuecomment-2712716395 @mikemccand @jpountz Please take a look when you get a chance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] Make Lucene better at skipping long runs of matches. [lucene]

2025-03-10 Thread via GitHub
gf2121 commented on code in PR #14312: URL: https://github.com/apache/lucene/pull/14312#discussion_r1986758277 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -128,6 +128,16 @@ private void scoreWindowUsingBitSet( assert windowMatches

Re: [PR] Make Lucene better at skipping long runs of matches. [lucene]

2025-03-10 Thread via GitHub
gf2121 commented on code in PR #14312: URL: https://github.com/apache/lucene/pull/14312#discussion_r1986764581 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -117,6 +107,65 @@ private static int advance(FixedBitSet set, int i) { }

Re: [PR] Binary vector format for flat and hnsw vectors [lucene]

2025-03-10 Thread via GitHub
benwtrent commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2710314969 @lpld quantizing is done per segment, at flush and merge time. So it takes into account live vectors in the segment during flush and merge. I don't see why adding/updating/deleti
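As a rough illustration of the per-segment point above, the Java sketch below computes a segment's quantization statistic (here, a centroid) from the live vectors only, skipping deleted documents, and then binarizes a vector against that centroid. The class and method names, the `liveDocs` boolean array, and the centroid-plus-sign-bit scheme are all assumptions for illustration. This is not the quantizer from PR #14078.

```java
import java.util.Arrays;

/** Hypothetical sketch: per-segment quantization statistics computed from live vectors only. */
public final class SegmentBinaryQuantizerSketch {

  /** Mean of the live vectors (assumes at least one vector); liveDocs == null means all live. */
  static float[] liveCentroid(float[][] vectors, boolean[] liveDocs) {
    int dims = vectors[0].length;
    float[] centroid = new float[dims];
    int live = 0;
    for (int doc = 0; doc < vectors.length; doc++) {
      if (liveDocs != null && !liveDocs[doc]) {
        continue; // deleted vectors do not influence this segment's statistics
      }
      for (int d = 0; d < dims; d++) {
        centroid[d] += vectors[doc][d];
      }
      live++;
    }
    if (live > 0) {
      for (int d = 0; d < dims; d++) {
        centroid[d] /= live;
      }
    }
    return centroid;
  }

  /** One bit per dimension, set when the component is above the segment centroid. */
  static long[] binarize(float[] vector, float[] centroid) {
    long[] bits = new long[(vector.length + 63) / 64];
    for (int d = 0; d < vector.length; d++) {
      if (vector[d] > centroid[d]) {
        bits[d >> 6] |= 1L << (d & 63);
      }
    }
    return bits;
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 2f}, {3f, 4f}, {100f, 100f}};
    boolean[] liveDocs = {true, true, false}; // the third document is deleted
    float[] centroid = liveCentroid(vectors, liveDocs);
    System.out.println(Arrays.toString(centroid)); // [2.0, 3.0]
    System.out.println(Arrays.toString(binarize(vectors[1], centroid))); // [3]
  }
}
```

In this sketch a delete has no effect on the stored bits until the segment's vectors are quantized again, which per the comment above happens at flush and merge time.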

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-03-10 Thread via GitHub
kaivalnp commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2710218340 Thanks @benwtrent! > While I think the performance numbers are cool, they indicate that this doesn't actually buy us that much The speedup we see above is just a pure HNSW

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-03-10 Thread via GitHub
msokolov commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2710392951 I think the simplified version makes the most sense -- Just confirming: this is where we first search pro-rated across segments, and then if any segment has its output queue full with c
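Read literally, the two-pass scheme described above can be sketched in Java as follows. Assumptions: each segment's first-pass hits arrive as scores sorted in descending order, and "output queue full with competitive hits" is taken to mean that the segment filled its per-segment queue and its worst hit still ranks within the merged top k. The second pass, which re-enters the graph from the prior hits, is not shown. All names are hypothetical; this is not the code in PR #14226.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: choose which segments an optimistic kNN query should revisit. */
public final class OptimisticRevisitSketch {

  /**
   * @param segmentScores first-pass hit scores per segment, each sorted descending
   * @param perSegmentK the queue size each segment was searched with
   * @param k the number of hits requested overall (assumed >= 1)
   * @return indexes of segments whose queue filled up with globally competitive hits
   */
  static List<Integer> segmentsToRevisit(float[][] segmentScores, int[] perSegmentK, int k) {
    // Merge all first-pass scores to find the global k-th best score.
    int total = 0;
    for (float[] s : segmentScores) {
      total += s.length;
    }
    float[] all = new float[total];
    int pos = 0;
    for (float[] s : segmentScores) {
      System.arraycopy(s, 0, all, pos, s.length);
      pos += s.length;
    }
    Arrays.sort(all); // ascending order
    // With fewer than k hits overall, every hit is competitive.
    float threshold = total >= k ? all[total - k] : Float.NEGATIVE_INFINITY;

    List<Integer> revisit = new ArrayList<>();
    for (int i = 0; i < segmentScores.length; i++) {
      float[] s = segmentScores[i];
      boolean queueFull = s.length >= perSegmentK[i];
      // Scores are descending, so the last entry is this segment's worst hit.
      boolean allCompetitive = s.length > 0 && s[s.length - 1] >= threshold;
      if (queueFull && allCompetitive) {
        revisit.add(i);
      }
    }
    return revisit;
  }

  public static void main(String[] args) {
    float[][] scores = {{0.9f, 0.8f}, {0.7f, 0.1f}};
    System.out.println(segmentsToRevisit(scores, new int[] {2, 2}, 3)); // [0]
  }
}
```

Segment 0 is revisited because both of its hits made the merged top 3, which suggests its queue cut off further competitive neighbors; segment 1's worst hit did not make the cut.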

Re: [PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]

2025-03-10 Thread via GitHub
thecoop commented on code in PR #14304: URL: https://github.com/apache/lucene/pull/14304#discussion_r1986956051 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -907,4 +907,87 @@ public static long int4BitDotProduct128(byte[]

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-03-10 Thread via GitHub
dungba88 commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2710421110 > this is where we first search pro-rated across segments, and then if any segment has its output queue full with competitive hits, we revisit it using the prior hits as entry points an

Re: [PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]

2025-03-10 Thread via GitHub
thecoop commented on code in PR #14304: URL: https://github.com/apache/lucene/pull/14304#discussion_r1986971370 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -907,4 +907,87 @@ public static long int4BitDotProduct128(byte[]

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-03-10 Thread via GitHub
dungba88 commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2709963838 Ran again with the second idea that the pro-rata rate will be based on the active segments in the current pass instead of the whole index (called "adaptive" in the graph). However I did
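To make the difference between the two pro-rata bases concrete, here is a Java sketch of a plain proportional split, `ceil(k * segmentSize / totalSize)`, where the denominator is either the whole index or only the segments still active in the current pass (the "adaptive" variant mentioned above). The proportional formula is a simplification assumed for illustration; the PR may size per-segment queues differently, and the names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: per-segment k pro-rated by segment size. */
public final class ProRataSketch {

  /**
   * @param segmentSizes vector count per segment
   * @param active which segments are searched in the current pass
   * @param k the number of hits requested overall
   * @param adaptive if true, pro-rate over active segments only; otherwise over the whole index
   * @return per-segment k (0 for inactive segments)
   */
  static int[] perSegmentK(int[] segmentSizes, boolean[] active, int k, boolean adaptive) {
    long totalSize = 0;
    for (int i = 0; i < segmentSizes.length; i++) {
      if (!adaptive || active[i]) {
        totalSize += segmentSizes[i];
      }
    }
    int[] result = new int[segmentSizes.length];
    if (totalSize == 0) {
      return result; // nothing to search
    }
    for (int i = 0; i < segmentSizes.length; i++) {
      if (active[i]) {
        // Integer ceiling of k * size / totalSize, computed in long to avoid overflow.
        long numerator = (long) k * segmentSizes[i];
        result[i] = (int) ((numerator + totalSize - 1) / totalSize);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    int[] sizes = {600, 300, 100};
    boolean[] secondPass = {false, true, true};
    System.out.println(Arrays.toString(perSegmentK(sizes, secondPass, 10, false))); // [0, 3, 1]
    System.out.println(Arrays.toString(perSegmentK(sizes, secondPass, 10, true))); // [0, 8, 3]
  }
}
```

With the whole index as the base, the segments that remain active in a later pass keep their small first-pass share; the adaptive base redistributes all of k across them.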

Re: [PR] Make single value BKDReader instances lighter [lucene]

2025-03-10 Thread via GitHub
original-brownbear commented on PR #14337: URL: https://github.com/apache/lucene/pull/14337#issuecomment-2710427919 Thanks Ignacio!