Re: [PR] Binary vector format for flat and hnsw vectors [lucene]

2025-03-07 Thread via GitHub
benwtrent commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2706428269 @lpld I agree, both are doing similar things but there are some important distinctions. `oversample` indicates that you are going to return that ratio more results from

[PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]

2025-03-07 Thread via GitHub
DivyanshIITB opened a new pull request, #14335: URL: https://github.com/apache/lucene/pull/14335 This PR enhances the ConcurrentMergeScheduler by introducing dynamic resource allocation for multi-tenant indexing scenarios. Key Improvements: 1) Global Counter for Active IndexWriters

Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]

2025-03-07 Thread via GitHub
jpountz commented on PR #14335: URL: https://github.com/apache/lucene/pull/14335#issuecomment-2708071909 This looks too naive to me; we don't want each index writer to have 1/N of the resources, which would prevent one writer from maxing out resources (e.g. if one index has a heavy write lo

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-03-07 Thread via GitHub
dungba88 commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2707997958 I tried the idea of stopping at second pass with the original k, but the benchmark results look weird for all algorithms, as if it doesn't matter at all. This is much different from the

Re: [PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]

2025-03-07 Thread via GitHub
benwtrent commented on PR #14304: URL: https://github.com/apache/lucene/pull/14304#issuecomment-2706331829 > as quantization only needs to be applied to the query vector at query time, so the search speedup is noise and I should rather be looking at the indexing speedup (+2%) and merging sp

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-07 Thread via GitHub
gf2121 commented on PR #14333: URL: https://github.com/apache/lucene/pull/14333#issuecomment-2706652601 All core tests passed, so I am marking the PR ready for review. I'll fork out a new codec and clean up the code if this idea gets traction (the current code diff is clearer for a review).

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-03-07 Thread via GitHub
navneet1v commented on code in PR #14178: URL: https://github.com/apache/lucene/pull/14178#discussion_r1980484722 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java: ## @@ -0,0 +1,236 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Binary vector format for flat and hnsw vectors [lucene]

2025-03-07 Thread via GitHub
lpld commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2706558384 @benwtrent This makes sense, thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] knn search - add tests to perform exact search when filtering does not return enough results [lucene]

2025-03-07 Thread via GitHub
carlosdelest commented on PR #14274: URL: https://github.com/apache/lucene/pull/14274#issuecomment-2705785390 @benwtrent a review is much appreciated. Thanks!

Re: [PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]

2025-03-07 Thread via GitHub
benwtrent commented on code in PR #14304: URL: https://github.com/apache/lucene/pull/14304#discussion_r1985759854 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -907,4 +907,87 @@ public static long int4BitDotProduct128(byte

Re: [PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]

2025-03-07 Thread via GitHub
benwtrent commented on PR #14304: URL: https://github.com/apache/lucene/pull/14304#issuecomment-2707531880 Ugh, my benchmark was on my laptop, which I think counts as "not having nice byte vectors". I will attempt to benchmark correctly on a cloud machine soon-ish. Sorry @jpountz @th

Re: [PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]

2025-03-07 Thread via GitHub
benwtrent commented on code in PR #14304: URL: https://github.com/apache/lucene/pull/14304#discussion_r1985728235 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -907,4 +907,87 @@ public static long int4BitDotProduct128(byte