Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-04-05 Thread via GitHub
mayya-sharipova commented on code in PR #14331:
URL: https://github.com/apache/lucene/pull/14331#discussion_r2005462586

## lucene/core/src/java/org/apache/lucene/util/hnsw/ConcurrentHnswMerger.java:
## @@ -51,19 +57,85 @@ protected HnswBuilder createBuilder(KnnVectorValues merg…

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-27 Thread via GitHub
mayya-sharipova commented on code in PR #14331:
URL: https://github.com/apache/lucene/pull/14331#discussion_r2005461835

## lucene/core/src/java/org/apache/lucene/util/hnsw/ConcurrentHnswMerger.java:
## @@ -51,19 +57,85 @@ protected HnswBuilder createBuilder(KnnVectorValues merg…

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-20 Thread via GitHub
mayya-sharipova merged PR #14331:
URL: https://github.com/apache/lucene/pull/14331

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-20 Thread via GitHub
mayya-sharipova commented on code in PR #14331:
URL: https://github.com/apache/lucene/pull/14331#discussion_r2005464935

## lucene/core/src/java/org/apache/lucene/util/hnsw/MergingHnswGraphBuilder.java:
## @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)…

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-20 Thread via GitHub
mayya-sharipova commented on code in PR #14331:
URL: https://github.com/apache/lucene/pull/14331#discussion_r2005416489

## lucene/core/src/java/org/apache/lucene/util/hnsw/ConcurrentHnswMerger.java:
## @@ -51,19 +57,85 @@ protected HnswBuilder createBuilder(KnnVectorValues merg…

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-19 Thread via GitHub
benwtrent commented on PR #14331:
URL: https://github.com/apache/lucene/pull/14331#issuecomment-2737726576

> Experiment 3 new QSQ format: ...

These improvements make sense to me. The overall bottleneck of vector ops is way lower here, so simply doing fewer ops isn't going to have a…
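One way to make the "fewer ops matters less when ops are cheap" point concrete is Amdahl's law (this framing is ours, not from the thread). Assume a fraction $f$ of merge time goes to vector comparisons and the new merge strategy cuts the number of comparisons by a factor $s$; the values of $f$ and $s$ below are hypothetical illustration values, not measurements from this PR.

```latex
\text{speedup}(f, s) = \frac{1}{(1 - f) + f/s}

% expensive comparisons (e.g. float32 vectors), hypothetical f = 0.8, s = 2:
\text{speedup}(0.8, 2) = \frac{1}{0.2 + 0.4} \approx 1.67

% cheap comparisons (e.g. 1-bit vectors), hypothetical f = 0.3, s = 2:
\text{speedup}(0.3, 2) = \frac{1}{0.7 + 0.15} \approx 1.18
```

The same reduction in comparisons yields a much smaller end-to-end gain once comparisons are no longer the dominant cost.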

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-19 Thread via GitHub
benwtrent commented on code in PR #14331:
URL: https://github.com/apache/lucene/pull/14331#discussion_r2003974286

## lucene/core/src/java/org/apache/lucene/util/hnsw/ConcurrentHnswMerger.java:
## @@ -51,19 +57,85 @@ protected HnswBuilder createBuilder(KnnVectorValues mergedVect…

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-19 Thread via GitHub
mayya-sharipova commented on PR #14331:
URL: https://github.com/apache/lucene/pull/14331#issuecomment-2737598747

I've done additional benchmarks with the new Optimized Scalar Quantization format that quantizes vectors 32x, down to a single bit per dimension (Lucene102HnswBinaryQuantizedVectorsFormat). And here we…
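To illustrate where the "32x" comes from and why comparisons on such vectors are cheap, here is a minimal sketch of plain 1-bit quantization. It is not Lucene's Optimized Scalar Quantization, which is more involved (for example, it also stores per-vector corrective terms); the class and method names, the centroid-centering step, and the 384-dimension example are assumptions for illustration.

```java
/**
 * Hypothetical sketch of 1-bit quantization: each float32 dimension becomes a
 * single bit, so a vector shrinks 32x. Simplified; NOT Lucene's OSQ format.
 */
final class OneBitQuantizationSketch {

  /** Sets bit i when component i lies above the centroid (assumed centering step). */
  static long[] quantize(float[] vector, float[] centroid) {
    long[] bits = new long[(vector.length + 63) / 64];
    for (int i = 0; i < vector.length; i++) {
      if (vector[i] > centroid[i]) {
        bits[i >>> 6] |= 1L << (i & 63);
      }
    }
    return bits;
  }

  /** Hamming distance of two packed vectors: one XOR and one popcount per 64 dims. */
  static int hamming(long[] a, long[] b) {
    int distance = 0;
    for (int i = 0; i < a.length; i++) {
      distance += Long.bitCount(a[i] ^ b[i]);
    }
    return distance;
  }

  public static void main(String[] args) {
    int dims = 384; // illustrative dimensionality
    System.out.println("float32 bytes per vector: " + dims * Float.BYTES); // 1536
    System.out.println("1-bit bytes per vector:   " + dims / 8); // 48

    float[] centroid = new float[4]; // all zeros for this toy example
    long[] a = quantize(new float[] {0.5f, -0.2f, 0.1f, -0.9f}, centroid);
    long[] b = quantize(new float[] {-0.5f, -0.1f, 0.3f, 0.4f}, centroid);
    System.out.println("hamming = " + hamming(a, b)); // 2
  }
}
```

A 384-dimension comparison here costs six XOR/popcount pairs instead of 384 float multiply-adds, which is why the vector-op share of merge time drops with this format.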

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-19 Thread via GitHub
msokolov commented on PR #14331:
URL: https://github.com/apache/lucene/pull/14331#issuecomment-2737095119

Yes, looks good, I think this is the right tradeoff. We even seem to get improved query performance in some cases. +1 to merge this

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-17 Thread via GitHub
mayya-sharipova commented on PR #14331:
URL: https://github.com/apache/lucene/pull/14331#issuecomment-2730506974

@msokolov Thanks for the comment. I've experimented with setting `beamCandidates0` to `M * 3`, increasing it from the previous `M * 2`, when building merged graphs. Graphs look bette…
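The effect of a larger candidate beam can be seen on a toy graph. The sketch is a generic best-first beam search over one graph layer, in the style of the SEARCH-LAYER routine from the HNSW paper; it is not Lucene's actual searcher, and treating `beam` as standing in for `beamCandidates0` (or `beamWidth`) is an assumption. The 1-D points and the graph are hypothetical: with a beam of 1 the search follows a dead end, while a beam of 3 keeps a worse-looking node that leads to the true nearest neighbor, at the cost of an extra distance computation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical toy beam search; "beam" stands in for a candidate-list size. */
final class BeamSearchSketch {
  static int distanceComputations; // reset on every search, for illustration only

  static List<Integer> search(int[][] neighbors, double[] points, double query, int entry, int beam) {
    distanceComputations = 0;
    double[] dist = new double[points.length];
    boolean[] visited = new boolean[points.length];
    // closest-first queue of nodes still to expand
    PriorityQueue<Integer> candidates =
        new PriorityQueue<>(Comparator.comparingDouble((Integer n) -> dist[n]));
    // furthest-first queue holding the best `beam` nodes found so far
    PriorityQueue<Integer> results =
        new PriorityQueue<>(Comparator.comparingDouble((Integer n) -> dist[n]).reversed());

    dist[entry] = Math.abs(points[entry] - query);
    distanceComputations++;
    visited[entry] = true;
    candidates.add(entry);
    results.add(entry);

    while (!candidates.isEmpty()) {
      int c = candidates.poll();
      // stop once the closest unexpanded candidate is worse than the worst kept result
      if (results.size() >= beam && dist[c] > dist[results.peek()]) {
        break;
      }
      for (int n : neighbors[c]) {
        if (visited[n]) {
          continue;
        }
        visited[n] = true;
        dist[n] = Math.abs(points[n] - query);
        distanceComputations++;
        if (results.size() < beam || dist[n] < dist[results.peek()]) {
          candidates.add(n);
          results.add(n);
          if (results.size() > beam) {
            results.poll(); // evict the furthest kept result
          }
        }
      }
    }
    List<Integer> out = new ArrayList<>(results);
    out.sort(Comparator.comparingDouble((Integer n) -> dist[n]));
    return out;
  }

  public static void main(String[] args) {
    // node 0 at 0 (entry), node 1 at 3 (dead end), node 2 at -1, node 3 at 10 (true nearest)
    double[] points = {0, 3, -1, 10};
    int[][] neighbors = {{1, 2}, {0}, {0, 3}, {2}};
    for (int beam : new int[] {1, 3}) {
      List<Integer> top = search(neighbors, points, 10.0, 0, beam);
      System.out.println("beam=" + beam + " best=" + top.get(0)
          + " distanceComputations=" + distanceComputations);
    }
  }
}
```

With beam 1 the search returns node 1 after 3 distance computations; with beam 3 it returns node 3 after 4. Real graphs show the same tradeoff at scale: a wider beam costs more comparisons per insertion but tends to produce better-connected graphs.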

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-05 Thread via GitHub
msokolov commented on PR #14331:
URL: https://github.com/apache/lucene/pull/14331#issuecomment-2701861504

Oh, this is a neat idea! Looks like we sacrifice some query performance (in some cases) for a big improvement in indexing time. I wonder if we've tried other values of `beamWidth` to se…

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-05 Thread via GitHub
mayya-sharipova commented on PR #14331:
URL: https://github.com/apache/lucene/pull/14331#issuecomment-2701756507

Evaluation is done with Luceneutil on these datasets:

Rebased against Lucene main branch:

1. **quora-E5-small**; 522931 docs; 384 dims; 7 bits quantized; cosine metri…
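For context on the "7 bits quantized; cosine" setting, the sketch below maps each float component linearly onto the integers 0..127 and compares cosine similarity before and after quantization. It uses a simple min/max range chosen by hand; Lucene's scalar quantizer picks its range differently (from the data), so the class name, the range, and the sample vectors are assumptions for illustration only.

```java
/** Hypothetical 7-bit min/max scalar quantization; NOT Lucene's quantizer. */
final class SevenBitQuantizationSketch {

  static double cosine(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  /** Maps each component linearly from [min, max] onto 0..127; out-of-range values are clamped. */
  static byte[] quantize(float[] v, float min, float max) {
    final int levels = (1 << 7) - 1; // 127
    byte[] q = new byte[v.length];
    for (int i = 0; i < v.length; i++) {
      float clamped = Math.max(min, Math.min(max, v[i]));
      q[i] = (byte) Math.round((clamped - min) / (max - min) * levels);
    }
    return q;
  }

  /** Inverse mapping back to floats, used here to measure the similarity drift. */
  static float[] dequantize(byte[] q, float min, float max) {
    float[] v = new float[q.length];
    for (int i = 0; i < q.length; i++) {
      v[i] = min + q[i] * (max - min) / 127f;
    }
    return v;
  }

  public static void main(String[] args) {
    float min = -1f, max = 1f; // assumed range for this toy example
    float[] x = {0.12f, -0.40f, 0.33f, 0.05f};
    float[] y = {0.10f, -0.35f, 0.30f, 0.20f};
    double exact = cosine(x, y);
    double approx = cosine(dequantize(quantize(x, min, max), min, max),
        dequantize(quantize(y, min, max), min, max));
    System.out.println("cosine float32 = " + exact + ", cosine 7-bit = " + approx);
  }
}
```

Each dimension then needs one byte instead of four, at the cost of a small error in the similarity scores that the graph is built and searched with.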