Re: [PR] Fix for changelog verifier and milestone setter automation [lucene]

2025-03-19 Thread via GitHub
pseudo-nymous commented on PR #14369: URL: https://github.com/apache/lucene/pull/14369#issuecomment-2739129987 Moved it to draft state to address all the failures first. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[PR] Remove nonexistent PackedBlockLength reference in document [lucene]

2025-03-19 Thread via GitHub
amosbird opened a new pull request, #14377: URL: https://github.com/apache/lucene/pull/14377 ### Description Remove nonexistent `PackedBlockLength` reference in document. This seems to be a documentation artifact from version 912 onward, with no corresponding implementation found in

Re: [PR] Add Issue Tracker Link under 'Editing Content on the Lucene™ Sites' [lucene-site]

2025-03-19 Thread via GitHub
DivyanshIITB commented on code in PR #78: URL: https://github.com/apache/lucene-site/pull/78#discussion_r2003517630 ## content/pages/site-instructions.md: ## @@ -3,8 +3,10 @@ URL: site-instructions.html save_as: site-instructions.html template: lucene/tlp/page + ## Editing

[I] Handling concurrent search in QueryProfiler [lucene]

2025-03-19 Thread via GitHub
jainankitk opened a new issue, #14375: URL: https://github.com/apache/lucene/issues/14375 ### Description Based on the discussion from [this email thread](https://lists.apache.org/thread.html/r7957a2d9ca38af45b1c370753b3c10542fd9faaf9bf95944c5224e12%40%3Cdev.lucene.apache.org%3E), ht

Re: [PR] Add Issue Tracker Link under 'Editing Content on the Lucene™ Sites' [lucene-site]

2025-03-19 Thread via GitHub
dweiss merged PR #78: URL: https://github.com/apache/lucene-site/pull/78

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-19 Thread via GitHub
gf2121 merged PR #14365: URL: https://github.com/apache/lucene/pull/14365

[PR] Add leafReaders() Method to IndexReader and Unit Test [lucene]

2025-03-19 Thread via GitHub
DivyanshIITB opened a new pull request, #14370: URL: https://github.com/apache/lucene/pull/14370 This PR introduces leafReaders() in IndexReader for direct access to LeafReader instances, improving usability over leaves(). A corresponding unit test ensures correctness by validating retrieva

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-19 Thread via GitHub
gf2121 commented on PR #14365: URL: https://github.com/apache/lucene/pull/14365#issuecomment-2736218538 I ran some benchmarks to find out the major reason: **Baseline**: main branch **Candidate**: collecting docs greater than maxDocVisited into bitset (instead of `DocIdSetBuilder

Re: [PR] Fix for changelog verifier and milestone setter automation [lucene]

2025-03-19 Thread via GitHub
pseudo-nymous commented on PR #14369: URL: https://github.com/apache/lucene/pull/14369#issuecomment-2736166568 Yes, the fix here would address the `fatal: bad object` failure. I haven't seen the first failure before; let me address it.

Re: [PR] Completion FSTs to be loaded off-heap at all times [lucene]

2025-03-19 Thread via GitHub
javanna commented on PR #14364: URL: https://github.com/apache/lucene/pull/14364#issuecomment-2736271001 Cool, then I will target this PR at main only and open a separate PR for `branch_10x`. Out of curiosity, what are the use cases where you'd expect users to call `NRTSuggester#load` direct

Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]

2025-03-19 Thread via GitHub
DivyanshIITB commented on PR #14335: URL: https://github.com/apache/lucene/pull/14335#issuecomment-2735879968 Thank you for your help, @vigyasharma! I would love to explore smaller, more focused issues to start with!

Re: [PR] Completion FSTs to be loaded off-heap at all times [lucene]

2025-03-19 Thread via GitHub
jpountz commented on PR #14364: URL: https://github.com/apache/lucene/pull/14364#issuecomment-2735857094 I was thinking of keeping the `load(IndexInput, FSTLoadMode)` static method, documenting that the load mode is ignored and deprecating it. Indeed that would require keeping the `FSTLoadM

Re: [PR] Completion FSTs to be loaded off-heap at all times [lucene]

2025-03-19 Thread via GitHub
javanna commented on PR #14364: URL: https://github.com/apache/lucene/pull/14364#issuecomment-2735835683 Thanks @jpountz, what's your suggestion around back-compat? Sounds like you are suggesting not backporting the removal of the fst load mode enum but only the switch to off-heap by default

Re: [PR] Fix for changelog verifier and milestone setter automation [lucene]

2025-03-19 Thread via GitHub
stefanvodita commented on PR #14369: URL: https://github.com/apache/lucene/pull/14369#issuecomment-2735852996 I'm happy to try this out, but I have some doubts it addresses the issue we're experiencing now. For example, take the failure [here](https://github.com/apache/lucene/actions/runs/1

[PR] Adjust visibility of NRTSuggester#load [lucene]

2025-03-19 Thread via GitHub
javanna opened a new pull request, #14372: URL: https://github.com/apache/lucene/pull/14372 load is a public static method, but its corresponding builder NRTSuggesterBuilder is package private. That means that there is no reason for load to be public.

Re: [I] Add issue tracker for website [lucene-site]

2025-03-19 Thread via GitHub
dweiss closed issue #72: Add issue tracker for website URL: https://github.com/apache/lucene-site/issues/72

Re: [PR] Add Issue Tracker Link under 'Editing Content on the Lucene™ Sites' [lucene-site]

2025-03-19 Thread via GitHub
DivyanshIITB commented on PR #78: URL: https://github.com/apache/lucene-site/pull/78#issuecomment-2737508089 Just a gentle reminder @dweiss

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-19 Thread via GitHub
mayya-sharipova commented on PR #14331: URL: https://github.com/apache/lucene/pull/14331#issuecomment-2737598747 I've done additional benchmarks with the new Optimized Scalar Quantization format that quantizes 32x, down to a single bit (Lucene102HnswBinaryQuantizedVectorsFormat). And here we

Re: [PR] Completion FSTs to be loaded off-heap at all times [lucene]

2025-03-19 Thread via GitHub
jpountz commented on PR #14364: URL: https://github.com/apache/lucene/pull/14364#issuecomment-2736470811 Good question. The class is public and looked like a user-facing API hence my comment, but you can't serialize a NRTSuggester yourself since `NRTSuggesterBuilder` is pkg-private. So it l

Re: [PR] Add leafReaders() Method to IndexReader and Unit Test [lucene]

2025-03-19 Thread via GitHub
jainankitk commented on PR #14370: URL: https://github.com/apache/lucene/pull/14370#issuecomment-273732 > Eh, I am not sold that this change needs to occur, if ever. While "this is how it's always been" isn't a good argument for some things, I think expanding the public, and then backwar

[PR] Dummy [lucene]

2025-03-19 Thread via GitHub
ogprakash opened a new pull request, #14376: URL: https://github.com/apache/lucene/pull/14376 ### Description

Re: [PR] Completion FSTs to be loaded off-heap at all times [lucene]

2025-03-19 Thread via GitHub
javanna commented on PR #14364: URL: https://github.com/apache/lucene/pull/14364#issuecomment-2737720139 I merged main in after merging #14372 and added a changelog entry. I believe this is ready to go, and can now be backported to branch_10x as-is.

Re: [PR] Dummy [lucene]

2025-03-19 Thread via GitHub
ogprakash commented on PR #14376: URL: https://github.com/apache/lucene/pull/14376#issuecomment-2737721997 It was meant for testing on a fork branch.

Re: [PR] Dummy [lucene]

2025-03-19 Thread via GitHub
ogprakash closed pull request #14376: Dummy URL: https://github.com/apache/lucene/pull/14376

Re: [PR] Avoid using time zones that emit warnings (jdk25+) [lucene]

2025-03-19 Thread via GitHub
dweiss commented on PR #14328: URL: https://github.com/apache/lucene/pull/14328#issuecomment-273592 I've backported this to branch_10x.

Re: [PR] Add CHANGES entry for CheckIndex HNSW work [lucene]

2025-03-19 Thread via GitHub
javanna commented on PR #14120: URL: https://github.com/apache/lucene/pull/14120#issuecomment-2737700794 Heads up: the original change was backported, but the changelog entry (filed under 10.2) was not; I just backported it to branch_10x.

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-19 Thread via GitHub
jpountz commented on PR #14365: URL: https://github.com/apache/lucene/pull/14365#issuecomment-2736720587 Interesting. I remember playing with calling `BulkAdder#grow` on the estimated number of matching points (to upgrade to a bitset immediately instead of waiting for docs to be collected)

Re: [PR] Add Issue Tracker Link under 'Editing Content on the Lucene™ Sites' [lucene-site]

2025-03-19 Thread via GitHub
DivyanshIITB commented on PR #78: URL: https://github.com/apache/lucene-site/pull/78#issuecomment-2736871034 Just a gentle reminder @sebbASF

[PR] Optimize ParallelLeafReader to improve term vector fetching efficienc [lucene]

2025-03-19 Thread via GitHub
DivyanshIITB opened a new pull request, #14373: URL: https://github.com/apache/lucene/pull/14373 This PR optimizes ParallelLeafReader to avoid redundant term vector fetching. - Replaces per-field term vector fetching with a single call per reader. - Reduces complexity from O(n^2) to O(n

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-19 Thread via GitHub
jpountz commented on code in PR #14365: URL: https://github.com/apache/lucene/pull/14365#discussion_r2004331440 ## lucene/core/src/java/org/apache/lucene/util/DocIdSetBuilder.java: ## @@ -47,6 +47,8 @@ public sealed interface BulkAdder permits FixedBitSetAdder, BufferAdder {

Re: [PR] Completion FSTs to be loaded off-heap at all times [lucene]

2025-03-19 Thread via GitHub
javanna merged PR #14364: URL: https://github.com/apache/lucene/pull/14364

Re: [PR] Fixing quantization interval initialization for optimized sq [lucene]

2025-03-19 Thread via GitHub
benwtrent merged PR #14374: URL: https://github.com/apache/lucene/pull/14374

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-19 Thread via GitHub
benwtrent commented on PR #14331: URL: https://github.com/apache/lucene/pull/14331#issuecomment-2737726576 > Experiment 3 new QSQ format: ... These improvements make sense to me. The overall bottleneck of vector ops is way lower here, so simply doing fewer ops isn't going to have a

Re: [I] Handling concurrent search in QueryProfiler [lucene]

2025-03-19 Thread via GitHub
jpountz commented on issue #14375: URL: https://github.com/apache/lucene/issues/14375#issuecomment-2738214946 This looks like it could be useful. Maybe it tries to do too much by providing min/avg/max aggregates and it should just provide per-slice breakdowns, leaving whether and how to com

Re: [PR] Completion FSTs to be loaded off-heap at all times [lucene]

2025-03-19 Thread via GitHub
javanna commented on PR #14364: URL: https://github.com/apache/lucene/pull/14364#issuecomment-2738284353 Thanks @jpountz for all the help!

[PR] Speed up advancing within a sparse block in IndexedDISI. [lucene]

2025-03-19 Thread via GitHub
vsop-479 opened a new pull request, #14371: URL: https://github.com/apache/lucene/pull/14371 ### Description Similar to https://github.com/apache/lucene/pull/13692.

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-19 Thread via GitHub
msokolov commented on PR #14331: URL: https://github.com/apache/lucene/pull/14331#issuecomment-2737095119 Yes, looks good; I think this is the right tradeoff. We even seem to get improved query performance in some cases. +1 to merge this.

Re: [PR] Completion FSTs to be loaded off-heap at all times [lucene]

2025-03-19 Thread via GitHub
javanna commented on PR #14364: URL: https://github.com/apache/lucene/pull/14364#issuecomment-2736648444 I opened #14372 to address the visibility issue of `load`; once merged, that should simplify this PR and its backport.

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-19 Thread via GitHub
benwtrent commented on code in PR #14331: URL: https://github.com/apache/lucene/pull/14331#discussion_r2003974286 ## lucene/core/src/java/org/apache/lucene/util/hnsw/ConcurrentHnswMerger.java: ## @@ -51,19 +57,85 @@ protected HnswBuilder createBuilder(KnnVectorValues mergedVect

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-19 Thread via GitHub
gf2121 commented on PR #14365: URL: https://github.com/apache/lucene/pull/14365#issuecomment-2737667396 > I remember playing with calling BulkAdder#grow on the estimated number of matching points (to upgrade to a bitset immediately instead of waiting for docs to be collected) a while back a

Re: [PR] Adjust visibility of NRTSuggester#load [lucene]

2025-03-19 Thread via GitHub
javanna merged PR #14372: URL: https://github.com/apache/lucene/pull/14372

Re: [PR] Cover all DataType [lucene]

2025-03-19 Thread via GitHub
javanna commented on PR #14091: URL: https://github.com/apache/lucene/pull/14091#issuecomment-2737689624 Heya, the entry in the changelog was filed under 10.2, but the change was never backported. Either we move the changelog entry then, or we backport the change :)