[PR] Reduce NeighborArray heap memory [lucene]

2025-04-21 Thread via GitHub
weizijun opened a new pull request, #14527: URL: https://github.com/apache/lucene/pull/14527 When BBQ is used with Lucene, a single data node can hold more data, so when more shards are merged concurrently, heap memory usage can become very high. I found that the NeighborAr

Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]

2025-04-21 Thread via GitHub
jainankitk commented on PR #14439: URL: https://github.com/apache/lucene/pull/14439#issuecomment-2819914834 @stefanvodita - Thanks for the prompt review. I addressed most of the review comments and added a JMH benchmark in place of the less useful performance test added earlier. The benchmark resul

Re: [PR] Reduce NeighborArray heap memory [lucene]

2025-04-21 Thread via GitHub
jainankitk commented on code in PR #14527: URL: https://github.com/apache/lucene/pull/14527#discussion_r2053369115 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -32,13 +33,15 @@ public class NeighborArray { private final boolean scoresDescOrde

Re: [PR] Reduce NeighborArray heap memory [lucene]

2025-04-21 Thread via GitHub
jainankitk commented on PR #14527: URL: https://github.com/apache/lucene/pull/14527#issuecomment-2820142632 > We need to make sure that there are no significant performance or concurrency bugs introduced with this. Could you test with https://github.com/mikemccand/luceneutil to verify recal

Re: [PR] Make task executor non-final [lucene]

2025-04-21 Thread via GitHub
jainankitk commented on PR #14524: URL: https://github.com/apache/lucene/pull/14524#issuecomment-2820171449 While I am not sure about making it non-final, I am wondering whether we should execute the task on the current thread. I'm not sure we save much overhead, and it makes the code less readabl

Re: [PR] Advance doc instead of offset in competitive iterator [lucene]

2025-04-21 Thread via GitHub
gf2121 merged PR #14530: URL: https://github.com/apache/lucene/pull/14530 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apac

Re: [PR] Reduce NeighborArray heap memory [lucene]

2025-04-21 Thread via GitHub
weizijun commented on PR #14527: URL: https://github.com/apache/lucene/pull/14527#issuecomment-2817996406 TestHnswFloatVectorGraph.testRamUsageEstimate may fail because OnHeapHnswGraph.ramBytesUsed uses the fixed array size to calculate the RAM value.
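The comment implies that the arrays behind each neighbor list are no longer allocated at a fixed size, so a RAM estimate computed from that fixed size would no longer match actual usage. Below is a minimal sketch of that idea, assuming arrays that grow on demand up to a cap. `GrowableNeighborsSketch` and all of its methods are hypothetical; this is not Lucene's `NeighborArray`, `OnHeapHnswGraph`, or the code in the PR.

```java
import java.util.Arrays;

/**
 * Hypothetical neighbor list (not Lucene's NeighborArray): the node and score
 * arrays start small and grow on demand, instead of being preallocated at maxSize.
 */
final class GrowableNeighborsSketch {
  private final int maxSize;
  private int[] nodes;
  private float[] scores;
  private int size;

  GrowableNeighborsSketch(int maxSize, int initialCapacity) {
    if (initialCapacity < 1 || initialCapacity > maxSize) {
      throw new IllegalArgumentException("initialCapacity must be in [1, maxSize]");
    }
    this.maxSize = maxSize;
    this.nodes = new int[initialCapacity];
    this.scores = new float[initialCapacity];
  }

  void add(int node, float score) {
    if (size == maxSize) {
      throw new IllegalStateException("neighbor list is full");
    }
    if (size == nodes.length) {
      // Double the capacity, but never beyond maxSize.
      int newCapacity = Math.min(maxSize, nodes.length * 2);
      nodes = Arrays.copyOf(nodes, newCapacity);
      scores = Arrays.copyOf(scores, newCapacity);
    }
    nodes[size] = node;
    scores[size] = score;
    size++;
  }

  int size() {
    return size;
  }

  /** Payload bytes of both arrays at their CURRENT length (object and array headers ignored). */
  long arrayPayloadBytes() {
    return (long) nodes.length * Integer.BYTES + (long) scores.length * Float.BYTES;
  }

  /** What an estimate assuming both arrays were preallocated at maxSize would report. */
  long fixedSizeEstimateBytes() {
    return (long) maxSize * (Integer.BYTES + Float.BYTES);
  }
}
```

With `maxSize = 32` and an initial capacity of 4, adding five neighbors grows both arrays to 8 slots, so the payload is 64 bytes, while the fixed-size estimate still reports 256 bytes. A RAM test that assumes fixed-size arrays would see a mismatch of this kind.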

[PR] speed up numDeletesToMerge of SoftDeletesRetentionMergePolicy [lucene]

2025-04-21 Thread via GitHub
gf2121 opened a new pull request, #14531: URL: https://github.com/apache/lucene/pull/14531 This change gives `SoftDeletesRetentionMergePolicy` a chance to take advantage of `DenseConjunctionBulkScorer` to speed up counting `numDeletesToMerge`. relates: https://github.com/apache/lu

[PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]

2025-04-21 Thread via GitHub
gf2121 opened a new pull request, #14529: URL: https://github.com/apache/lucene/pull/14529 Implement `intoBitset` for `IndexedDISI` and doc values. The doc values `intoBitset` is already called by competitive iterators, and it can also be used to speed up soft-delete operations.
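The PR description does not show the implementation. As a hedged illustration of what "loading matching doc IDs into a bitset" means, here is a standalone sketch over a sorted `int[]`, using `java.util.BitSet` in place of Lucene's `FixedBitSet`. `ArrayDocIteratorSketch` and its methods are hypothetical. The assumed contract is: set bit `doc - offset` for every doc from the current position up to, but excluding, `upTo`, then leave the iterator on the first doc at or beyond `upTo`.

```java
import java.util.BitSet;

/** Hypothetical doc-ID iterator over a sorted array, with a bulk "into bitset" method. */
final class ArrayDocIteratorSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs; // sorted, distinct doc IDs
  private int index = -1;   // -1 means "not positioned yet"

  ArrayDocIteratorSketch(int[] docs) {
    this.docs = docs;
  }

  int docID() {
    if (index < 0) {
      return -1;
    }
    return index < docs.length ? docs[index] : NO_MORE_DOCS;
  }

  int nextDoc() {
    if (index < docs.length) {
      index++;
    }
    return docID();
  }

  /**
   * Sets bit (doc - offset) for each doc in [docID(), upTo) and stops on the
   * first doc >= upTo. Requires a positioned iterator and offset <= docID().
   */
  void intoBitSet(int upTo, BitSet bitSet, int offset) {
    if (docID() < 0) {
      throw new IllegalStateException("iterator is not positioned");
    }
    for (int doc = docID(); doc < upTo; doc = nextDoc()) {
      bitSet.set(doc - offset);
    }
  }
}
```

For example, positioned on doc 3 over the docs {3, 5, 8, 13, 21}, `intoBitSet(10, bits, 3)` sets bits 0, 2 and 5 and leaves the iterator on doc 13. The per-doc loop is just the simplest correct form; a specialized implementation would presumably exploit the underlying encoding to do better, but the excerpt does not say how.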

Re: [PR] Compute the doc range more efficiently when flushing doc block [lucene]

2025-04-21 Thread via GitHub
jainankitk commented on PR #14447: URL: https://github.com/apache/lucene/pull/14447#issuecomment-2819114067 Thanks for updating the PR to include `Lucene103PostingsWriter`. LGTM!

Re: [I] Add a timeout for forceMergeDeletes in IndexWriter [lucene]

2025-04-21 Thread via GitHub
jpountz commented on issue #14431: URL: https://github.com/apache/lucene/issues/14431#issuecomment-2819513522 > I don't know if we are already doing this -- is this TieredMergePolicy's default behavior (1 -> 1) for forceMergeDeletes? I don't think so? It's not the default indeed. Tier

[PR] Fix broken intellij 2025.1 gradle import. [lucene]

2025-04-21 Thread via GitHub
dweiss opened a new pull request, #14528: URL: https://github.com/apache/lucene/pull/14528 Intellij 2025.1 was failing to import Lucene after an upgrade: ``` * What went wrong: java.io.NotSerializableException: org.gradle.api.internal.file.DefaultFilePropertyFactory$FixedFile org

Re: [PR] Reduce NeighborArray heap memory [lucene]

2025-04-21 Thread via GitHub
benwtrent commented on PR #14527: URL: https://github.com/apache/lucene/pull/14527#issuecomment-2818418948 We need to make sure that there are no significant performance or concurrency bugs introduced with this. Could you test with https://github.com/mikemccand/luceneutil to verify recall,

Re: [PR] Advance doc instead of offset in competitive iterator [lucene]

2025-04-21 Thread via GitHub
benwtrent commented on code in PR #14530: URL: https://github.com/apache/lucene/pull/14530#discussion_r2052479353 ## lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java: ## @@ -533,8 +533,8 @@ public void intoBitSet(int upTo, FixedBitSet bitSet, i

Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]

2025-04-21 Thread via GitHub
expani commented on code in PR #14511: URL: https://github.com/apache/lucene/pull/14511#discussion_r2052812090 ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -1310,7 +1317,7 @@ public List getImpacts(int level) { r

Re: [PR] Fix broken intellij 2025.1 gradle import. [lucene]

2025-04-21 Thread via GitHub
dweiss merged PR #14528: URL: https://github.com/apache/lucene/pull/14528

Re: [I] TestForTooMuchCloning.test fails [lucene]

2025-04-21 Thread via GitHub
dweiss commented on issue #14220: URL: https://github.com/apache/lucene/issues/14220#issuecomment-2819309092 This resurfaced recently: I just hit it on GitHub, and it happens on Jenkins too. :(

Re: [I] Add a timeout for forceMergeDeletes in IndexWriter [lucene]

2025-04-21 Thread via GitHub
mikemccand commented on issue #14431: URL: https://github.com/apache/lucene/issues/14431#issuecomment-2819320548 If we do add this timeout, I don't think the still-running merges kicked off during `forceMergeDeletes` should abort -- they should ideally run to completion, just in the backgro

[PR] Make competitive iterators more robust. [lucene]

2025-04-21 Thread via GitHub
jpountz opened a new pull request, #14532: URL: https://github.com/apache/lucene/pull/14532 As per a recent bug (#14517), competitive iterators are hard to get right given how their state gets updated in place. This commit tries to make them more robust by extracting the logic of updating t