Re: [PR] cache preset dict for LZ4WithPresetDictDecompressor [lucene]

2025-03-28 Thread via GitHub
jainankitk commented on code in PR #14397: URL: https://github.com/apache/lucene/pull/14397#discussion_r2019666086 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java: ## @@ -512,6 +512,7 @@ private void doReset(int do
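The PR title suggests that the decompressor should stop decompressing the same preset dictionary over and over. Below is a minimal sketch of that idea. It assumes that consecutive calls often hit the same dictionary and that the dictionary's start offset can serve as a key. `DictCache`, `get` and `loads` are hypothetical names and are not part of `LZ4WithPresetDictCompressionMode`.

```java
import java.util.function.LongFunction;

/** Hypothetical helper: remembers the most recently decoded preset dictionary. */
final class DictCache {
  private long cachedOffset = -1;
  private byte[] cachedDict;
  private int loads;

  /** Returns the dictionary that starts at {@code offset}, decoding it only when it is not cached. */
  byte[] get(long offset, LongFunction<byte[]> loader) {
    if (cachedDict == null || offset != cachedOffset) {
      cachedDict = loader.apply(offset); // expensive path: read and decompress the dictionary
      cachedOffset = offset;
      loads++;
    }
    return cachedDict;
  }

  /** Number of times the loader actually ran. */
  int loads() {
    return loads;
  }
}
```

A cache like this makes the decompressor stateful, so in this sketch it is only safe when each instance is used by one thread at a time.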

Re: [PR] Speed up advancing within a sparse block in IndexedDISI. [lucene]

2025-03-28 Thread via GitHub
vsop-479 commented on PR #14371: URL: https://github.com/apache/lucene/pull/14371#issuecomment-2763043900 @gf2121 I also implemented `advanceExact` with vectors; there is still a slowdown. I will try to measure it on another laptop (with more vector lanes). Benchmark

Re: [PR] Fix HistogramCollector to not create zero-count buckets. [lucene]

2025-03-28 Thread via GitHub
jainankitk commented on code in PR #14421: URL: https://github.com/apache/lucene/pull/14421#discussion_r2019647007 ## lucene/sandbox/src/test/org/apache/lucene/sandbox/facet/plain/histograms/TestHistogramCollectorManager.java: ## @@ -137,6 +140,23 @@ private void doTestSkipIndex

Re: [PR] cache preset dict for LZ4WithPresetDictDecompressor [lucene]

2025-03-28 Thread via GitHub
kkewwei commented on code in PR #14397: URL: https://github.com/apache/lucene/pull/14397#discussion_r2019696179 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/LZ4WithPresetDictCompressionMode.java: ## @@ -144,10 +148,85 @@ public void decompress(DataInput in, int orig

Re: [PR] For hnsw merger, do not pop from empty heap [lucene]

2025-03-28 Thread via GitHub
benwtrent merged PR #14420: URL: https://github.com/apache/lucene/pull/14420 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.a

Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-28 Thread via GitHub
sgup432 commented on code in PR #14412: URL: https://github.com/apache/lucene/pull/14412#discussion_r2019729847 ## lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java: ## @@ -142,6 +144,14 @@ public LRUQueryCache( missCount = new LongAdder(); } + public f
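One way to let a factor change while searches are running is to keep it in a `volatile` field and validate it in a setter. This is a minimal sketch under that assumption. `DynamicSkipFactor` and `shouldSkipCaching` are hypothetical, the `>= 1` validation is an assumption for this sketch, and the skip rule is illustrative rather than the exact condition used in `LRUQueryCache`.

```java
/** Hypothetical holder for a skip-cache factor that may change while searches run. */
final class DynamicSkipFactor {
  // volatile: an update made by one thread becomes visible to searching threads
  private volatile float skipCacheFactor;

  DynamicSkipFactor(float initial) {
    setSkipCacheFactor(initial);
  }

  void setSkipCacheFactor(float factor) {
    if (!(factor >= 1f)) { // written this way so that NaN is rejected too
      throw new IllegalArgumentException("skipCacheFactor must be >= 1, got " + factor);
    }
    skipCacheFactor = factor;
  }

  float getSkipCacheFactor() {
    return skipCacheFactor;
  }

  /** Illustrative rule: skip caching when building the entry costs more than factor * leadCost. */
  boolean shouldSkipCaching(long cacheBuildCost, long leadCost) {
    float factor = skipCacheFactor; // single volatile read, so one decision sees one value
    return cacheBuildCost > factor * leadCost;
  }
}
```

Reading the field once per decision means a concurrent update affects later decisions only, never half of one.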

Re: [I] Examine the affects of MADV_RANDOM when MGLRU is enabled in Linux kernel [lucene]

2025-03-28 Thread via GitHub
rmuir commented on issue #14408: URL: https://github.com/apache/lucene/issues/14408#issuecomment-2755385165 A better default would also be possible, I think, if individual codecs, such as vectors, did not set this directly, but instead allowed the Directory class to be in control. Use

Re: [PR] bump antlr 4.11.1 -> 4.13.2 [lucene]

2025-03-28 Thread via GitHub
rmuir commented on code in PR #14388: URL: https://github.com/apache/lucene/pull/14388#discussion_r2008149095 ## lucene/expressions/src/generated/checksums/generateAntlr.json: ## @@ -1,7 +1,8 @@ { "lucene/expressions/src/java/org/apache/lucene/expressions/js/Javascript.g4

Re: [PR] For hnsw merger, do not pop from empty heap [lucene]

2025-03-28 Thread via GitHub
tveasey commented on PR #14420: URL: https://github.com/apache/lucene/pull/14420#issuecomment-2761706401 I discussed this a bit with Mayya. So the underlying issue is how we account for the gain for degree 1 vertices. We say that the gain contribution from any vertex is at least 2, i.

Re: [I] New merging hnsw failures with BP policy [lucene]

2025-03-28 Thread via GitHub
benwtrent closed issue #14407: New merging hnsw failures with BP policy URL: https://github.com/apache/lucene/issues/14407

Re: [PR] For hnsw merger, do not pop from empty heap [lucene]

2025-03-28 Thread via GitHub
mayya-sharipova commented on PR #14420: URL: https://github.com/apache/lucene/pull/14420#issuecomment-2761615009 @benwtrent Please go ahead with a fix. Confirmed with @tveasey, the algorithm's author. But we will also look further into it. In this particular test, graphs are unusual (small gr

Re: [PR] For hnsw merger, do not pop from empty heap [lucene]

2025-03-28 Thread via GitHub
benwtrent commented on PR #14420: URL: https://github.com/apache/lucene/pull/14420#issuecomment-2761394895 > Does this case actually only happen when docs are reordered, or is it a general edge case that we happened to find on a test run when docs happened to be reordered? I think it

Re: [PR] Handle NaN results in TestVectorUtilSupport.testBinaryVectors [lucene]

2025-03-28 Thread via GitHub
thecoop commented on code in PR #14419: URL: https://github.com/apache/lucene/pull/14419#discussion_r2018614545 ## lucene/core/src/test/org/apache/lucene/internal/vectorization/TestVectorUtilSupport.java: ## @@ -210,9 +210,13 @@ public void testMinMaxScalarQuantize() { }

Re: [PR] Make DenseConjunctionBulkScorer align scoring windows with #docIDRunEnd(). [lucene]

2025-03-28 Thread via GitHub
jpountz commented on code in PR #14400: URL: https://github.com/apache/lucene/pull/14400#discussion_r2018633191 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -171,37 +171,36 @@ private int scoreWindow( } } -if (acceptDo

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-28 Thread via GitHub
gsmiller commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2018769975 ## lucene/core/src/java/org/apache/lucene/util/FixedBitSet.java: ## @@ -204,6 +205,40 @@ public int cardinality() { return Math.toIntExact(tot); } + /** +

[PR] Fix HistogramCollector to not create zero-count buckets. [lucene]

2025-03-28 Thread via GitHub
jpountz opened a new pull request, #14421: URL: https://github.com/apache/lucene/pull/14421 If a bucket in the middle of the range doesn't match docs, it would be returned with a count of zero. Better not return it at all.
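A dense array of counts indexed from the minimum to the maximum bucket reports every middle bucket, including those with a count of zero. A sparse map only creates an entry when a value actually lands in a bucket. The sketch below assumes `long` values and a fixed bucket width. `SparseHistogram` and `bucketWidth` are hypothetical names, not the `HistogramCollector` API.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sparse histogram: buckets without any value never get an entry. */
final class SparseHistogram {
  static Map<Long, Long> count(long[] values, long bucketWidth) {
    if (bucketWidth <= 0) {
      throw new IllegalArgumentException("bucketWidth must be > 0");
    }
    Map<Long, Long> counts = new TreeMap<>(); // sorted by bucket key
    for (long v : values) {
      // floorDiv keeps negative values in the right bucket, e.g. -1 goes to bucket -1
      long bucket = Math.floorDiv(v, bucketWidth);
      counts.merge(bucket, 1L, Long::sum);
    }
    return counts;
  }
}
```

With values `{1, 2, 25, 27}` and a width of 10, this returns buckets 0 and 2 only; bucket 1 is absent rather than reported with a zero count.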

Re: [PR] PointInSetQuery use reverse collection to improve performance [lucene]

2025-03-28 Thread via GitHub
github-actions[bot] commented on PR #14352: URL: https://github.com/apache/lucene/pull/14352#issuecomment-2762926172 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] Leverage sparse doc value indexes for range and value facet collection [lucene]

2025-03-28 Thread via GitHub
epotyom commented on issue #14406: URL: https://github.com/apache/lucene/issues/14406#issuecomment-2761891472 We might still be able to benefit from Skipper in the sandbox facets module. In FacetCutter for long ranges https://github.com/apache/lucene/blob/b5d13e29f1431ba30ae7df43c89def87e1677db

[PR] Handle NaN results in TestVectorUtilSupport.testBinaryVectors [lucene]

2025-03-28 Thread via GitHub
iverase opened a new pull request, #14419: URL: https://github.com/apache/lucene/pull/14419 I noticed the following error in CI: ``` ./gradlew test --tests TestVectorUtilSupport.testBinaryVectors -Dtests.seed=B12D50704230E803 -Dtests.locale=ce -Dtests.timezone=SystemV/AST4 -Dtests.
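A NaN result breaks a naive tolerance check because `NaN == NaN` is false and `Math.abs(NaN) <= delta` is also false. The snippets do not show how the PR handles NaN. This is a minimal sketch of one option that treats two NaNs as a match, and `NaNAwareCompare.closeEnough` is a hypothetical helper.

```java
/** Hypothetical comparison helper that tolerates NaN and infinite results. */
final class NaNAwareCompare {
  /** True if both are NaN, both are the same value (including infinities), or they differ by at most delta. */
  static boolean closeEnough(float expected, float actual, float delta) {
    if (Float.isNaN(expected) || Float.isNaN(actual)) {
      // NaN never equals anything under ==, so handle it explicitly
      return Float.isNaN(expected) && Float.isNaN(actual);
    }
    if (expected == actual) {
      return true; // covers equal infinities, where expected - actual would be NaN
    }
    return Math.abs(expected - actual) <= delta;
  }
}
```

Whether a NaN should count as a match, or the input should be skipped, is a choice for the test itself. This sketch only shows the explicit check.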

Re: [PR] removing constructor with deprecated attribute 'onlyLongestMatch [lucene]

2025-03-28 Thread via GitHub
renatoh commented on PR #14356: URL: https://github.com/apache/lucene/pull/14356#issuecomment-2760436593 @rmuir Please have a look at it when you have the chance. thanks

Re: [PR] Speed up advancing within a sparse block in IndexedDISI. [lucene]

2025-03-28 Thread via GitHub
vsop-479 commented on PR #14371: URL: https://github.com/apache/lucene/pull/14371#issuecomment-2760720322 @gf2121 I implemented the `VectorMask` approach. There is still a slowdown. I think the reason is my laptop (Mac M2). Benchmark Mode Cnt

Re: [PR] Handle NaN results in TestVectorUtilSupport.testBinaryVectors [lucene]

2025-03-28 Thread via GitHub
iverase commented on code in PR #14419: URL: https://github.com/apache/lucene/pull/14419#discussion_r2018525082 ## lucene/core/src/test/org/apache/lucene/internal/vectorization/TestVectorUtilSupport.java: ## @@ -210,9 +210,13 @@ public void testMinMaxScalarQuantize() { }

Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-28 Thread via GitHub
sgup432 commented on code in PR #14412: URL: https://github.com/apache/lucene/pull/14412#discussion_r2019087636 ## lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java: ## @@ -99,7 +100,7 @@ public class LRUQueryCache implements QueryCache, Accountable { private

Re: [PR] Enable collectors to take advantage of pre-aggregated data. [lucene]

2025-03-28 Thread via GitHub
gf2121 commented on code in PR #14401: URL: https://github.com/apache/lucene/pull/14401#discussion_r2018024992 ## lucene/core/src/java/org/apache/lucene/search/LeafCollector.java: ## @@ -83,6 +84,21 @@ public interface LeafCollector { */ void collect(int doc) throws IOExc

Re: [PR] Preparing existing profiler for adding concurrent profiling [lucene]

2025-03-28 Thread via GitHub
jainankitk commented on PR #14413: URL: https://github.com/apache/lucene/pull/14413#issuecomment-2760401254 One of the failing checks is: ``` -- 1. ERROR in /home/runner/work/lucene/lucene/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/QueryProfilerBreakdown.jav

[PR] For hnsw merger, do not pop from empty heap [lucene]

2025-03-28 Thread via GitHub
benwtrent opened a new pull request, #14420: URL: https://github.com/apache/lucene/pull/14420 It seems there are edge cases when the heap is empty. This prevents us from attempting to pop from an empty heap. This bug fix should go into 10.2 to make the 10.2.0 release. closes: h
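The fix described here amounts to checking for emptiness before popping. Below is a minimal sketch that uses `java.util.PriorityQueue` as a stand-in for the merger's heap. This is an assumption: the HNSW merger uses its own heap type. `popUpTo` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: pops at most k elements, stopping early instead of popping from an empty heap. */
final class SafeHeapDrain {
  static List<Integer> popUpTo(PriorityQueue<Integer> heap, int k) {
    List<Integer> out = new ArrayList<>();
    // the isEmpty() check is the guard: never pop once the heap has run out
    while (out.size() < k && !heap.isEmpty()) {
      out.add(heap.poll());
    }
    return out;
  }
}
```

Asking for more elements than the heap holds then returns what is available instead of failing on an edge case.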

Re: [PR] For hnsw merger, do not pop from empty heap [lucene]

2025-03-28 Thread via GitHub
benwtrent commented on PR #14420: URL: https://github.com/apache/lucene/pull/14420#issuecomment-2761161246 //cc @iverase as you are handling the 10.2 release.

Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-28 Thread via GitHub
jpountz commented on code in PR #14412: URL: https://github.com/apache/lucene/pull/14412#discussion_r2019386671 ## lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java: ## @@ -125,7 +125,9 @@ public LRUQueryCache( this.maxSize = maxSize; this.maxRamBytesUse

Re: [PR] Make DenseConjunctionBulkScorer align scoring windows with #docIDRunEnd(). [lucene]

2025-03-28 Thread via GitHub
jpountz merged PR #14400: URL: https://github.com/apache/lucene/pull/14400

Re: [PR] Enable collectors to take advantage of pre-aggregated data. [lucene]

2025-03-28 Thread via GitHub
gsmiller commented on PR #14401: URL: https://github.com/apache/lucene/pull/14401#issuecomment-2761478694 > I don't think so, or rather taking advantage of range collection shouldn't help more than what https://github.com/apache/lucene/pull/14273 does with RangeDocIdStream? My thinki

Re: [PR] Enable collectors to take advantage of pre-aggregated data. [lucene]

2025-03-28 Thread via GitHub
jpountz commented on PR #14401: URL: https://github.com/apache/lucene/pull/14401#issuecomment-2761651163 Ah, that's right. We have a good number of queries that are already covered; in my opinion, the next natural step is to look into making ranges collect ranges when any clause would collec

Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-28 Thread via GitHub
jpountz commented on code in PR #14412: URL: https://github.com/apache/lucene/pull/14412#discussion_r2018865956 ## lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java: ## @@ -122,12 +123,30 @@ public LRUQueryCache( long maxRamBytesUsed, Predicate leave

Re: [PR] PointInSetQuery clips segments by lower and upper [lucene]

2025-03-28 Thread via GitHub
gsmiller commented on code in PR #14268: URL: https://github.com/apache/lucene/pull/14268#discussion_r2018814824 ## lucene/core/src/java/org/apache/lucene/search/PointInSetQuery.java: ## @@ -108,6 +110,8 @@ protected PointInSetQuery(String field, int numDims, int bytesPerDim, S
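The PR title suggests restricting the query's sorted values to the slice that can fall inside a segment's point range, and skipping the segment when that slice is empty. This is a one-dimensional sketch under that assumption, using `long` values. `SegmentClip`, `clip`, `segMin` and `segMax` are hypothetical names.

```java
/** Hypothetical 1D sketch: clip a sorted value set to a segment's [segMin, segMax] range. */
final class SegmentClip {
  /** First index i with values[i] >= key (values sorted ascending). */
  static int lowerBound(long[] values, long key) {
    int lo = 0, hi = values.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (values[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  /** First index i with values[i] > key. */
  static int upperBound(long[] values, long key) {
    int lo = 0, hi = values.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (values[mid] <= key) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  /** Returns {from, to}: the slice of sortedValues inside [segMin, segMax]; from == to means skip. */
  static int[] clip(long[] sortedValues, long segMin, long segMax) {
    int from = lowerBound(sortedValues, segMin);
    int to = Math.max(from, upperBound(sortedValues, segMax)); // max() keeps the slice non-negative
    return new int[] {from, to};
  }
}
```

Two binary searches per segment are cheap compared with visiting the segment's points, which is why an empty slice makes a useful early exit.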

Re: [PR] Enable collectors to take advantage of pre-aggregated data. [lucene]

2025-03-28 Thread via GitHub
jpountz commented on PR #14401: URL: https://github.com/apache/lucene/pull/14401#issuecomment-2761666414 Any opinion on `collect(int min, int max)` vs. `collectRange(int min, int max)`? I leaned towards `collectRange` since we already have `collect(int doc)` and it wouldn't be obvious from
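Giving the range hook its own name, as with `collectRange`, keeps it from reading like an overload of `collect(int doc)`. Below is a minimal sketch of how such a hook could work. It assumes `max` is exclusive and that the default falls back to per-doc collection. `RangeAwareCollector` and `CountingCollector` are hypothetical types, not Lucene's `LeafCollector`.

```java
/** Hypothetical collector interface illustrating a range-collection hook; not Lucene's LeafCollector. */
interface RangeAwareCollector {
  void collect(int doc);

  /** Collects every doc in [min, max) (max exclusive is an assumption). Default: one call per doc. */
  default void collectRange(int min, int max) {
    for (int doc = min; doc < max; doc++) {
      collect(doc);
    }
  }
}

/** A counting collector can accept a whole range in O(1) instead of O(max - min). */
final class CountingCollector implements RangeAwareCollector {
  long count;

  @Override
  public void collect(int doc) {
    count++;
  }

  @Override
  public void collectRange(int min, int max) {
    count += max - min;
  }
}
```

Collectors that do not override `collectRange` still behave correctly. Only those that can aggregate a range directly, such as counts, benefit.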

Re: [PR] Enable collectors to take advantage of pre-aggregated data. [lucene]

2025-03-28 Thread via GitHub
jpountz commented on code in PR #14401: URL: https://github.com/apache/lucene/pull/14401#discussion_r2018638902 ## lucene/core/src/java/org/apache/lucene/search/RangeDocIdStream.java: ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Enable collectors to take advantage of pre-aggregated data. [lucene]

2025-03-28 Thread via GitHub
gsmiller commented on PR #14401: URL: https://github.com/apache/lucene/pull/14401#issuecomment-2761941694 I prefer `collectRange` as well to make usage a little less error-prone. I don't have a strong opinion though.

Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-28 Thread via GitHub
sgup432 commented on code in PR #14412: URL: https://github.com/apache/lucene/pull/14412#discussion_r2019040588 ## lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java: ## @@ -122,12 +123,30 @@ public LRUQueryCache( long maxRamBytesUsed, Predicate leave

Re: [PR] Handle NaN results in TestVectorUtilSupport.testBinaryVectors [lucene]

2025-03-28 Thread via GitHub
iverase merged PR #14419: URL: https://github.com/apache/lucene/pull/14419

Re: [PR] Preparing existing profiler for adding concurrent profiling [lucene]

2025-03-28 Thread via GitHub
jpountz commented on PR #14413: URL: https://github.com/apache/lucene/pull/14413#issuecomment-2761334037 You just need to replace `ctx` with `_`.

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-28 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2018661757 ## lucene/core/src/java/org/apache/lucene/util/FixedBitSet.java: ## @@ -204,6 +205,40 @@ public int cardinality() { return Math.toIntExact(tot); } + /** +

Re: [PR] For hnsw merger, do not pop from empty heap [lucene]

2025-03-28 Thread via GitHub
jpountz commented on PR #14420: URL: https://github.com/apache/lucene/pull/14420#issuecomment-2761367741 Does this case actually only happen when docs are reordered, or is it a general edge case that we happened to find on a test run when docs happened to be reordered?

Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-28 Thread via GitHub
sgup432 commented on code in PR #14412: URL: https://github.com/apache/lucene/pull/14412#discussion_r2019109207 ## lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java: ## @@ -142,6 +161,10 @@ public LRUQueryCache( missCount = new LongAdder(); } + AtomicRe

Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-28 Thread via GitHub
sgup432 commented on code in PR #14412: URL: https://github.com/apache/lucene/pull/14412#discussion_r2019115721 ## lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java: ## @@ -269,8 +292,8 @@ boolean requiresEviction() { } CacheAndCount get(Query key, IndexRe

Re: [PR] Handle NaN results in TestVectorUtilSupport.testBinaryVectors [lucene]

2025-03-28 Thread via GitHub
gf2121 commented on code in PR #14419: URL: https://github.com/apache/lucene/pull/14419#discussion_r2018496277 ## lucene/core/src/test/org/apache/lucene/internal/vectorization/TestVectorUtilSupport.java: ## @@ -210,9 +210,13 @@ public void testMinMaxScalarQuantize() { }