Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
gf2121 commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2051413753 ## lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java: ## @@ -524,17 +524,21 @@ public int advance(int target) throws IOException {

Re: [PR] Use Arrays#mismatch in FSTEnum#rewindPrefix. [lucene]

2025-04-18 Thread via GitHub
vsop-479 commented on PR #14477: URL: https://github.com/apache/lucene/pull/14477#issuecomment-2816566877 @mikemccand Please take a look when you get a chance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab
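
For readers unfamiliar with the API named in this PR title: `java.util.Arrays.mismatch` returns the index of the first position where two array ranges differ, or -1 when the ranges are identical. The sketch below is not the PR's code. `PrefixMismatchSketch` and `commonPrefixLength` are hypothetical names used only to illustrate how a shared-prefix length can be computed with it.

```java
import java.util.Arrays;

final class PrefixMismatchSketch {
  // Hypothetical helper: length of the prefix shared by a[0..aLen) and b[0..bLen).
  static int commonPrefixLength(int[] a, int aLen, int[] b, int bLen) {
    int i = Arrays.mismatch(a, 0, aLen, b, 0, bLen);
    // -1 means both ranges have the same length and content, so the whole range is shared.
    return i == -1 ? aLen : i;
  }

  public static void main(String[] args) {
    int[] current = {1, 2, 3, 4};
    int[] target = {1, 2, 9};
    // The ranges first differ at index 2, so two labels are shared.
    System.out.println(commonPrefixLength(current, 4, target, 3));
  }
}
```

When one range is a proper prefix of the other, `Arrays.mismatch` returns the length of the shorter range, which is also the shared-prefix length, so no extra case is needed.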

Re: [PR] Adding TestSpanWithinQuery with basic test cases for SpanWithinQuery [lucene]

2025-04-18 Thread via GitHub
vigyasharma merged PR #14405: URL: https://github.com/apache/lucene/pull/14405 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene

Re: [PR] Remove sloppySin calculations [lucene]

2025-04-18 Thread via GitHub
rmuir commented on PR #14516: URL: https://github.com/apache/lucene/pull/14516#issuecomment-2816395683 See guide here: https://github.com/apache/lucene/blob/main/help/jmh.txt -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] Remove sloppySin calculations [lucene]

2025-04-18 Thread via GitHub
rmuir commented on PR #14516: URL: https://github.com/apache/lucene/pull/14516#issuecomment-2816394874 But the benchmark is not useful, as it was written as a test. Tests don't run under normal conditions, in particular they run without the Java C2 compiler, so trying to build a benchmark from a test is not useful and biase

Re: [PR] Remove sloppySin calculations [lucene]

2025-04-18 Thread via GitHub
jainankitk commented on PR #14516: URL: https://github.com/apache/lucene/pull/14516#issuecomment-2816388606 Performance comparison on a Linux machine, for 1000 iterations (a change counts as significant above 25%): ``` % python3 ~/benchmark_comp.py

Re: [PR] Remove sloppySin calculations [lucene]

2025-04-18 Thread via GitHub
jainankitk commented on PR #14516: URL: https://github.com/apache/lucene/pull/14516#issuecomment-2816361569 Performance comparison on a laptop, for 1000 iterations (a change counts as significant above 5%): ``` % python3 ~/benchmark_comp.py Found 128 test cases in baseline file Foun

Re: [I] Strange stack traces for new bitset focused doc iterators [lucene]

2025-04-18 Thread via GitHub
jpountz commented on issue #14517: URL: https://github.com/apache/lucene/issues/14517#issuecomment-2816267241 You are correct. Here is the simplest recreation of the bug I could come up with: ```java Directory dir = new ByteBuffersDirectory(); IndexWriter w = new IndexWriter(dir

Re: [I] Strange stack traces for new bitset focused doc iterators [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on issue #14517: URL: https://github.com/apache/lucene/issues/14517#issuecomment-2816145174 I think maybe this is possible with this competitiveIterator if we move from dense to sparse, but in such a way that its min document is past the NEW min document in the new windo

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2051060772 ## lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java: ## @@ -146,6 +146,9 @@ public int advance(int target) throws IOException { @Ov

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2051060255 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -236,10 +236,13 @@ private void scoreWindowUsingBitSet( windowMatches.

Re: [I] Strange stack traces for new bitset focused doc iterators [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on issue #14517: URL: https://github.com/apache/lucene/issues/14517#issuecomment-2816102821 Another possibility: the `competitiveIterator.docID()` position and the `docsWithField.docID()` position got out of sync somehow. Reading the code, I don't immediately see how, but this

Re: [I] Strange stack traces for new bitset focused doc iterators [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on issue #14517: URL: https://github.com/apache/lucene/issues/14517#issuecomment-2816071263 I am still trying to replicate and have been unable to do so. However, one strange thing is that it seems that `competitiveIterator` can be lazily updated and `docValuesTe

[PR] Add AnytimeRankingSearcher for SLA-Aware Early Termination with Bin-Based Score Boosting [lucene]

2025-04-18 Thread via GitHub
atris opened a new pull request, #14525: URL: https://github.com/apache/lucene/pull/14525 Add AnytimeRankingSearcher for SLA-aware early termination with bin-based score boosting This patch adds AnytimeRankingSearcher, a new low-latency search implementation that supports early termi

[PR] Make task executor non-final [lucene]

2025-04-18 Thread via GitHub
Shibi-bala opened a new pull request, #14524: URL: https://github.com/apache/lucene/pull/14524 ### Description The new task executor implementation from https://github.com/apache/lucene/pull/13861 means users can't control the parallelism of the tasks being run. I think it is fair to

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050933006 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -434,6 +434,9 @@ public void intoBitSet(int upTo, FixedBitSet bitSet, int

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
gf2121 commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050930797 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -236,10 +236,13 @@ private void scoreWindowUsingBitSet( windowMatches.set

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050920191 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -236,10 +236,13 @@ private void scoreWindowUsingBitSet( windowMatches.

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
gf2121 commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050908893 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -236,10 +236,13 @@ private void scoreWindowUsingBitSet( windowMatches.set

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050914945 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -236,10 +236,13 @@ private void scoreWindowUsingBitSet( windowMatches.

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050897977 ## lucene/core/src/java/org/apache/lucene/search/DocIdSetIterator.java: ## @@ -203,7 +203,8 @@ protected final int slowAdvance(int target) throws IOException {

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
gf2121 commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050897763 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -236,10 +236,13 @@ private void scoreWindowUsingBitSet( windowMatches.set

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050896318 ## lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java: ## @@ -146,7 +146,13 @@ public int advance(int target) throws IOException { @O

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050892141 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -236,10 +236,13 @@ private void scoreWindowUsingBitSet( windowMatches.

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
gf2121 commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050892058 ## lucene/core/src/java/org/apache/lucene/search/DocIdSetIterator.java: ## @@ -203,7 +203,8 @@ protected final int slowAdvance(int target) throws IOException { * @

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
gf2121 commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050888478 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -236,10 +236,13 @@ private void scoreWindowUsingBitSet( windowMatches.set

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050722768 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -434,6 +434,9 @@ public void intoBitSet(int upTo, FixedBitSet bitSet, int

[I] Speed up soft delete [lucene]

2025-04-18 Thread via GitHub
gf2121 opened a new issue, #14521: URL: https://github.com/apache/lucene/issues/14521 ### Description Soft deletes consume a lot of CPU when flushing docvalue updates or calculating the `numsToDelete` in `SoftDeleteRetentionMergePolicy`. I was looking for some way to speed up these o

Re: [PR] cache preset dict for LZ4WithPresetDictDecompressor [lucene]

2025-04-18 Thread via GitHub
kkewwei commented on code in PR #14397: URL: https://github.com/apache/lucene/pull/14397#discussion_r2048995276 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java: ## @@ -512,6 +512,7 @@ private void doReset(int docID

[I] Strange stack traces for new bitset focused doc iterators [lucene]

2025-04-18 Thread via GitHub
benwtrent opened a new issue, #14517: URL: https://github.com/apache/lucene/issues/14517 ### Description With Lucene 10.2, we have seen some exceptions that are rather troubling. It appears that the into bit set code is buggy when utilizing multiple layers of iterators. It do

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050875107 ## lucene/core/src/java/org/apache/lucene/search/DocIdSetIterator.java: ## @@ -203,7 +203,8 @@ protected final int slowAdvance(int target) throws IOException {

Re: [I] Strange stack traces for new bitset focused doc iterators [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on issue #14517: URL: https://github.com/apache/lucene/issues/14517#issuecomment-2815855129 @jpountz we have been trying unsuccessfully to get the failing query. I will respond with it as soon as I can. -- This is an automated message from the Apache Git Service. To r

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
benwtrent commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050874075 ## lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java: ## @@ -146,7 +146,13 @@ public int advance(int target) throws IOException { @O

[I] [Bug] Postings force merge regression between Lucene 9.12 and Lucene 10.0 [lucene]

2025-04-18 Thread via GitHub
bharath-techie opened a new issue, #14514: URL: https://github.com/apache/lucene/issues/14514 ### Description This is a fork of issue https://github.com/apache/lucene/issues/14463, specific to the postings format regression. As part of Lucene 10.0, the default readAdvice changed to r

Re: [PR] Upgrade to gradle 8.14-rc-2 [lucene]

2025-04-18 Thread via GitHub
harshavamsi commented on PR #14519: URL: https://github.com/apache/lucene/pull/14519#issuecomment-2815780418 > Don't get me wrong - please feel free to provide a pull request that passes with the rc release, this will make things easier! I just don't think we should merge it until the final

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
gf2121 commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050789304 ## lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java: ## @@ -146,7 +146,13 @@ public int advance(int target) throws IOException { @Over

Re: [I] Strange stack traces for new bitset focused doc iterators [lucene]

2025-04-18 Thread via GitHub
jpountz commented on issue #14517: URL: https://github.com/apache/lucene/issues/14517#issuecomment-2815624469 Do you know if you would be able to find the query that caused this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] Shore up some intoBitSet impls and add paranoid protections [lucene]

2025-04-18 Thread via GitHub
ChrisHegarty commented on code in PR #14523: URL: https://github.com/apache/lucene/pull/14523#discussion_r2050728572 ## lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java: ## @@ -524,6 +524,7 @@ public int advance(int target) throws IOException {

[PR] intobitset term ord comp [lucene]

2025-04-18 Thread via GitHub
benwtrent opened a new pull request, #14523: URL: https://github.com/apache/lucene/pull/14523 Similar to @ChrisHegarty's change for the count fix, this moves the `max` check up so that it runs before the `intoBitSet` call. It seems like we could be calling `intoBitSet` erroneously in some edge cases.
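
The idea of checking the upper bound before loading bits can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: it uses `java.util.BitSet` and a sorted `int[]` in place of Lucene's classes, and the class name, the `offset` parameter name, and the index-based signature are all hypothetical.

```java
import java.util.BitSet;

final class IntoBitSetGuardSketch {
  // Hypothetical stand-in: docs is a sorted array of doc IDs and from is the index
  // of the current doc. Sets bit (doc - offset) for every doc in [current, upTo)
  // and returns the index of the first doc that was not consumed.
  static int intoBitSet(int[] docs, int from, int upTo, BitSet bits, int offset) {
    // Guard first: if the current doc is already at or past upTo, do nothing.
    if (from >= docs.length || docs[from] >= upTo) {
      return from;
    }
    if (docs[from] < offset) {
      throw new IllegalArgumentException("current doc is before offset");
    }
    int i = from;
    while (i < docs.length && docs[i] < upTo) {
      bits.set(docs[i] - offset);
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    BitSet window = new BitSet();
    int next = intoBitSet(new int[] {5, 9, 20}, 0, 16, window, 4);
    System.out.println(window + " next index = " + next);
  }
}
```

With the guard in place, a second call on an iterator that is already past the window leaves the bit set untouched instead of computing a bit index from a doc ID outside the window.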

Re: [PR] Fix DISIDocIdStream::count so that it does not try to count beyond max [lucene]

2025-04-18 Thread via GitHub
ChrisHegarty merged PR #14522: URL: https://github.com/apache/lucene/pull/14522 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucen

Re: [PR] Use a non-deprecated assertThat, and change several test assertions to use assertThat [lucene]

2025-04-18 Thread via GitHub
rmuir commented on code in PR #14518: URL: https://github.com/apache/lucene/pull/14518#discussion_r2050669053 ## lucene/core/src/test/org/apache/lucene/document/TestDocument.java: ## @@ -233,7 +245,7 @@ public void testPositionIncrementMultiFields() throws Exception { Phra

Re: [I] Strange stack traces for new bitset focused doc iterators [lucene]

2025-04-18 Thread via GitHub
jpountz commented on issue #14517: URL: https://github.com/apache/lucene/issues/14517#issuecomment-2815471980 > This tells me that the iterator actually wasn't progressed past offset which is weird...time to look up the stack again. I agree that this is puzzling. This suggests that th

Re: [PR] Fix DISIDocIdStream::count so that it does not try to count beyond max [lucene]

2025-04-18 Thread via GitHub
ChrisHegarty commented on PR #14522: URL: https://github.com/apache/lucene/pull/14522#issuecomment-2815387228 > Thank you! Did it fail with an existing query/collector, or did you find it while trying to take advantage of `DocIdStream` for a new use-case? We see it in stack traces of

Re: [PR] Use a non-deprecated assertThat, and change several test assertions to use assertThat [lucene]

2025-04-18 Thread via GitHub
dweiss commented on code in PR #14518: URL: https://github.com/apache/lucene/pull/14518#discussion_r2050545494 ## lucene/core/src/test/org/apache/lucene/document/TestDocument.java: ## @@ -233,7 +245,7 @@ public void testPositionIncrementMultiFields() throws Exception { Phr

[PR] Fix DISIDocIdStream::count so that it does not try to count beyond max [lucene]

2025-04-18 Thread via GitHub
ChrisHegarty opened a new pull request, #14522: URL: https://github.com/apache/lucene/pull/14522 Fix `DISIDocIdStream::count` so that it does not try to count beyond max. From the perspective of DISIDocIdStream it should not be necessary to check the iterator position before calling
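
A bounded count of the kind described here might look like the sketch below. It is an illustration only, assuming a simple array-backed iterator. `BoundedCountSketch`, `ArrayIterator`, and `countUpTo` are hypothetical names and do not reproduce `DISIDocIdStream`.

```java
final class BoundedCountSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Hypothetical stand-in for a doc ID iterator over a sorted array of doc IDs.
  static final class ArrayIterator {
    private final int[] docs;
    private int idx = -1;

    ArrayIterator(int... docs) {
      this.docs = docs;
    }

    int docID() {
      if (idx < 0) {
        return -1; // not positioned yet
      }
      return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    int nextDoc() {
      idx++;
      return docID();
    }
  }

  // Counts docs from the current position (inclusive) that are strictly below max.
  // If the iterator is already at or past max, it returns 0 and does not advance.
  static int countUpTo(ArrayIterator it, int max) {
    int count = 0;
    for (int doc = it.docID(); doc < max; doc = it.nextDoc()) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    ArrayIterator it = new ArrayIterator(3, 7, 12, 40);
    it.nextDoc(); // position on the first doc before counting
    System.out.println(countUpTo(it, 10));
    System.out.println(countUpTo(it, 10));
  }
}
```

Because the loop condition is checked before the first increment, the caller does not need to inspect the iterator position before calling `countUpTo`.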

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-04-18 Thread via GitHub
msokolov commented on code in PR #14226: URL: https://github.com/apache/lucene/pull/14226#discussion_r2050520231 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -93,22 +100,62 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOExce

Re: [PR] Refactor doc values to expose a `DocIdSetIterator` instead of extending `DocIdSetIterator`. [lucene]

2025-04-18 Thread via GitHub
gf2121 commented on PR #14475: URL: https://github.com/apache/lucene/pull/14475#issuecomment-2815249828 It occurs to me that logic like: ``` DocValuesIterator iter = DocValuesIterator.of(1, 2, 4); boolean exist = iter.advanceExact(3); assert exist == false; iter.intoBitSet(upto

Re: [PR] Use a non-deprecated assertThat, and change several test assertions to use assertThat [lucene]

2025-04-18 Thread via GitHub
rmuir commented on code in PR #14518: URL: https://github.com/apache/lucene/pull/14518#discussion_r2050487382 ## lucene/core/src/test/org/apache/lucene/document/TestDocument.java: ## @@ -233,7 +245,7 @@ public void testPositionIncrementMultiFields() throws Exception { Phra

Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]

2025-04-18 Thread via GitHub
expani commented on code in PR #14511: URL: https://github.com/apache/lucene/pull/14511#discussion_r2050377465 ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -282,6 +288,10 @@ public PostingsEnum postings( @Override public Im

Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]

2025-04-18 Thread via GitHub
expani commented on code in PR #14511: URL: https://github.com/apache/lucene/pull/14511#discussion_r2047829308 ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -1310,7 +1317,7 @@ public List getImpacts(int level) { r

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-04-18 Thread via GitHub
dungba88 commented on code in PR #14226: URL: https://github.com/apache/lucene/pull/14226#discussion_r2050353724 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -93,22 +100,62 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOExce

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-04-18 Thread via GitHub
kaivalnp commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2814861008 > On their website they do indicate if you avoid certain distribution channels you can use miniconda without a paid license. Yeah, I found this at https://www.anaconda.com/pricing

Re: [PR] Upgrade to gradle 8.14-rc-2 [lucene]

2025-04-18 Thread via GitHub
dweiss commented on PR #14519: URL: https://github.com/apache/lucene/pull/14519#issuecomment-2814839279 Don't get me wrong - please feel free to provide a pull request that passes with the rc release, this will make things easier! I just don't think we should merge it until the final releas

Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]

2025-04-18 Thread via GitHub
jainankitk commented on PR #14439: URL: https://github.com/apache/lucene/pull/14439#issuecomment-2814740258 Addressed most of the review comments. Ran a small performance benchmark to see the difference, and we can see some even with a small number of documents: Bulk time is the new approach,

Re: [PR] Use a non-deprecated assertThat, and change several test assertions to use assertThat [lucene]

2025-04-18 Thread via GitHub
dweiss commented on PR #14518: URL: https://github.com/apache/lucene/pull/14518#issuecomment-2814845475 I'll merge this one later today if there are no objections. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR