Re: [PR] Reciprocal Rank Fusion (RRF) in TopDocs [lucene]

2025-02-27 Thread via GitHub


jpountz merged PR #13470:
URL: https://github.com/apache/lucene/pull/13470


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Lack of coverage of DenseConjunctionBulkScorer with min competitive scores and competitive iterators [lucene]

2025-02-27 Thread via GitHub


jpountz closed issue #14283: Lack of coverage of DenseConjunctionBulkScorer 
with min competitive scores and competitive iterators
URL: https://github.com/apache/lucene/issues/14283



Re: [PR] Remove scoreAll() optimization from DefaultBulkScorer. [lucene]

2025-02-27 Thread via GitHub


jpountz merged PR #14039:
URL: https://github.com/apache/lucene/pull/14039



Re: [PR] Bump floor segment size to 16MB. [lucene]

2025-02-27 Thread via GitHub


jpountz merged PR #14189:
URL: https://github.com/apache/lucene/pull/14189



Re: [PR] Fix TestSysoutLimits by making nested test classes not extend LuceneTestCase [lucene]

2025-02-27 Thread via GitHub


madrob commented on code in PR #14309:
URL: https://github.com/apache/lucene/pull/14309#discussion_r1974288498


##
lucene/test-framework/src/java/org/apache/lucene/tests/util/TestRuleLimitSysouts.java:
##
@@ -207,6 +207,7 @@ protected void before() throws Throwable {
   checkCaptureStreams();
 }
 resetCaptureState();
+var bef = bytesWritten.get();

Review Comment:
   did this sneak in?




Re: [I] Should we auto-adjust top score doc and top field collector manager based on slices? [lucene]

2025-02-27 Thread via GitHub


javanna closed issue #13791: Should we auto-adjust top score doc and top field 
collector manager based on slices?
URL: https://github.com/apache/lucene/issues/13791



Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-27 Thread via GitHub


renatoh commented on PR #14278:
URL: https://github.com/apache/lucene/pull/14278#issuecomment-2688748464

   > Thanks @renatoh !
   
   Thanks for your input and for reviewing it!



Re: [I] TestSysoutLimits still occasionally failing [lucene]

2025-02-27 Thread via GitHub


dweiss commented on issue #14307:
URL: https://github.com/apache/lucene/issues/14307#issuecomment-2688934137

   This one is caused by a more complex interaction - LuceneTestCase tries to 
set up a random TimeZone and this prints a warning like this:
   ```
   WARNING: Use of the three-letter time zone ID "AET" is deprecated and it will be removed in a future release
   ```
   
   I think it'll actually be more beneficial to make the nested tests in 
TestRuleLimitSysouts not extend LuceneTestCase so that there are no randomized 
warnings being printed to std streams. I'll try to provide a patch tomorrow.



Re: [PR] Use DenseConjunctionBulkScorer for single queries sometimes. [lucene]

2025-02-27 Thread via GitHub


jpountz merged PR #14293:
URL: https://github.com/apache/lucene/pull/14293



[PR] introduce new parameter onlyLongestMatchNoSubwords replacing onlyLongestMatch [lucene]

2025-02-27 Thread via GitHub


renatoh opened a new pull request, #14311:
URL: https://github.com/apache/lucene/pull/14311

   (no comment)



Re: [PR] introduce new parameter onlyLongestMatchNoSubwords replacing onlyLongestMatch [lucene]

2025-02-27 Thread via GitHub


renatoh commented on PR #14311:
URL: https://github.com/apache/lucene/pull/14311#issuecomment-2689203083

   @rmuir onlyLongestMatchNoSubwords is basically what was 
onlyLongestMatch=true + reuseChars=false



Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]

2025-02-27 Thread via GitHub


jpountz commented on PR #14280:
URL: https://github.com/apache/lucene/pull/14280#issuecomment-2688597916

   This sounded like a good idea so I applied it.



Re: [PR] Fix TestSysoutLimits by making nested test classes not extend LuceneTestCase [lucene]

2025-02-27 Thread via GitHub


dweiss commented on code in PR #14309:
URL: https://github.com/apache/lucene/pull/14309#discussion_r1974964208


##
lucene/test-framework/src/java/org/apache/lucene/tests/util/TestRuleLimitSysouts.java:
##
@@ -207,6 +207,7 @@ protected void before() throws Throwable {
   checkCaptureStreams();
 }
 resetCaptureState();
+var bef = bytesWritten.get();

Review Comment:
   Ugh. Absolutely - debugging artifact. Thank you!




[I] MultiTermQueryConstantScoreBlendedWrapper#createWeight#rewriteInner performance optimization ideas [lucene]

2025-02-27 Thread via GitHub


hanbj opened a new issue, #14313:
URL: https://github.com/apache/lucene/issues/14313

   ### Description
   
   There are many implementations of MultiTermQuery, such as TermInSetQuery, FuzzyQuery, WildcardQuery, PrefixQuery, TermRangeQuery, RegexpQuery, TermsQuery and AutomatonQuery, so optimizing MultiTermQuery improves the performance of all of these queries.
   
   The default logic is as follows:
   1. When the number of terms is less than or equal to 16, rewrite the query as a BooleanQuery.
   2. When the number of terms is greater than 16, iterate over the postings list of each term to collect document IDs:
   
   > 2.1. If the document frequency of the term is less than or equal to 512, add its document IDs to otherTerms.
   
   > 2.2. If the document frequency of the term is greater than 512, add its postings list to the priority queue highFrequencyTerms.
   
   3. Wrap otherTerms and the (up to 16) postings lists held in highFrequencyTerms into the list subs.
   4. Use a DisjunctionDISIApproximation over subs to merge the postings lists and collect the matching document IDs.
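The default behavior described above can be modeled with plain Java collections. This is a hypothetical sketch, not Lucene's implementation: postings lists are sorted `int[]` arrays, the class and helper names are invented, and folding the least frequent list into `otherTerms` once more than 16 high-frequency lists are queued is an assumption about how the bounded queue behaves. The final disjunction is materialized into a `BitSet` for simplicity instead of being iterated lazily.

```java
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical, self-contained model of the default blended rewrite; not Lucene code. */
final class BlendedRewriteSketch {
  // Thresholds quoted in the issue.
  static final int TERM_COUNT_THRESHOLD = 16;
  static final int POSTINGS_PRE_PROCESS_THRESHOLD = 512;

  /**
   * Each int[] is the sorted postings list (doc IDs) of one matching term. Returns null when
   * the query would be rewritten as a BooleanQuery, otherwise the union of all postings.
   */
  static BitSet rewrite(List<int[]> postings, int maxDoc) {
    if (postings.size() <= TERM_COUNT_THRESHOLD) {
      return null; // Step 1: few terms, so a BooleanQuery handles them.
    }
    BitSet otherTerms = new BitSet(maxDoc);
    // Min-heap on doc frequency: the least frequent list is the first to be evicted.
    PriorityQueue<int[]> highFrequencyTerms =
        new PriorityQueue<>(Comparator.comparingInt((int[] p) -> p.length));
    for (int[] p : postings) {
      if (p.length <= POSTINGS_PRE_PROCESS_THRESHOLD) {
        addAll(otherTerms, p); // Step 2.1: fold small lists into a bitset.
      } else {
        highFrequencyTerms.add(p); // Step 2.2
        if (highFrequencyTerms.size() > TERM_COUNT_THRESHOLD) {
          addAll(otherTerms, highFrequencyTerms.poll()); // Assumed eviction policy.
        }
      }
    }
    // Steps 3-4: Lucene merges "subs" lazily through a disjunction; this sketch just unions them.
    BitSet result = (BitSet) otherTerms.clone();
    for (int[] p : highFrequencyTerms) {
      addAll(result, p);
    }
    return result;
  }

  private static void addAll(BitSet bits, int[] docs) {
    for (int doc : docs) {
      bits.set(doc);
    }
  }
}
```

For example, 20 terms that each match 600 consecutive doc IDs all exceed the 512 threshold; four of them spill into `otherTerms`, and the union still covers all 12,000 docs.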
   
   The optimization ideas are as follows:
   1. Defer processing while iterating over the postings list of each term, so that the query can return early in the following situations:
   
   > 1.1. A term matches all documents.
   
   > 1.2. A term matches all documents that contain the field.
   
   2. When a term has a very high document frequency (at least reader.maxDoc() - 4096), collect in reverse: record the documents that this large postings list does not match, then iterate over the postings lists of the other terms and remove their document IDs from this reverse-collected set. If the set becomes empty, all documents match and the query can return early. Otherwise, the set contains relatively few document IDs, so merging it later is fast.
   3. If, when term iteration completes, the number of matching terms equals the number of terms in the field, then every document that contains the field matches, and there is no need to iterate over the postings list of any term.
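The three early-exit ideas can be sketched the same way. Again this is a hypothetical model over sorted `int[]` postings that ignores deleted documents; `docsWithField`, `allFieldTermsMatched` and `denseGap` are illustrative parameters (the issue uses 4096 for the gap), not Lucene names. Only the list lengths (doc frequencies) are inspected before any postings list is read, which is what lets ideas 1.1 and 1.2 return early.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical model of the proposed early-exit ideas; not Lucene code, ignores deletions. */
final class ReverseCollectSketch {
  /**
   * Returns the docs matched by any of the given sorted postings lists.
   *
   * @param docsWithField docs that have at least one term in the field (a superset of all postings)
   * @param allFieldTermsMatched whether term iteration matched every term of the field
   * @param denseGap a term is "dense" if it misses at most this many docs (4096 in the issue)
   */
  static BitSet union(List<int[]> postings, int maxDoc, BitSet docsWithField,
      boolean allFieldTermsMatched, int denseGap) {
    if (allFieldTermsMatched) { // Idea 3: no postings list needs to be read.
      return (BitSet) docsWithField.clone();
    }
    int docCount = docsWithField.cardinality();
    int[] dense = null;
    for (int[] p : postings) {
      if (p.length == maxDoc) { // Idea 1.1: this term matches every doc.
        BitSet all = new BitSet(maxDoc);
        all.set(0, maxDoc);
        return all;
      }
      if (p.length == docCount) { // Idea 1.2: this term matches every doc with the field.
        return (BitSet) docsWithField.clone();
      }
      if (dense == null && p.length >= maxDoc - denseGap) {
        dense = p;
      }
    }
    BitSet result = new BitSet(maxDoc);
    if (dense == null) { // No dense term: plain union.
      for (int[] p : postings) {
        for (int doc : p) {
          result.set(doc);
        }
      }
      return result;
    }
    // Idea 2: track the few docs that the dense term does NOT match...
    BitSet missing = new BitSet(maxDoc);
    missing.set(0, maxDoc);
    for (int doc : dense) {
      missing.clear(doc);
    }
    // ...then remove the docs matched by the other terms, stopping once nothing is missing.
    for (int[] p : postings) {
      if (missing.isEmpty()) {
        break; // Every doc matches.
      }
      if (p != dense) {
        for (int doc : p) {
          missing.clear(doc);
        }
      }
    }
    result.set(0, maxDoc);
    result.andNot(missing);
    return result;
  }
}
```

After the dense term is applied, the complement holds at most `denseGap` doc IDs; a real implementation would presumably store it compactly rather than as the full-size bitset used here.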
   
   I have already implemented this optimization and it has been running in production for about half a year. So far, no customers have reported any issues, but the code changes are fairly large. Is the Lucene community interested? If so, I will submit a PR.
   The test results are as follows:
   
   | Case | Performance improvement |
   |---|---|
   | A term matches all docs | 80 times |
   | A term matches all documents containing the field | 70 times |
   | Contains all terms of the field | 80 times |
   | Reverse collection | 8 times |



Re: [PR] Expose the ImpactsEnum impl in Lucene101PostingsFormat. [lucene]

2025-02-27 Thread via GitHub


uschindler commented on code in PR #14306:
URL: https://github.com/apache/lucene/pull/14306#discussion_r1974361308


##
lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsFormat.java:
##
@@ -351,6 +352,14 @@ public final class Lucene101PostingsFormat extends PostingsFormat {
 
   public static final int LEVEL1_MASK = LEVEL1_NUM_DOCS - 1;
 
+  /**
+   * Return the class that implements {@link ImpactsEnum} in this {@link PostingsFormat}. This is
+   * internally used to help the JVM make good inlining decisions.
+   */
+  public static Class getImpactsEnumImpl() {

Review Comment:
   maybe add `@lucene.internal` as javadocs tag.




Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


uschindler commented on PR #14302:
URL: https://github.com/apache/lucene/pull/14302#issuecomment-2689162858

   I stopped the Jenkins builds on Policeman Jenkins and will check to update 
the config for Java 23.



Re: [PR] fix lambdas for java 23 [lucene]

2025-02-27 Thread via GitHub


rmuir commented on PR #14308:
URL: https://github.com/apache/lucene/pull/14308#issuecomment-2689170929

   Thanks, the change is just a bit annoying from a noise perspective. I will 
merge up main first, to make sure there aren't any new lambdas in recent 
commits that anger the check.



Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


uschindler commented on PR #14302:
URL: https://github.com/apache/lucene/pull/14302#issuecomment-2689172018

   I will look into moving the MMapDirectory parts to the main code and only 
leave the vector stuff in the APIJAR special case.



Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


uschindler commented on PR #14302:
URL: https://github.com/apache/lucene/pull/14302#issuecomment-2689174826

   We may now also remove SecurityManager and AccessController everywhere in 
main branch.



Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


madrob commented on code in PR #14302:
URL: https://github.com/apache/lucene/pull/14302#discussion_r1974374898


##
lucene/CHANGES.txt:
##
@@ -45,7 +45,8 @@ Bug Fixes
 
 Other
 -
-(No changes)
+
+* GITHUB#9: Bump minimum required Java version to 23

Review Comment:
   This needed to be updated post issue creation




Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-27 Thread via GitHub


rmuir commented on PR #14278:
URL: https://github.com/apache/lucene/pull/14278#issuecomment-2688769041

   @renatoh Feel free to open another PR, if you have time, to try to improve 
defaults around this for the next version of lucene. If I ask for "longest 
match" I don't expect to have additional "shorter" subwords coming out of the 
analyzer, so it seems like a good improvement.




[PR] Make Lucene better at skipping long runs of matches. [lucene]

2025-02-27 Thread via GitHub


jpountz opened a new pull request, #14312:
URL: https://github.com/apache/lucene/pull/14312

   This is an attempt to resurrect #12194 in a (hopefully) better way. Now that 
many queries run with `DenseConjunctionBulkScorer`, which scores windows of doc 
IDs at a time, it becomes natural to skip clauses that have long runs of 
matches by checking if they match the whole window.
   
   This introduces the same `DocIdSetIterator#peekNextNonMatchingDocID()` API 
that PR #12194 suggested, implements it in `DocIdSetIterator#all`, and uses it 
in `DenseConjunctionBulkScorer` to skip clauses that match the whole window.
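The window-skipping idea can be illustrated with a small model. Everything here is hypothetical: `PeekableIterator` only loosely mirrors `DocIdSetIterator`, and it assumes `peekNextNonMatchingDocID()` returns the first doc ID at or after the current one that does not match, which may differ from the contract in the PR. A clause positioned exactly on the window's first doc whose next non-matching doc is at or beyond the window's end matches the whole window, so it is skipped instead of being intersected.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical iterator loosely modeled on DocIdSetIterator; not Lucene's API. */
interface PeekableIterator {
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  int docID();

  int nextDoc();

  /** Moves to the first match >= target; only called with target > docID(). */
  int advance(int target);

  /** Assumed contract: the first doc ID >= docID() that does not match. */
  int peekNextNonMatchingDocID();
}

/** Matches every doc in [from, to), similar in spirit to an "all docs" iterator. */
final class RangeIterator implements PeekableIterator {
  private final int from;
  private final int to;
  private int doc = -1;

  RangeIterator(int from, int to) {
    this.from = from;
    this.to = to;
  }

  public int docID() { return doc; }

  public int nextDoc() { return doc == NO_MORE_DOCS ? doc : advance(doc + 1); }

  public int advance(int target) {
    int t = Math.max(target, from);
    doc = t < to ? t : NO_MORE_DOCS;
    return doc;
  }

  public int peekNextNonMatchingDocID() { return doc == NO_MORE_DOCS ? doc : to; }
}

/** Matches the doc IDs of a sorted array. */
final class ArrayIterator implements PeekableIterator {
  private final int[] docs;
  private int i = -1;

  ArrayIterator(int[] docs) { this.docs = docs; }

  public int docID() { return i < 0 ? -1 : i < docs.length ? docs[i] : NO_MORE_DOCS; }

  public int nextDoc() {
    if (i < docs.length) i++;
    return docID();
  }

  public int advance(int target) {
    while (docID() < target) i++;
    return docID();
  }

  public int peekNextNonMatchingDocID() {
    if (i < 0 || i >= docs.length) return docID(); // unpositioned or exhausted
    int j = i;
    while (j + 1 < docs.length && docs[j + 1] == docs[j] + 1) j++; // extend the run
    return docs[j] + 1;
  }
}

/** Intersects clauses over the doc window [min, max), skipping clauses that match all of it. */
final class WindowConjunctionSketch {
  int skippedClauses = 0;

  BitSet scoreWindow(List<PeekableIterator> clauses, int min, int max) {
    BitSet window = new BitSet(max - min);
    window.set(0, max - min); // Bit b stands for doc min + b.
    for (PeekableIterator it : clauses) {
      if (it.docID() < min) {
        it.advance(min);
      }
      if (it.docID() == min && it.peekNextNonMatchingDocID() >= max) {
        skippedClauses++; // The clause matches the whole window: nothing to intersect.
        continue;
      }
      BitSet clauseBits = new BitSet(max - min);
      for (int doc = it.docID(); doc < max; doc = it.nextDoc()) {
        clauseBits.set(doc - min);
      }
      window.and(clauseBits);
    }
    return window;
  }
}
```

With an all-docs clause over [0, 1000) and a sparse clause, only the sparse clause is intersected in each 128-doc window.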
   
   For better test coverage, `DenseConjunctionBulkScorer` was refactored to 
require at least one iterator, which can be a `DocIdSetIterator#all` instance 
if all docs match.
   
   In follow-ups, we should look into supporting other queries that are likely 
to have long runs of matches, in particular doc-value range queries on fields 
that are part of the index sort and take advantage of a doc-value skipper.
   
   Closes #11915



Re: [PR] Recommend multi-stage retrieval pipelines in oal.search javadocs. [lucene]

2025-02-27 Thread via GitHub


jpountz merged PR #14310:
URL: https://github.com/apache/lucene/pull/14310



Re: [PR] Make Lucene better at skipping long runs of matches. [lucene]

2025-02-27 Thread via GitHub


jpountz commented on PR #14312:
URL: https://github.com/apache/lucene/pull/14312#issuecomment-2689215958

   cc @gf2121 who's been reviewing related PRs recently and @iverase for the 
connection with sparse indexing



[PR] fix lambdas for java 23 [lucene]

2025-02-27 Thread via GitHub


rmuir opened a new pull request, #14308:
URL: https://github.com/apache/lucene/pull/14308

   After the upgrade to java 23, my editor is flooded with warnings of unused 
variables from lambdas. Fix them.
   
   I also downloaded Eclipse, installed it, checked all possible compiler 
options, compared the resulting file, and ensured our Eclipse compiler file is 
synced up, so that everything is explicit.



Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-27 Thread via GitHub


renatoh commented on PR #14278:
URL: https://github.com/apache/lucene/pull/14278#issuecomment-2688845145

   @rmuir could we reduce it to only two 'valid' behaviors: 
onlyLongestMatch=true with reuseChars=false, and onlyLongestMatch=false with 
reuseChars=true? If we think only these two cases make sense, we could actually 
reduce it to a single flag and come up with a different name for it. Or 
should onlyLongestMatch=true with reuseChars=true, today's default 
behavior, also remain an option?
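To make the flag combinations concrete, here is a hypothetical model of dictionary decompounding in plain Java. It is not `DictionaryCompoundWordTokenFilter`'s code: it scans every start offset for dictionary words between the minimum and maximum subword sizes, omits the original token, and the meaning given to `reuseChars=false` (resume scanning after the longest match found at an offset) is an assumption based on this discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical model of dictionary-based decompounding; not Lucene's implementation. */
final class CompoundSketch {
  /**
   * Returns the dictionary subwords found in word (the original token itself is not included).
   *
   * @param onlyLongestMatch keep only the longest subword found at each start offset
   * @param reuseChars if false, resume scanning after the longest subword found at an offset
   *     instead of at the next offset (assumed semantics)
   */
  static List<String> decompose(String word, Set<String> dict, int minSubwordSize,
      int maxSubwordSize, boolean onlyLongestMatch, boolean reuseChars) {
    List<String> out = new ArrayList<>();
    int start = 0;
    while (start <= word.length() - minSubwordSize) {
      String longest = null;
      for (int len = minSubwordSize;
          len <= maxSubwordSize && start + len <= word.length(); len++) {
        String candidate = word.substring(start, start + len);
        if (dict.contains(candidate)) {
          longest = candidate; // Lengths grow, so the last hit is the longest.
          if (!onlyLongestMatch) {
            out.add(candidate);
          }
        }
      }
      if (onlyLongestMatch && longest != null) {
        out.add(longest);
      }
      start += (!reuseChars && longest != null) ? longest.length() : 1;
    }
    return out;
  }
}
```

With the dictionary {fuss, fussball, ball, pumpe} and subword sizes 2 to 15, `fussballpumpe` yields [fuss, fussball, ball, pumpe] with all matches, [fussball, ball, pumpe] with only the longest match per offset (the scan restarts inside "fussball" and finds "ball"), and [fussball, pumpe] when characters are not reused.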



Re: [PR] Expose the ImpactsEnum impl in Lucene101PostingsFormat. [lucene]

2025-02-27 Thread via GitHub


jpountz merged PR #14306:
URL: https://github.com/apache/lucene/pull/14306



Re: [I] TestScorerUtil.testLikelyImpactsEnum fails [lucene]

2025-02-27 Thread via GitHub


jpountz closed issue #14303: TestScorerUtil.testLikelyImpactsEnum fails
URL: https://github.com/apache/lucene/issues/14303



Re: [PR] Expose the ImpactsEnum impl in Lucene101PostingsFormat. [lucene]

2025-02-27 Thread via GitHub


jpountz commented on PR #14306:
URL: https://github.com/apache/lucene/pull/14306#issuecomment-2689030846

   I went ahead and merged to stop the stream of failures. Happy to revisit the 
approach in a follow-up if there are concerns.



Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


reta commented on code in PR #14302:
URL: https://github.com/apache/lucene/pull/14302#discussion_r1974531053


##
lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java:
##
@@ -476,7 +476,7 @@ protected void doClose() throws IOException {
 }
   }
 };
-try (Closeable finalizer = decRefDeleter) {
+try (var _ = decRefDeleter) {

Review Comment:
   I think `_` could be completely omitted here:
   
   ```
   try (decRefDeleter) {
   ```
   
   sorry late for review
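Since Java 9, a try-with-resources statement can name an existing effectively final variable as its resource, so neither a new binding nor the unnamed variable `_` is needed. A minimal sketch, where `Releaser` is a hypothetical stand-in for the `Closeable` in the diff (declared without a checked exception to keep the example short):

```java
/** Minimal demo of try-with-resources on an existing variable (Java 9+). */
final class TryWithExistingResource {
  /** Hypothetical stand-in for the Closeable in the diff; close() throws no checked exception. */
  interface Releaser extends AutoCloseable {
    @Override
    void close();
  }

  static String run() {
    StringBuilder log = new StringBuilder();
    // Created before the try statement, like decRefDeleter in StandardDirectoryReader.
    Releaser decRefDeleter = () -> log.append("closed");
    // The effectively final variable is the resource itself: no new name, no `_` needed.
    try (decRefDeleter) {
      log.append("body,");
    }
    return log.toString();
  }
}
```

`run()` returns `"body,closed"`: the resource is still closed after the body completes, exactly as with a freshly declared resource variable.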




Re: [PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]

2025-02-27 Thread via GitHub


jpountz commented on PR #14304:
URL: https://github.com/apache/lucene/pull/14304#issuecomment-2689117193

   Have you been able to run `luceneutil` to get a sense of the indexing and 
search speedups?



Re: [PR] Reciprocal Rank Fusion (RRF) in TopDocs [lucene]

2025-02-27 Thread via GitHub


jpountz commented on PR #13470:
URL: https://github.com/apache/lucene/pull/13470#issuecomment-2689114469

   > I have a bias for the latter, as I was planning on improving the docs of 
the oal.search package as a follow-up to provide guidance wrt how to do hybrid 
search by linking to this RRF helper.
   
   I opened #14310.
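For reference, Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every result list it appears in, using 1-based ranks. Because only ranks are used, lexical scores such as BM25 and vector similarities need not be on comparable scales. The sketch below fuses plain lists of doc IDs; it does not reproduce Lucene's `TopDocs` helper, whose exact signature is not shown here, and k = 60 is simply a commonly used value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Textbook Reciprocal Rank Fusion over plain doc-ID rankings; not Lucene's TopDocs API. */
final class RrfSketch {
  /**
   * Each ranking lists distinct doc IDs, best first. A doc at 1-based rank r in a ranking
   * contributes 1 / (k + r); contributions are summed across rankings.
   */
  static List<Integer> fuse(List<List<Integer>> rankings, int k, int topN) {
    Map<Integer, Double> scores = new HashMap<>();
    for (List<Integer> ranking : rankings) {
      for (int rank = 1; rank <= ranking.size(); rank++) {
        scores.merge(ranking.get(rank - 1), 1.0 / (k + rank), Double::sum);
      }
    }
    List<Integer> docs = new ArrayList<>(scores.keySet());
    // Highest fused score first; ties broken by smaller doc ID so the output is deterministic.
    docs.sort((a, b) -> {
      int byScore = Double.compare(scores.get(b), scores.get(a));
      return byScore != 0 ? byScore : Integer.compare(a, b);
    });
    return new ArrayList<>(docs.subList(0, Math.min(topN, docs.size())));
  }
}
```

Fusing a lexical ranking [1, 2, 3] with a vector ranking [3, 1, 4] at k = 60 gives [1, 3, 2] for the top 3: doc 1 (ranks 1 and 2) edges out doc 3 (ranks 3 and 1).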



Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]

2025-02-27 Thread via GitHub


jpountz commented on PR #14280:
URL: https://github.com/apache/lucene/pull/14280#issuecomment-2689123306

   I merged my other PR, which should supersede this one. Closing.



Re: [PR] Recommend multi-stage retrieval pipelines in oal.search javadocs. [lucene]

2025-02-27 Thread via GitHub


jpountz commented on code in PR #14310:
URL: https://github.com/apache/lucene/pull/14310#discussion_r1974344510


##
lucene/core/src/java/org/apache/lucene/search/package-info.java:
##
@@ -350,6 +350,40 @@
  *
  * 
  *
+ * Multi-stage retrieval pipelines
+ *
+ * The above explains how to influence the score when evaluating all matches of the query. This
+ * is expensive by design since it applies to all matches of the query, which could be millions. In
+ * order to apply more sophisticated ranking logic, a good approach consists of having a retrieval
+ * pipeline that runs a simple candidate retrieval stage that retrieves e.g. 1,000 hits, followed by
+ * a more sophisticated reranking stage that reranks these 1,000 hits to select the best 100 hits
+ * among them. Since the number of hits that this retrieval stage needs to operate on is bounded, it
+ * allows it to be more sophisticated.
+ *
+ * Lucene exposes reranking via the {@link org.apache.lucene.search.Rescorer} abstract class,
+ * which has two main sub-classes:
+ *
+ * 
+ *   {@link org.apache.lucene.search.QueryRescorer}, to rescore using a query. For instance, the
+ *   query string could be parsed as phrase query using {@link
+ *   org.apache.lucene.util.QueryBuilder#createPhraseQuery} instead of a boolean query in order
+ *   to help boost hits which also match the query string as a phrase.
+ *   {@link org.apache.lucene.search.SortRescorer}, to rescore using a {@link
+ *   org.apache.lucene.search.Sort}. For instance, the best 1,000 hits by BM25 score may be
+ *   sorted by descending popularity in order to compute the final top-100 hits.
+ * 
+ *
+ * Top hits fusion
+ *
+ * Sometimes, multiple retrieval pipelines may make sense, having their own pros and cons. A
+ * typical example would be a lexical retrieval pipeline, matching exactly what the user requested,
+ * and a semantic retrieval pipeline, matching documents that are closest to the user's query from a
+ * semantic perspective. Combining scores is hazardous as different retrieval pipelines often
+ * produce scores that not only have different ranges, but also different distributions within this
+ * range. A robust way of combining multiple retrieval pipelines consists of combining the top hits
+ * that they produce through their ranks rather thank through their scores using reciprocal rank

Review Comment:
   Woops, thanks for catching.
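   The javadoc quoted in this hunk describes a retrieve-then-rerank pipeline. As a rough, Lucene-free sketch of that idea (plain Java collections; `TwoStageRetrieval`, `Hit`, and both scoring functions are hypothetical stand-ins, not Lucene's `Rescorer` API), the point is that the expensive scorer only ever runs on a bounded shortlist:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch of a two-stage (retrieve, then rerank) pipeline; not Lucene's API. */
class TwoStageRetrieval {

  /** A document id paired with a score, loosely modeled on a ScoreDoc. */
  record Hit(int doc, double score) {}

  /**
   * Scores every match cheaply, keeps the best {@code candidates}, then applies the expensive
   * scorer to that bounded shortlist only and returns its best {@code topN}.
   */
  static List<Hit> retrieveThenRerank(
      int[] matches,
      IntToDoubleFunction cheapScore,
      IntToDoubleFunction expensiveScore,
      int candidates,
      int topN) {
    Comparator<Hit> byScoreDesc = Comparator.comparingDouble(Hit::score).reversed();

    // Stage 1: cheap scoring over all matches (could be millions in a real index).
    List<Hit> firstStage = new ArrayList<>();
    for (int doc : matches) {
      firstStage.add(new Hit(doc, cheapScore.applyAsDouble(doc)));
    }
    firstStage.sort(byScoreDesc);
    List<Hit> shortlist = firstStage.subList(0, Math.min(candidates, firstStage.size()));

    // Stage 2: the expensive scorer runs at most `candidates` times.
    List<Hit> reranked = new ArrayList<>();
    for (Hit hit : shortlist) {
      reranked.add(new Hit(hit.doc(), expensiveScore.applyAsDouble(hit.doc())));
    }
    reranked.sort(byScoreDesc);
    return List.copyOf(reranked.subList(0, Math.min(topN, reranked.size())));
  }

  public static void main(String[] args) {
    int[] matches = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    // The cheap stage prefers high ids; the expensive stage prefers low ids among the survivors.
    List<Hit> top = retrieveThenRerank(matches, doc -> doc, doc -> -doc, 5, 2);
    System.out.println(top); // [Hit[doc=5, score=-5.0], Hit[doc=6, score=-6.0]]
  }
}
```

   A real engine would collect the first stage into a bounded priority queue rather than sorting every match; the full sort just keeps the sketch short.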






Re: [PR] Recommend multi-stage retrieval pipelines in oal.search javadocs. [lucene]

2025-02-27 Thread via GitHub


rmuir commented on code in PR #14310:
URL: https://github.com/apache/lucene/pull/14310#discussion_r1974340780


##
lucene/core/src/java/org/apache/lucene/search/package-info.java:
##
@@ -350,6 +350,40 @@
  *
  * 
  *
+ * Multi-stage retrieval pipelines
+ *
+ * The above explains how to influence the score when evaluating all matches of the query. This
+ * is expensive by design since it applies to all matches of the query, which could be millions. In
+ * order to apply more sophisticated ranking logic, a good approach consists of having a retrieval
+ * pipeline that runs a simple candidate retrieval stage that retrieves e.g. 1,000 hits, followed by
+ * a more sophisticated reranking stage that reranks these 1,000 hits to select the best 100 hits
+ * among them. Since the number of hits that this retrieval stage needs to operate on is bounded, it
+ * allows it to be more sophisticated.
+ *
+ * Lucene exposes reranking via the {@link org.apache.lucene.search.Rescorer} abstract class,
+ * which has two main sub-classes:
+ *
+ * 
+ *   {@link org.apache.lucene.search.QueryRescorer}, to rescore using a query. For instance, the
+ *   query string could be parsed as phrase query using {@link
+ *   org.apache.lucene.util.QueryBuilder#createPhraseQuery} instead of a boolean query in order
+ *   to help boost hits which also match the query string as a phrase.
+ *   {@link org.apache.lucene.search.SortRescorer}, to rescore using a {@link
+ *   org.apache.lucene.search.Sort}. For instance, the best 1,000 hits by BM25 score may be
+ *   sorted by descending popularity in order to compute the final top-100 hits.
+ * 
+ *
+ * Top hits fusion
+ *
+ * Sometimes, multiple retrieval pipelines may make sense, having their own pros and cons. A
+ * typical example would be a lexical retrieval pipeline, matching exactly what the user requested,
+ * and a semantic retrieval pipeline, matching documents that are closest to the user's query from a
+ * semantic perspective. Combining scores is hazardous as different retrieval pipelines often
+ * produce scores that not only have different ranges, but also different distributions within this
+ * range. A robust way of combining multiple retrieval pipelines consists of combining the top hits
+ * that they produce through their ranks rather thank through their scores using reciprocal rank

Review Comment:
   ```suggestion
* that they produce through their ranks rather than through their scores using reciprocal rank
   ```






Re: [PR] Recommend multi-stage retrieval pipelines in oal.search javadocs. [lucene]

2025-02-27 Thread via GitHub


rmuir commented on code in PR #14310:
URL: https://github.com/apache/lucene/pull/14310#discussion_r1974350732


##
lucene/core/src/java/org/apache/lucene/search/package-info.java:
##
@@ -350,6 +350,40 @@
  *
  * 
  *
+ * Multi-stage retrieval pipelines
+ *
+ * The above explains how to influence the score when evaluating all matches of the query. This
+ * is expensive by design since it applies to all matches of the query, which could be millions. In
+ * order to apply more sophisticated ranking logic, a good approach consists of having a retrieval
+ * pipeline that runs a simple candidate retrieval stage that retrieves e.g. 1,000 hits, followed by
+ * a more sophisticated reranking stage that reranks these 1,000 hits to select the best 100 hits
+ * among them. Since the number of hits that this retrieval stage needs to operate on is bounded, it
+ * allows it to be more sophisticated.
+ *
+ * Lucene exposes reranking via the {@link org.apache.lucene.search.Rescorer} abstract class,
+ * which has two main sub-classes:
+ *
+ * 
+ *   {@link org.apache.lucene.search.QueryRescorer}, to rescore using a query. For instance, the
+ *   query string could be parsed as phrase query using {@link
+ *   org.apache.lucene.util.QueryBuilder#createPhraseQuery} instead of a boolean query in order
+ *   to help boost hits which also match the query string as a phrase.
+ *   {@link org.apache.lucene.search.SortRescorer}, to rescore using a {@link
+ *   org.apache.lucene.search.Sort}. For instance, the best 1,000 hits by BM25 score may be
+ *   sorted by descending popularity in order to compute the final top-100 hits.
+ * 
+ *
+ * Top hits fusion
+ *
+ * Sometimes, multiple retrieval pipelines may make sense, having their own pros and cons. A
+ * typical example would be a lexical retrieval pipeline, matching exactly what the user requested,
+ * and a semantic retrieval pipeline, matching documents that are closest to the user's query from a
+ * semantic perspective. Combining scores is hazardous as different retrieval pipelines often
+ * produce scores that not only have different ranges, but also different distributions within this
+ * range. A robust way of combining multiple retrieval pipelines consists of combining the top hits
+ * that they produce through their ranks rather than through their scores using reciprocal rank
+ * fusion. This is exposed via {@link org.apache.lucene.search.TopDocs#rrf(int, int, TopDocs[])}.

Review Comment:
   ```suggestion
* fusion. This is exposed via {@link org.apache.lucene.search.TopDocs#rrf(int topN, int k, TopDocs[] hits)}.
   ```
   It is at least legal to do this, and it can significantly improve readability, 
since types aren't always enough to understand the parameters when reading. 
   
   As for the auto-generated label that the `javadoc` tool makes from it, you'd 
have to test it out. Of course, that can always be specified, but maybe this is 
an easier approach.
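   For readers wondering what `topN` and `k` control: in the textbook form of reciprocal rank fusion, a document at 1-based rank r in an input list contributes 1/(k + r), and contributions are summed across lists. Below is a minimal, Lucene-free sketch of that formula over plain lists of doc ids. It is not Lucene's `TopDocs#rrf` implementation, which works on `TopDocs` and may differ in details such as tie-breaking or how hits are identified across shards; the class name and the `k >= 1` check are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of reciprocal rank fusion over best-first lists of doc ids. */
class ReciprocalRankFusion {

  static List<Integer> rrf(int topN, int k, List<List<Integer>> rankings) {
    if (topN < 1 || k < 1) {
      throw new IllegalArgumentException("topN and k must be >= 1");
    }
    // Only ranks matter: raw scores from the different pipelines are never compared.
    Map<Integer, Double> fused = new HashMap<>();
    for (List<Integer> ranking : rankings) {
      for (int i = 0; i < ranking.size(); i++) {
        int rank = i + 1; // 1-based rank within this list
        fused.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
      }
    }
    List<Map.Entry<Integer, Double>> entries = new ArrayList<>(fused.entrySet());
    // Highest fused score first; ties broken by smaller doc id for deterministic output.
    entries.sort(
        (a, b) -> {
          int cmp = Double.compare(b.getValue(), a.getValue());
          return cmp != 0 ? cmp : Integer.compare(a.getKey(), b.getKey());
        });
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < Math.min(topN, entries.size()); i++) {
      result.add(entries.get(i).getKey());
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> lexical = List.of(1, 2, 3);
    List<Integer> semantic = List.of(3, 1, 4);
    System.out.println(rrf(3, 60, List.of(lexical, semantic))); // [1, 3, 2]
  }
}
```

   Document 3 is last in the lexical list but first in the semantic one, so it overtakes document 2, which appears in only one list. A larger `k` flattens the gap between top and lower ranks; 60 is the value commonly used in the RRF literature.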






Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]

2025-02-27 Thread via GitHub


jpountz closed pull request #14280: ExceptionInInitializerError in ScorerUtil
URL: https://github.com/apache/lucene/pull/14280





Re: [I] Improve documentation for org.apache.lucene.search Sort class [lucene]

2025-02-27 Thread via GitHub


jpountz commented on issue #14295:
URL: https://github.com/apache/lucene/issues/14295#issuecomment-2689221399

   FWIW I recently updated this page with this new link 
https://github.com/apache/lucene/pull/14251/files#diff-0a8bc8e8ffb40f26815f92ad02188c457ddd7594c4ac06208e8f5376ffed3cfbR213.





Re: [PR] fix lambdas for java 23 [lucene]

2025-02-27 Thread via GitHub


rmuir merged PR #14308:
URL: https://github.com/apache/lucene/pull/14308





Re: [PR] Recommend multi-stage retrieval pipelines in oal.search javadocs. [lucene]

2025-02-27 Thread via GitHub


jpountz commented on code in PR #14310:
URL: https://github.com/apache/lucene/pull/14310#discussion_r1974354077


##
lucene/core/src/java/org/apache/lucene/search/package-info.java:
##
@@ -350,6 +350,40 @@
  *
  * 
  *
+ * Multi-stage retrieval pipelines
+ *
+ * The above explains how to influence the score when evaluating all matches of the query. This
+ * is expensive by design since it applies to all matches of the query, which could be millions. In
+ * order to apply more sophisticated ranking logic, a good approach consists of having a retrieval
+ * pipeline that runs a simple candidate retrieval stage that retrieves e.g. 1,000 hits, followed by
+ * a more sophisticated reranking stage that reranks these 1,000 hits to select the best 100 hits
+ * among them. Since the number of hits that this retrieval stage needs to operate on is bounded, it
+ * allows it to be more sophisticated.
+ *
+ * Lucene exposes reranking via the {@link org.apache.lucene.search.Rescorer} abstract class,
+ * which has two main sub-classes:
+ *
+ * 
+ *   {@link org.apache.lucene.search.QueryRescorer}, to rescore using a query. For instance, the
+ *   query string could be parsed as phrase query using {@link
+ *   org.apache.lucene.util.QueryBuilder#createPhraseQuery} instead of a boolean query in order
+ *   to help boost hits which also match the query string as a phrase.
+ *   {@link org.apache.lucene.search.SortRescorer}, to rescore using a {@link
+ *   org.apache.lucene.search.Sort}. For instance, the best 1,000 hits by BM25 score may be
+ *   sorted by descending popularity in order to compute the final top-100 hits.
+ * 
+ *
+ * Top hits fusion
+ *
+ * Sometimes, multiple retrieval pipelines may make sense, having their own pros and cons. A
+ * typical example would be a lexical retrieval pipeline, matching exactly what the user requested,
+ * and a semantic retrieval pipeline, matching documents that are closest to the user's query from a
+ * semantic perspective. Combining scores is hazardous as different retrieval pipelines often
+ * produce scores that not only have different ranges, but also different distributions within this
+ * range. A robust way of combining multiple retrieval pipelines consists of combining the top hits
+ * that they produce through their ranks rather than through their scores using reciprocal rank
+ * fusion. This is exposed via {@link org.apache.lucene.search.TopDocs#rrf(int, int, TopDocs[])}.

Review Comment:
   Ah, thanks, I didn't even know that one could do this. I just checked the 
generated javadocs, they look good with these parameter names.






Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


dweiss commented on code in PR #14302:
URL: https://github.com/apache/lucene/pull/14302#discussion_r1973185969


##
lucene/core/src/java/org/apache/lucene/index/IndexReader.java:
##
@@ -253,8 +253,10 @@ public final void decRef() throws IOException {
 final int rc = refCount.decrementAndGet();
 if (rc == 0) {
   closed = true;
-  try (Closeable finalizer = this::reportCloseToParentReaders;
-  Closeable finalizer1 = this::notifyReaderClosedListeners) {
+  try (@SuppressWarnings("unused")
+  Closeable finalizer = this::reportCloseToParentReaders;
+  @SuppressWarnings("unused")
+  Closeable finalizer1 = this::notifyReaderClosedListeners) {

Review Comment:
   Eh. This seems like an issue with ECJ since it's clearly not an unused thing 
if it's in a try-with-resources...






Re: [PR] Avoid unnecessary evaluations and skipping documents [lucene]

2025-02-27 Thread via GitHub


hanbj commented on PR #14301:
URL: https://github.com/apache/lucene/pull/14301#issuecomment-2687359672

   Thank you for the review. Changes have been added.





Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


dweiss commented on code in PR #14302:
URL: https://github.com/apache/lucene/pull/14302#discussion_r1973191532


##
lucene/core/src/java/org/apache/lucene/index/IndexReader.java:
##
@@ -253,8 +253,10 @@ public final void decRef() throws IOException {
 final int rc = refCount.decrementAndGet();
 if (rc == 0) {
   closed = true;
-  try (Closeable finalizer = this::reportCloseToParentReaders;
-  Closeable finalizer1 = this::notifyReaderClosedListeners) {
+  try (@SuppressWarnings("unused")
+  Closeable finalizer = this::reportCloseToParentReaders;
+  @SuppressWarnings("unused")
+  Closeable finalizer1 = this::notifyReaderClosedListeners) {

Review Comment:
   Relevant discussion here, for example.
   https://bugs.eclipse.org/bugs/show_bug.cgi?id=560733






Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


rmuir commented on code in PR #14302:
URL: https://github.com/apache/lucene/pull/14302#discussion_r1973201491


##
lucene/core/src/java/org/apache/lucene/index/IndexReader.java:
##
@@ -253,8 +253,10 @@ public final void decRef() throws IOException {
 final int rc = refCount.decrementAndGet();
 if (rc == 0) {
   closed = true;
-  try (Closeable finalizer = this::reportCloseToParentReaders;
-  Closeable finalizer1 = this::notifyReaderClosedListeners) {
+  try (@SuppressWarnings("unused")
+  Closeable finalizer = this::reportCloseToParentReaders;
+  @SuppressWarnings("unused")
+  Closeable finalizer1 = this::notifyReaderClosedListeners) {

Review Comment:
   I'm getting new warnings for unused lambda parameters, showing up from the 
Eclipse language server with the suggestion to "rename to unnamed variable". 
These look legit, but they are new.






Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty commented on code in PR #14302:
URL: https://github.com/apache/lucene/pull/14302#discussion_r1973205040


##
lucene/core/src/java/org/apache/lucene/index/IndexReader.java:
##
@@ -253,8 +253,10 @@ public final void decRef() throws IOException {
 final int rc = refCount.decrementAndGet();
 if (rc == 0) {
   closed = true;
-  try (Closeable finalizer = this::reportCloseToParentReaders;
-  Closeable finalizer1 = this::notifyReaderClosedListeners) {
+  try (@SuppressWarnings("unused")
+  Closeable finalizer = this::reportCloseToParentReaders;
+  @SuppressWarnings("unused")
+  Closeable finalizer1 = this::notifyReaderClosedListeners) {

Review Comment:
   Now that we're on >22, we can just use unnamed variables, which have the nice 
property of keeping the checker happy! 
[ad54196](https://github.com/apache/lucene/pull/14302/commits/ad54196bd2dd56bbe9b30ff2f0db94258c5ef02a)
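   To illustrate the point raised in this thread: a try-with-resources resource is put to use by the statement itself, because its `close()` runs when the block exits even if the body never names it. The sketch below requires Java 22 or newer (unnamed variables, JEP 456); the two logged callbacks are hypothetical stand-ins for `reportCloseToParentReaders` and `notifyReaderClosedListeners`. It declares both resources as `_` and records the order in which they close:

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: unnamed try-with-resources variables are still closed (Java 22+). */
class UnnamedResources {

  static List<String> runFinalizers() {
    List<String> log = new ArrayList<>();
    // Each resource is a Closeable lambda; `_` marks the name as intentionally unused,
    // so there is no named variable left for an "unused" check to flag.
    try (Closeable _ = () -> log.add("reportCloseToParentReaders");
        Closeable _ = () -> log.add("notifyReaderClosedListeners")) {
      log.add("body");
    } catch (IOException e) {
      // Closeable.close() declares IOException; these lambdas never throw it.
      throw new UncheckedIOException(e);
    }
    // Resources are closed in reverse declaration order once the body completes.
    return log;
  }

  public static void main(String[] args) {
    System.out.println(runFinalizers());
    // [body, notifyReaderClosedListeners, reportCloseToParentReaders]
  }
}
```

   Applied to the quoted `decRef()` snippet, the same rule means `notifyReaderClosedListeners` runs before `reportCloseToParentReaders`.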






Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


rmuir commented on code in PR #14302:
URL: https://github.com/apache/lucene/pull/14302#discussion_r1973206235


##
lucene/core/src/java/org/apache/lucene/index/IndexReader.java:
##
@@ -253,8 +253,10 @@ public final void decRef() throws IOException {
 final int rc = refCount.decrementAndGet();
 if (rc == 0) {
   closed = true;
-  try (Closeable finalizer = this::reportCloseToParentReaders;
-  Closeable finalizer1 = this::notifyReaderClosedListeners) {
+  try (@SuppressWarnings("unused")
+  Closeable finalizer = this::reportCloseToParentReaders;
+  @SuppressWarnings("unused")
+  Closeable finalizer1 = this::notifyReaderClosedListeners) {

Review Comment:
   Example of what I'm seeing. I can take care of it in a follow-up PR if it does not 
fail the build for now.
   ```
   diff --git a/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java 
b/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
   index a8eca64c962..0f7693eedd1 100644
   --- a/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
   +++ b/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
   @@ -,7 +,7 @@ public class IndexWriter
infoStream.message("IW", "now abort pending addIndexes merge");
  }
  merge.setAborted();
   -  merge.close(false, false, mr -> {});
   +  merge.close(false, false, _ -> {});
  onMergeFinished(merge);
});
pendingAddIndexesMerges.clear();
   @@ -3350,7 +3350,7 @@ public class IndexWriter
handleMergeException(t, merge);
  } finally {
synchronized (IndexWriter.this) {
   -  merge.close(success, false, mr -> {});
   +  merge.close(success, false, _ -> {});
  onMergeFinished(merge);
}
  }
   @@ -3731,7 +3731,7 @@ public class IndexWriter
// necessary files to disk and checkpointed them.
pointInTimeMerges =
preparePointInTimeMerge(
   -toCommit, stopAddingMergedSegments::get, MergeTrigger.COMMIT, sci -> {});
   +toCommit, stopAddingMergedSegments::get, MergeTrigger.COMMIT, _ -> {});
  }
}
success = true;
   ```
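   The diff above only renames ignored lambda parameters to `_`. A minimal sketch of the same Java 22+ feature outside Lucene follows; the `closeAll` helper and its callback are hypothetical, loosely shaped like the `merge.close(..., mr -> {})` calls in the diff, and the `Map.forEach` case shows `_` for one parameter out of several:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch of unnamed lambda parameters (Java 22+). */
class UnnamedLambdaParams {

  /** Invokes the callback once per item and returns how many items were processed. */
  static int closeAll(List<String> items, Consumer<String> onClosed) {
    int closed = 0;
    for (String item : items) {
      onClosed.accept(item);
      closed++;
    }
    return closed;
  }

  /** The caller must pass a callback but does not care about its argument. */
  static int closeAllQuietly(List<String> items) {
    return closeAll(items, _ -> {}); // previously written as: mr -> {}
  }

  /** `_` can also stand for just one of several parameters, here the map keys. */
  static int sumValues(Map<String, Integer> counts) {
    int[] total = {0};
    counts.forEach((_, value) -> total[0] += value);
    return total[0];
  }

  public static void main(String[] args) {
    System.out.println(closeAllQuietly(List.of("merge-1", "merge-2"))); // 2
    System.out.println(sumValues(Map.of("a", 1, "b", 2))); // 3
  }
}
```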






Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


rmuir commented on code in PR #14302:
URL: https://github.com/apache/lucene/pull/14302#discussion_r1973219365


##
lucene/core/src/java/org/apache/lucene/index/IndexReader.java:
##
@@ -253,8 +253,10 @@ public final void decRef() throws IOException {
 final int rc = refCount.decrementAndGet();
 if (rc == 0) {
   closed = true;
-  try (Closeable finalizer = this::reportCloseToParentReaders;
-  Closeable finalizer1 = this::notifyReaderClosedListeners) {
+  try (@SuppressWarnings("unused")
+  Closeable finalizer = this::reportCloseToParentReaders;
+  @SuppressWarnings("unused")
+  Closeable finalizer1 = this::notifyReaderClosedListeners) {

Review Comment:
   Maybe it never warned before because it was impossible to fix with Java 21? 
My version of eclipse.jdt.ls is unchanged.






Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty commented on code in PR #14302:
URL: https://github.com/apache/lucene/pull/14302#discussion_r1973227461


##
lucene/core/src/java/org/apache/lucene/index/IndexReader.java:
##
@@ -253,8 +253,10 @@ public final void decRef() throws IOException {
 final int rc = refCount.decrementAndGet();
 if (rc == 0) {
   closed = true;
-  try (Closeable finalizer = this::reportCloseToParentReaders;
-  Closeable finalizer1 = this::notifyReaderClosedListeners) {
+  try (@SuppressWarnings("unused")
+  Closeable finalizer = this::reportCloseToParentReaders;
+  @SuppressWarnings("unused")
+  Closeable finalizer1 = this::notifyReaderClosedListeners) {

Review Comment:
   Yeah, that seems the most likely reason.






Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


rmuir commented on PR #14302:
URL: https://github.com/apache/lucene/pull/14302#issuecomment-2687423920

   The test failure on Mac looked like a good ole flaky test to me, so I re-ran it.





Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


rmuir commented on PR #14302:
URL: https://github.com/apache/lucene/pull/14302#issuecomment-2687466537

   The same test failed again, this time on windows. 
   
   To me, at a glance, this looks to be an unrelated issue caused by #14294 
   
   cc @jpountz 





Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty commented on PR #14302:
URL: https://github.com/apache/lucene/pull/14302#issuecomment-2687487855

   > The same test failed again, this time on windows.
   > 
   > To me, at a glance, this looks to be an unrelated issue caused by #14294
   
   ha! I had to merge main, since I didn't have this test in the branch! I can 
reproduce it now. Going to file a separate issue and mute the test, for now.
   





[I] TestScorerUtil.testLikelyImpactsEnum fails [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty opened a new issue, #14303:
URL: https://github.com/apache/lucene/issues/14303

   ### Description
   
   ```
   Reproduce with: gradlew :lucene:core:test --tests \
   "org.apache.lucene.search.TestScorerUtil.testLikelyImpactsEnum" \
   -Ptests.jvms=1 \
   -Ptests.jvmargs= \
   -Ptests.seed=20E0C716D3EB48E7 \
   -Ptests.useSecurityManager=true \
   -Ptests.gui=true \
   -Ptests.file.encoding=UTF-8 \
   -Ptests.vectorsize=512 -Ptests.forceintegervectors=true
   ```
   
   ```
   TestScorerUtil > testLikelyImpactsEnum FAILED
   java.lang.AssertionError: expected 
same:
 was not:
   at __randomizedtesting.SeedInfo.seed([20E0C716D3EB48E7:AC95CC1E9285FC36]:0)
   at org.junit.Assert.fail(Assert.java:89)
   at org.junit.Assert.failNotSame(Assert.java:829)
   at org.junit.Assert.assertSame(Assert.java:772)
   at org.junit.Assert.assertSame(Assert.java:783)
   at org.apache.lucene.search.TestScorerUtil.testLikelyImpactsEnum(TestScorerUtil.java:91)
   at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
   at java.base/java.lang.reflect.Method.invoke(Method.java:580)
   at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
   at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
   at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
   at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
   at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
   at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
   at java.base/

Re: [I] TestScorerUtil.testLikelyImpactsEnum fails [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty commented on issue #14303:
URL: https://github.com/apache/lucene/issues/14303#issuecomment-2687502442

   The test was introduced by #14294 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]

2025-02-27 Thread via GitHub


uschindler commented on PR #14280:
URL: https://github.com/apache/lucene/pull/14280#issuecomment-2688104449

   This is better. But best would be to add a getter for the class in our 
internal package (like we do for other stuff). The usual SharedSecrets approach.
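The "SharedSecrets approach" mentioned here is the pattern the JDK itself uses (`jdk.internal.access.SharedSecrets`): a public registry in an internal, non-exported package holds accessor objects that non-public classes register from their static initializers, so other internal code can reach them without widening the public API. The sketch below squeezes the pattern into a single file, so it cannot show the package boundaries themselves; all names (`SharedSecretsSketch`, `ImplAccess`, `Impl`) are hypothetical and are not Lucene's classes.

```java
import java.util.Objects;

public class SharedSecretsSketch {

  /** Hypothetical accessor; in the real pattern it lives in an internal package. */
  public interface ImplAccess {
    Class<?> implClass();
  }

  /** Hypothetical registry, also internal: the one place callers go through. */
  public static final class SharedSecrets {
    private static volatile ImplAccess implAccess;

    private SharedSecrets() {}

    /** Called once by the owning class, from its static initializer. */
    public static void setImplAccess(ImplAccess access) {
      implAccess = Objects.requireNonNull(access);
    }

    public static ImplAccess getImplAccess() {
      ImplAccess access = implAccess;
      if (access == null) {
        // Force initialization of the owning class; its static block registers the accessor.
        ensureInitialized(Impl.class);
        access = implAccess;
      }
      return access;
    }

    private static void ensureInitialized(Class<?> c) {
      try {
        Class.forName(c.getName(), true, c.getClassLoader());
      } catch (ClassNotFoundException e) {
        throw new AssertionError(e);
      }
    }
  }

  /** Stand-in for the non-public implementation class that owns the "secret". */
  static final class Impl {
    static {
      // Only this class decides what to expose, and only to the internal registry.
      SharedSecrets.setImplAccess(() -> Impl.class);
    }
  }

  public static void main(String[] args) {
    System.out.println(SharedSecrets.getImplAccess().implClass().getSimpleName()); // Impl
  }
}
```

Note that a class literal such as `Impl.class` does not by itself initialize `Impl`, which is why the registry explicitly forces initialization before reading the field.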



Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]

2025-02-27 Thread via GitHub


uschindler commented on PR #14280:
URL: https://github.com/apache/lucene/pull/14280#issuecomment-2688112479

   But this is fine. We currently only have shared secrets for tests.
   Making that class public does not hurt.



Re: [PR] Add posTagFormat parameter for OpenNLPPOSFilter [lucene]

2025-02-27 Thread via GitHub


epugh commented on PR #14194:
URL: https://github.com/apache/lucene/pull/14194#issuecomment-2688311636

   @cpoerschke (and anyone else) I haven't done a commit on Lucene in a long 
time, so I want to get another set of eyes on this. And I need to remember 
what all is in the workflow as well ;-)



Re: [I] Should we auto-adjust top score doc and top field collector manager based on slices? [lucene]

2025-02-27 Thread via GitHub


javanna commented on issue #13791:
URL: https://github.com/apache/lucene/issues/13791#issuecomment-2688326822

   I will go ahead and close this. The supportsConcurrency flag has been 
removed. The collector manager no longer gives a choice and users don't have to 
think about it either.



Re: [PR] Support JDK 24 in Panama Vectorization Provider [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty merged PR #14300:
URL: https://github.com/apache/lucene/pull/14300



Re: [PR] Fix optimization to help inline calls to live docs. [lucene]

2025-02-27 Thread via GitHub


jpountz merged PR #14294:
URL: https://github.com/apache/lucene/pull/14294



Re: [PR] Use DenseConjunctionBulkScorer for single queries sometimes. [lucene]

2025-02-27 Thread via GitHub


jpountz commented on code in PR #14293:
URL: https://github.com/apache/lucene/pull/14293#discussion_r1973119540


##
lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java:
##
@@ -341,11 +341,10 @@ public ScorerSupplier scorerSupplier(LeafReaderContext 
context) throws IOExcepti
 
 if (allDocsMatch) {
   // all docs have a value and all points are within bounds, so 
everything matches
-  return new ScorerSupplier() {
+  return new ConstantScoreScorerSupplier(score(), scoreMode, 
reader.maxDoc()) {

Review Comment:
   Oops, yes, fixing.




Re: [I] Support JDK 24 [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty commented on issue #14184:
URL: https://github.com/apache/lucene/issues/14184#issuecomment-2687262361

   The JDK 24 RC builds are tested with Lucene very frequently [1], and there 
are no observable issues.
   
   [1] https://jenkins.thetaphi.de/view/Lucene/job/Lucene-main-Linux/



Re: [I] Support JDK 24 [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty closed issue #14184: Support JDK 24
URL: https://github.com/apache/lucene/issues/14184



Re: [PR] Use DenseConjunctionBulkScorer for single queries sometimes. [lucene]

2025-02-27 Thread via GitHub


jpountz commented on code in PR #14293:
URL: https://github.com/apache/lucene/pull/14293#discussion_r1973122492


##
lucene/core/src/java/org/apache/lucene/search/TermQuery.java:
##
@@ -165,6 +165,17 @@ public Scorer get(long leadCost) throws IOException {
   }
 }
 
+@Override
+public BulkScorer bulkScorer() throws IOException {
+  if (scoreMode.needsScores() == false) {
+DocIdSetIterator iterator = get(Long.MAX_VALUE).iterator();
+int maxDoc = context.reader().maxDoc();
+return ConstantScoreScorerSupplier.fromIterator(iterator, 0f, 
scoreMode, maxDoc)

Review Comment:
   I think so, but I plan on having a potentially better way of doing it by 
reviving #12194 as a follow-up and skipping clauses that fully match a window 
(which would also remove the need for `MatchAllScorerSupplier`).
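The idea of skipping clauses that fully match a window can be illustrated with plain `java.util.BitSet`s. This is a hypothetical sketch, not Lucene's `DenseConjunctionBulkScorer` and not the approach from #12194: each clause is assumed to be represented by the bit set of its matches inside the current window of doc IDs, and a clause whose window is entirely set is skipped, because AND-ing with an all-ones set cannot remove any match.

```java
import java.util.BitSet;
import java.util.List;

public class WindowConjunctionSketch {

  /**
   * Intersects the per-window match sets of all clauses, skipping clauses that match every
   * doc in the window.
   *
   * @param clauses one BitSet per clause; bit i is set if doc (windowBase + i) matches
   * @param windowSize number of docs in the window
   * @return docs of the window (relative to windowBase) matched by all clauses
   */
  static BitSet intersectWindow(List<BitSet> clauses, int windowSize) {
    BitSet result = new BitSet(windowSize);
    result.set(0, windowSize); // start from "every doc in the window matches"
    for (BitSet clause : clauses) {
      if (clause.nextClearBit(0) >= windowSize) {
        continue; // clause fully matches the window: intersecting would be a no-op
      }
      result.and(clause);
    }
    return result;
  }

  public static void main(String[] args) {
    BitSet all = new BitSet();
    all.set(0, 8);
    BitSet even = new BitSet();
    for (int i = 0; i < 8; i += 2) {
      even.set(i);
    }
    System.out.println(intersectWindow(List.of(all, even), 8)); // {0, 2, 4, 6}
  }
}
```

In this toy version skipping only saves one `and()` call per clause; the payoff the comment hints at would come from never having to iterate such clauses doc by doc in the first place.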




Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]

2025-02-27 Thread via GitHub


jpountz commented on PR #14280:
URL: https://github.com/apache/lucene/pull/14280#issuecomment-2687278510

   One concern I have with this change is that this code now runs as part of 
evaluating a query, when users care about query latency.
   
   @uschindler You usually have informed opinions on this sort of thing; I 
wonder if you have thoughts.



Re: [PR] Remove scoreAll() optimization from DefaultBulkScorer. [lucene]

2025-02-27 Thread via GitHub


jpountz commented on code in PR #14039:
URL: https://github.com/apache/lucene/pull/14039#discussion_r1973145270


##
lucene/core/src/java/org/apache/lucene/search/Weight.java:
##
@@ -289,75 +262,108 @@ static int scoreRange(
 }
   }
 
-  int doc = iterator.docID();
-  if (doc < min) {
-if (doc == min - 1) {
-  doc = iterator.nextDoc();
+  if (iterator.docID() < min) {
+if (iterator.docID() == min - 1) {
+  iterator.nextDoc();
 } else {
-  doc = iterator.advance(min);
+  iterator.advance(min);
 }
   }
 
+  // These various specializations help save some null checks in a hot 
loop, but as importantly
+  // if not more importantly, they help reduce the polymorphism of calls 
sites to nextDoc() and
+  // collect() because only a subset of collectors produce a competitive 
iterator, and the set
+  // of implementing classes for two-phase approximations is smaller than 
the set of doc id set
+  // iterator implementations.
   if (twoPhase == null && competitiveIterator == null) {
 // Optimize simple iterators with collectors that can't skip
-while (doc < max) {
-  if (acceptDocs == null || acceptDocs.get(doc)) {
-collector.collect(doc);
-  }
-  doc = iterator.nextDoc();
-}
+scoreIterator(collector, acceptDocs, iterator, max);
+  } else if (competitiveIterator == null) {
+scoreTwoPhaseIterator(collector, acceptDocs, iterator, twoPhase, max);
+  } else if (twoPhase == null) {
+scoreCompetitiveIterator(collector, acceptDocs, iterator, 
competitiveIterator, max);
   } else {
-while (doc < max) {
-  if (competitiveIterator != null) {
-assert competitiveIterator.docID() <= doc;
-if (competitiveIterator.docID() < doc) {
-  competitiveIterator.advance(doc);
-}
-if (competitiveIterator.docID() != doc) {
-  doc = iterator.advance(competitiveIterator.docID());
-  continue;
-}
-  }
+scoreTwoPhaseOrCompetitiveIterator(
+collector, acceptDocs, iterator, twoPhase, competitiveIterator, 
max);
+  }
 
-  if ((acceptDocs == null || acceptDocs.get(doc))
-  && (twoPhase == null || twoPhase.matches())) {
-collector.collect(doc);
-  }
-  doc = iterator.nextDoc();
+  return iterator.docID();
+}
+
+private static void scoreIterator(
+LeafCollector collector, Bits acceptDocs, DocIdSetIterator iterator, 
int max)
+throws IOException {
+  for (int doc = iterator.docID(); doc < max; doc = iterator.nextDoc()) {
+if (acceptDocs == null || acceptDocs.get(doc)) {
+  collector.collect(doc);
 }
   }
-
-  return doc;
 }
 
-/**
- * Specialized method to bulk-score all hits; we separate this from {@link 
#scoreRange} to help
- * out hotspot. See <a href="https://issues.apache.org/jira/browse/LUCENE-5487">LUCENE-5487</a>
- */
-static void scoreAll(
+private static void scoreTwoPhaseIterator(
 LeafCollector collector,
+Bits acceptDocs,
 DocIdSetIterator iterator,
 TwoPhaseIterator twoPhase,
-Bits acceptDocs)
+int max)
 throws IOException {
-  if (twoPhase == null) {
-for (int doc = iterator.nextDoc();
-doc != DocIdSetIterator.NO_MORE_DOCS;
-doc = iterator.nextDoc()) {
-  if (acceptDocs == null || acceptDocs.get(doc)) {
-collector.collect(doc);
+  for (int doc = iterator.docID(); doc < max; ) {
+if ((acceptDocs == null || acceptDocs.get(doc)) && twoPhase.matches()) 
{
+  collector.collect(doc);
+}
+
+doc = iterator.nextDoc();

Review Comment:
   Good catch. This is because this code was copied from a more complex version 
(the one that intersects with a collector's competitive iterator), which used to 
call `advance` on the iterator and then `continue`. There, nextDoc() could not be 
part of the for-loop control statement, or the iterator would call nextDoc() 
right after advance(). This version never needs to advance, so it can.
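To make the loop-control point concrete, here is a self-contained Java sketch. `ArrayIterator` is a hypothetical stand-in that mimics the `docID()`/`nextDoc()`/`advance()` contract of Lucene's `DocIdSetIterator`; it is not Lucene code, and unlike the diff it positions the iterator with `nextDoc()` itself instead of starting from the current `docID()`.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopControlSketch {

  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical iterator over sorted doc IDs, mimicking DocIdSetIterator semantics. */
  static final class ArrayIterator {
    private final int[] docs;
    private int index = -1;

    ArrayIterator(int... docs) {
      this.docs = docs;
    }

    int docID() {
      if (index < 0) return -1;
      return index < docs.length ? docs[index] : NO_MORE_DOCS;
    }

    int nextDoc() {
      index++;
      return docID();
    }

    /** Moves to the first doc >= target (target must be > docID()). */
    int advance(int target) {
      do {
        index++;
      } while (index < docs.length && docs[index] < target);
      return docID();
    }
  }

  /** The body never calls advance(), so nextDoc() can live in the loop header. */
  static List<Integer> collectAll(ArrayIterator it, int max) {
    List<Integer> out = new ArrayList<>();
    for (int doc = it.nextDoc(); doc < max; doc = it.nextDoc()) {
      out.add(doc);
    }
    return out;
  }

  /** The body may advance() and continue, so nextDoc() must stay at the end of the body. */
  static List<Integer> collectCompetitive(ArrayIterator it, ArrayIterator competitive, int max) {
    List<Integer> out = new ArrayList<>();
    for (int doc = it.nextDoc(); doc < max; ) {
      if (competitive.docID() < doc) {
        competitive.advance(doc);
      }
      if (competitive.docID() != doc) {
        doc = it.advance(competitive.docID());
        continue; // re-check the doc we just advanced to
      }
      out.add(doc);
      doc = it.nextDoc();
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(collectAll(new ArrayIterator(1, 2, 3, 5, 8), 100)); // [1, 2, 3, 5, 8]
    System.out.println(
        collectCompetitive(
            new ArrayIterator(1, 2, 3, 5, 8), new ArrayIterator(2, 3, 4, 8, 9), 100)); // [2, 3, 8]
  }
}
```

In `collectCompetitive`, moving `doc = it.nextDoc()` into the `for` header would break the `continue` path: Java runs a `for` loop's update expression on `continue`, so the doc that `advance()` just landed on would be skipped without being checked.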




Re: [PR] Use DenseConjunctionBulkScorer for single queries sometimes. [lucene]

2025-02-27 Thread via GitHub


jpountz commented on PR #14293:
URL: https://github.com/apache/lucene/pull/14293#issuecomment-2687300793

   Yes, especially with queries that match long ranges of doc IDs by design, 
such as those that take advantage of sparse indexing.
   
   > For reminding, I think we also need a CHANGES entry :)
   
   Good point, added. :)



Re: [I] Evaluate bumping the minimum compile Java version [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty commented on issue #14229:
URL: https://github.com/apache/lucene/issues/14229#issuecomment-2687327575

   I opened [#14302](https://github.com/apache/lucene/pull/14302) for the 
initial bump to Java 23.



[PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty opened a new pull request, #14302:
URL: https://github.com/apache/lucene/pull/14302

   This commit bumps the minimum required Java version to 23.
   
   The _main_ branch is accumulating changes for the next major release, Lucene 
11.0.0. The intent is to release Lucene 11.0.0 with a minimum of Java 25. We'll 
continually bump the minimum Java release in the _main_ branch as newer Java 
versions become available, until we hit Java 25. We do this intentionally so 
that Lucene developers can take advantage of newer Java features sooner.
   
   This change is only intended for the _main_ branch, and will not be 
backported.
   
   relates #14229



Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty commented on PR #14302:
URL: https://github.com/apache/lucene/pull/14302#issuecomment-2687333833

   > ```
   > > Task :lucene:core.tests:ecjLintMain FAILED
   > source level should be in '1.1'...'1.8','9'...'21' (or '5.0'..'21.0'): 23
   > ```
   > 
   > ECJ is failing, perhaps it needs to be updated as well.
   
   ++ yeap! I'll take a look for a newer version of ECJ. 



Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


dweiss commented on PR #14302:
URL: https://github.com/apache/lucene/pull/14302#issuecomment-2687329916

   ```
   > Task :lucene:core.tests:ecjLintMain FAILED
   source level should be in '1.1'...'1.8','9'...'21' (or '5.0'..'21.0'): 23
   ```
   
   ECJ is failing, perhaps it needs to be updated as well.



Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


dweiss commented on PR #14302:
URL: https://github.com/apache/lucene/pull/14302#issuecomment-2687333543

   All the way up to 3.40.0 in versions.toml:
   ```
   ecj = "3.36.0"
   ```



Re: [PR] Address completion fields testing gap and truly allow loading FST off heap [lucene]

2025-02-27 Thread via GitHub


javanna commented on PR #14270:
URL: https://github.com/apache/lucene/pull/14270#issuecomment-2687334196

   That's good with me. Shall we clearly document this as a follow-up then? 
And shall I perhaps make the FST load mode static field package-private, so 
that we at least test both modes?
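The suggestion of a package-private static field so that tests can exercise both load modes can be sketched as follows. Everything here is hypothetical (`FstLoadMode`, `CompletionReaderSketch`, and the mode descriptions are illustrative names, not Lucene's completion code); the point is only the visibility choice and a test loop that restores the default.

```java
import java.util.ArrayList;
import java.util.List;

public class LoadModeSketch {

  /** Hypothetical stand-in for an FST load mode; not Lucene's actual enum. */
  enum FstLoadMode {
    ON_HEAP,
    OFF_HEAP
  }

  /** Hypothetical reader whose load mode is a package-private static field. */
  static final class CompletionReaderSketch {
    // No access modifier (package-private) instead of private, so tests in the same
    // package can switch modes; production code never writes to it.
    static FstLoadMode loadMode = FstLoadMode.ON_HEAP;

    static String describeLoad() {
      return switch (loadMode) {
        case ON_HEAP -> "FST copied onto the heap";
        case OFF_HEAP -> "FST read from the index input";
      };
    }
  }

  /** What a test might do: run the same checks under every mode, then restore the default. */
  static List<String> exerciseAllModes() {
    List<String> seen = new ArrayList<>();
    FstLoadMode saved = CompletionReaderSketch.loadMode;
    try {
      for (FstLoadMode mode : FstLoadMode.values()) {
        CompletionReaderSketch.loadMode = mode;
        seen.add(mode + ": " + CompletionReaderSketch.describeLoad());
      }
    } finally {
      CompletionReaderSketch.loadMode = saved;
    }
    return seen;
  }

  public static void main(String[] args) {
    exerciseAllModes().forEach(System.out::println);
  }
}
```

A randomized test suite could instead pick one mode per run; looping over both just makes the coverage deterministic in this sketch.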



Re: [PR] Reciprocal Rank Fusion (RRF) in TopDocs [lucene]

2025-02-27 Thread via GitHub


javanna commented on code in PR #13470:
URL: https://github.com/apache/lucene/pull/13470#discussion_r1973180965


##
lucene/core/src/java/org/apache/lucene/search/TopDocs.java:
##
@@ -350,4 +354,89 @@ private static TopDocs mergeAux(
   return new TopFieldDocs(totalHits, hits, sort.getSort());
 }
   }
+
+  private record ShardIndexAndDoc(int shardIndex, int doc) {}
+
+  /**
+   * Reciprocal Rank Fusion method.
+   *
+   * This method combines different search results into a single ranked 
list by combining their
+   * ranks. This is especially well suited when combining hits computed via 
different methods, whose
+   * score distributions are hardly comparable.
+   *
+   * @param topN the top N results to be returned
+   * @param k a constant determines how much influence documents in individual 
rankings have on the
+   * final result. A higher value gives lower rank documents more 
influence. k should be greater
+   * than or equal to 1.
+   * @param hits a list of TopDocs to apply RRF on
+   * @return a TopDocs contains the top N ranked results.
+   */
+  public static TopDocs rrf(int topN, int k, TopDocs[] hits) {
+if (topN < 1) {
+  throw new IllegalArgumentException("topN must be >= 1, got " + topN);
+}
+if (k < 1) {
+  throw new IllegalArgumentException("k must be >= 1, got " + k);
+}
+
+boolean shardIndexSet = false;
+outer:
+for (TopDocs topDocs : hits) {
+  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
+shardIndexSet = scoreDoc.shardIndex != -1;
+break outer;

Review Comment:
   yes, thank you!
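Reciprocal Rank Fusion, as described in the javadoc above, scores each document as `score(d) = sum over rankings of 1 / (k + rank(d))`, where documents absent from a ranking contribute nothing. The sketch below is a standalone illustration of that formula over plain lists of hypothetical string document keys, assuming 1-based ranks (as in the original RRF formulation) and that each ranking lists a document at most once. It is not the `TopDocs.rrf` implementation from this PR, which works on `ScoreDoc`s and takes `shardIndex` into account; ties are broken by key here only to make the output deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RrfSketch {

  record Fused(String doc, double score) {}

  /**
   * Fuses several rankings with Reciprocal Rank Fusion.
   *
   * @param topN number of fused results to return, must be >= 1
   * @param k constant that must be >= 1; larger values flatten the gap between ranks
   * @param rankings each list holds document keys ordered from best to worst
   */
  static List<Fused> rrf(int topN, int k, List<List<String>> rankings) {
    if (topN < 1) throw new IllegalArgumentException("topN must be >= 1, got " + topN);
    if (k < 1) throw new IllegalArgumentException("k must be >= 1, got " + k);

    // Accumulate 1 / (k + rank) for every ranking a document appears in.
    Map<String, Double> scores = new HashMap<>();
    for (List<String> ranking : rankings) {
      for (int i = 0; i < ranking.size(); i++) {
        int rank = i + 1; // 1-based rank
        scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
      }
    }

    // Sort by fused score descending, then by key for deterministic ties.
    List<Fused> fused = new ArrayList<>();
    scores.forEach((doc, score) -> fused.add(new Fused(doc, score)));
    fused.sort(Comparator.<Fused>comparingDouble(Fused::score).reversed().thenComparing(Fused::doc));
    return List.copyOf(fused.subList(0, Math.min(topN, fused.size())));
  }

  public static void main(String[] args) {
    List<List<String>> rankings = List.of(List.of("a", "b", "c"), List.of("c", "a"));
    // Prints a, then c, then b: a = 1/2 + 1/3, c = 1/4 + 1/2, b = 1/3 (with k = 1).
    for (Fused f : rrf(3, 1, rankings)) {
      System.out.println(f.doc() + " " + f.score());
    }
  }
}
```

Because only ranks enter the formula, the raw scores of the input rankings never need to be comparable, which is the property the javadoc highlights for hits computed via different methods.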




Re: [PR] Bump minimum required Java version to 23 [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty merged PR #14302:
URL: https://github.com/apache/lucene/pull/14302



Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]

2025-02-27 Thread via GitHub


uschindler commented on PR #14280:
URL: https://github.com/apache/lucene/pull/14280#issuecomment-2687564614

   I go crazy when I see this code (before and after). 😜
   
   In general, I would rewrite it so that it does not use a test index to 
initialize the code. Can't we figure out what the class is in a better way, 
so that the initializer does not need to do interruptible things?
   
   About the initialization and the code: there are multiple issues, such as 
missing synchronization. It may not be an issue, but it's unfortunately an 
anti-pattern. Another problem is that the optimizer cannot assume that the 
returned value is constant.
   
   Finally: Lucene explicitly says that interrupting a search thread is not 
supported and may cause other havoc. 
   
   -1 to merge this. Better to rewrite the code so that it does not do crazy 
stuff in a static initialization block.
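Two of the points above (no synchronization, and a value the JIT cannot treat as constant) are commonly addressed with the initialization-on-demand holder idiom: the JVM's class-initialization rules guarantee the value is computed once, lazily and thread-safely, and it ends up in a `static final` field that HotSpot can constant-fold. The sketch below uses hypothetical names and simulates the discovery step with a counter. It does not address the main objection, since whatever runs in the initializer should still be cheap and must not do interruptible work; the eventual fix in #14306 avoids discovery altogether by referencing the class directly.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HolderIdiomSketch {

  /** Counts how often the discovery step ran, to show that it runs exactly once. */
  static final AtomicInteger DISCOVERY_RUNS = new AtomicInteger();

  /** Stand-in for the implementation class that callers want to special-case. */
  static final class DefaultImpl {}

  /** Hypothetical discovery step; in real code it must be cheap and non-interruptible. */
  private static Class<?> discoverImplClass() {
    DISCOVERY_RUNS.incrementAndGet();
    return DefaultImpl.class;
  }

  /**
   * The JVM initializes this nested class on first access to IMPL_CLASS, exactly once and
   * under the class-initialization lock, so no explicit synchronization is needed, and
   * IMPL_CLASS is a genuine static final constant afterwards.
   */
  private static final class Holder {
    static final Class<?> IMPL_CLASS = discoverImplClass();
  }

  static boolean isDefaultImpl(Object o) {
    return o.getClass() == Holder.IMPL_CLASS;
  }

  public static void main(String[] args) {
    System.out.println(isDefaultImpl(new DefaultImpl())); // true
    System.out.println(isDefaultImpl("something else")); // false
    System.out.println(DISCOVERY_RUNS.get()); // 1
  }
}
```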



Re: [I] Bump Lucene 11.0.0 minimum required Java version to 25 [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty commented on issue #14229:
URL: https://github.com/apache/lucene/issues/14229#issuecomment-2687571909

   ... we're on the train now. I repurposed this issue to track the upgrade to 
Java 25.



Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]

2025-02-27 Thread via GitHub


uschindler commented on PR #14280:
URL: https://github.com/apache/lucene/pull/14280#issuecomment-2687575659

   Generally the issue here could be solved using this JEP which is long 
awaited: https://openjdk.org/jeps/8209964



[PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]

2025-02-27 Thread via GitHub


thecoop opened a new pull request, #14304:
URL: https://github.com/apache/lucene/pull/14304

   This resolves #13922
   
   JMH shows a ~5x speedup:
   ```
   Benchmark             Mode  Cnt     Score    Error   Units
   Quant.quantize       thrpt    5   231.147 ± 13.401  ops/ms
   Quant.quantizeVector thrpt    5  1234.446 ± 49.961  ops/ms
   ```
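As a quick sanity check of the "~5x" figure, dividing the two mean throughputs from the JMH table, and then the least and most favourable combinations of the reported error margins, gives:

```latex
\frac{1234.446}{231.147} \approx 5.34,
\qquad
\frac{1234.446 - 49.961}{231.147 + 13.401} \approx 4.84,
\qquad
\frac{1234.446 + 49.961}{231.147 - 13.401} \approx 5.90
```

So even taking the error margins at face value, the speedup stays between roughly 4.8x and 5.9x.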



Re: [PR] Support load per-iteration replacement of NamedSPI [lucene]

2025-02-27 Thread via GitHub


ChrisHegarty commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2687787710

   @jpountz given the connection of this PR with completion FST, do you have 
opinions here?



Re: [I] develocity build scans fail to upload sometimes [lucene]

2025-02-27 Thread via GitHub


dweiss commented on issue #14305:
URL: https://github.com/apache/lucene/issues/14305#issuecomment-2687815179

   Example:
   
   
![Image](https://github.com/user-attachments/assets/f00d68af-c576-4872-b861-e1fd526e2d15)
   



[I] develocity build scans fail to upload sometimes [lucene]

2025-02-27 Thread via GitHub


dweiss opened a new issue, #14305:
URL: https://github.com/apache/lucene/issues/14305

   ### Description
   
   Seems like they're expiring before the build is completed. Relevant links:
   https://issues.apache.org/jira/browse/INFRA-26057
   
https://github.com/gradle/actions/blob/main/docs/setup-gradle.md#managing-develocity-access-keys
   
   



Re: [I] TestScorerUtil.testLikelyImpactsEnum fails [lucene]

2025-02-27 Thread via GitHub


jpountz commented on issue #14303:
URL: https://github.com/apache/lucene/issues/14303#issuecomment-2688040417

   It looks like this is due to codec randomization getting applied before the 
static block runs. I opened #14306.



[PR] Make BlockPostingsEnum public. [lucene]

2025-02-27 Thread via GitHub


jpountz opened a new pull request, #14306:
URL: https://github.com/apache/lucene/pull/14306

   This allows access from `ScorerUtil` so that it no longer needs a static 
block that creates an index to be able to introspect what implementation is 
used for impacts.
   
   Closes #14303
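Making the class public lets `ScorerUtil` compare against it directly instead of discovering it by building an index. The sketch below assumes the purpose of that comparison is the usual devirtualization trick, consistent with the polymorphism-reduction comments elsewhere in this thread: instances of the expected class pass through unchanged, and any other implementation is wrapped in one final filter class, so call sites see at most two receiver classes and the JIT can keep them inlined. All types are hypothetical stand-ins, not Lucene's `BlockPostingsEnum` or impacts classes.

```java
public class LikelyImplSketch {

  /** Hypothetical minimal interface standing in for something like an impacts enum. */
  interface DocStream {
    int nextDoc();
  }

  /** The implementation used in the vast majority of cases. */
  static final class CommonStream implements DocStream {
    private int doc = -1;

    @Override
    public int nextDoc() {
      return ++doc;
    }
  }

  /** Some other, rarer implementation. */
  static final class RareStream implements DocStream {
    private int doc = -1;

    @Override
    public int nextDoc() {
      doc += 2;
      return doc;
    }
  }

  /** Single final wrapper: every non-common implementation is seen through this one class. */
  static final class FilterStream implements DocStream {
    private final DocStream in;

    FilterStream(DocStream in) {
      this.in = in;
    }

    @Override
    public int nextDoc() {
      return in.nextDoc();
    }
  }

  /**
   * Returns the stream unchanged if it is the common class, otherwise wraps it, so that call
   * sites using the result see at most two receiver classes. Comparing against
   * CommonStream.class directly requires that class to be accessible here.
   */
  static DocStream likelyCommon(DocStream s) {
    return s.getClass() == CommonStream.class ? s : new FilterStream(s);
  }

  public static void main(String[] args) {
    System.out.println(likelyCommon(new CommonStream()).getClass().getSimpleName()); // CommonStream
    System.out.println(likelyCommon(new RareStream()).getClass().getSimpleName()); // FilterStream
  }
}
```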



Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]

2025-02-27 Thread via GitHub


jpountz commented on PR #14280:
URL: https://github.com/apache/lucene/pull/14280#issuecomment-2688043954

   It looks like another problem with the current code is that codec 
randomization may randomly run before this block of code, making it unreliable 
when running tests. I opened #14306 as an alternative.

