Re: [PR] Reciprocal Rank Fusion (RRF) in TopDocs [lucene]
jpountz merged PR #13470: URL: https://github.com/apache/lucene/pull/13470 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Lack of coverage of DenseConjunctionBulkScorer with min competitive scores and competitive iterators [lucene]
jpountz closed issue #14283: Lack of coverage of DenseConjunctionBulkScorer with min competitive scores and competitive iterators URL: https://github.com/apache/lucene/issues/14283
Re: [PR] Remove scoreAll() optimization from DefaultBulkScorer. [lucene]
jpountz merged PR #14039: URL: https://github.com/apache/lucene/pull/14039
Re: [PR] Bump floor segment size to 16MB. [lucene]
jpountz merged PR #14189: URL: https://github.com/apache/lucene/pull/14189
Re: [PR] Fix TestSysoutLimits by making nested test classes not extend LuceneTestCase [lucene]
madrob commented on code in PR #14309: URL: https://github.com/apache/lucene/pull/14309#discussion_r1974288498

## lucene/test-framework/src/java/org/apache/lucene/tests/util/TestRuleLimitSysouts.java: ##
@@ -207,6 +207,7 @@ protected void before() throws Throwable {
   checkCaptureStreams();
 }
 resetCaptureState();
+var bef = bytesWritten.get();

Review Comment:
did this sneak in?
Re: [I] Should we auto-adjust top score doc and top field collector manager based on slices? [lucene]
javanna closed issue #13791: Should we auto-adjust top score doc and top field collector manager based on slices? URL: https://github.com/apache/lucene/issues/13791
Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]
renatoh commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2688748464

> Thanks @renatoh !

Thanks for your input and for reviewing it!
Re: [I] TestSysoutLimits still occasionally failing [lucene]
dweiss commented on issue #14307: URL: https://github.com/apache/lucene/issues/14307#issuecomment-2688934137

This one is caused by a more complex interaction - LuceneTestCase tries to set up a random TimeZone and this prints a warning like this:

```
WARNING: Use of the three-letter time zone ID "AET" is deprecated and it will be removed in a future release
```

I think it'll actually be more beneficial to make the nested tests in TestRuleLimitSysouts not extend LuceneTestCase so that there are no randomized warnings being printed to std streams. I'll try to provide a patch tomorrow.
Re: [PR] Use DenseConjunctionBulkScorer for single queries sometimes. [lucene]
jpountz merged PR #14293: URL: https://github.com/apache/lucene/pull/14293
[PR] introduce new parameter onlyLongestMatchNoSubwords replacing onlyLongestMatch [lucene]
renatoh opened a new pull request, #14311: URL: https://github.com/apache/lucene/pull/14311 (no comment)
Re: [PR] introduce new parameter onlyLongestMatchNoSubwords replacing onlyLongestMatch [lucene]
renatoh commented on PR #14311: URL: https://github.com/apache/lucene/pull/14311#issuecomment-2689203083 @rmuir onlyLongestMatchNoSubwords is basically what was onlyLongestMatch=true + reuseChars=false
Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]
jpountz commented on PR #14280: URL: https://github.com/apache/lucene/pull/14280#issuecomment-2688597916 This sounded like a good idea so I applied it.
Re: [PR] Fix TestSysoutLimits by making nested test classes not extend LuceneTestCase [lucene]
dweiss commented on code in PR #14309: URL: https://github.com/apache/lucene/pull/14309#discussion_r1974964208

## lucene/test-framework/src/java/org/apache/lucene/tests/util/TestRuleLimitSysouts.java: ##
@@ -207,6 +207,7 @@ protected void before() throws Throwable {
   checkCaptureStreams();
 }
 resetCaptureState();
+var bef = bytesWritten.get();

Review Comment:
Ugh. Absolutely - debugging artifact. Thank you!
[I] MultiTermQueryConstantScoreBlendedWrapper#createWeight#rewriteInner performance optimization ideas [lucene]
hanbj opened a new issue, #14313: URL: https://github.com/apache/lucene/issues/14313

### Description

There are many implementations of MultiTermQuery, such as TermInSetQuery, FuzzyQuery, WildcardQuery, PrefixQuery, TermRangeQuery, RegexpQuery, TermsQuery, and AutomatonQuery, among others, so optimizing the performance of MultiTermQuery brings corresponding performance improvements to all of these queries.

`The default logic is as follows:`

1. When the number of terms is less than or equal to 16, rewrite the query as a BooleanQuery.
2. When the number of terms is greater than 16, traverse the posting list corresponding to each term to collect document numbers.
   > 2.1. If the document frequency of the term is less than or equal to 512, record the document IDs in otherTerms.
   > 2.2. If the document frequency of the term is greater than 512, add the posting list of the term to the priority queue highFrequencyTerms.
3. Encapsulate the 16 posting lists contained in otherTerms and highFrequencyTerms into the set subs.
4. Use the DisjunctionDISIApproximation wrapper so that they jointly participate in the collection of document numbers while the posting lists are merged.

`The optimization idea is as follows:`

1. Traverse the posting list corresponding to each term and delay processing, so that the rewrite can return early in the following situations:
   > 1.1. A term matches all documents.
   > 1.2. A term matches all documents that contain the field.
2. The document frequency of a certain term is very high, less than or equal to `reader.maxDoc() - 4096`. When such a large posting list is encountered, reverse collection can be performed: the posting lists of the other terms are traversed, and their document IDs are deleted from the reverse-collected set. If the reverse-collected set ends up empty, all documents match and the rewrite can return early. If it is not empty, the reverse-collected set contains relatively few document IDs, so merging the posting lists later is fast.
3. When the term iteration is completed and the number of terms equals the number of terms contained in the field, all documents are included, and there is no need to traverse the posting list of each term.

I have already implemented this optimization myself, and it has been running in our production environment for about half a year. So far I have not received any customer reports of problems, but the code changes are fairly significant. Is the Lucene community interested? If so, I will submit a PR.

The test results are as follows:

| Type | Performance improvement |
| --- | --- |
| A term matches all docs | 80 times |
| A term matches all documents containing that field | 70 times |
| Contains all terms | 80 times |
| Reverse collection | 8 times |
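The reverse-collection idea in point 2 of the optimization list can be illustrated with a minimal plain-Java sketch. It uses `java.util.BitSet` as a stand-in for posting lists and doc ID sets; the class and method names (`ReverseCollectionSketch`, `unmatchedDocs`, `matchesAllDocs`) are hypothetical, and this is not the code from the production change described in the issue.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only: BitSet stands in for posting lists.
final class ReverseCollectionSketch {

  /**
   * Returns the docs in [0, maxDoc) that are matched by none of the terms.
   * Starts from the complement of the dense posting list and removes the
   * docs of every other term from it.
   */
  static BitSet unmatchedDocs(int maxDoc, BitSet densePostings, List<BitSet> otherPostings) {
    BitSet unmatched = new BitSet(maxDoc);
    unmatched.set(0, maxDoc);          // every doc is a candidate non-match
    unmatched.andNot(densePostings);   // drop docs matched by the dense term
    for (BitSet postings : otherPostings) {
      if (unmatched.isEmpty()) {
        break;                         // all docs already match: stop early
      }
      unmatched.andNot(postings);      // drop docs matched by this term
    }
    return unmatched;
  }

  /** True when the union of all posting lists covers every doc. */
  static boolean matchesAllDocs(int maxDoc, BitSet densePostings, List<BitSet> otherPostings) {
    return unmatchedDocs(maxDoc, densePostings, otherPostings).isEmpty();
  }
}
```

When the returned set is non-empty, it is small whenever the dense term covers most of the index, so the final matches can be obtained by complementing it instead of merging every posting list.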
Re: [PR] Expose the ImpactsEnum impl in Lucene101PostingsFormat. [lucene]
uschindler commented on code in PR #14306: URL: https://github.com/apache/lucene/pull/14306#discussion_r1974361308

## lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsFormat.java: ##
@@ -351,6 +352,14 @@ public final class Lucene101PostingsFormat extends PostingsFormat {
   public static final int LEVEL1_MASK = LEVEL1_NUM_DOCS - 1;

+  /**
+   * Return the class that implements {@link ImpactsEnum} in this {@link PostingsFormat}. This is
+   * internally used to help the JVM make good inlining decisions.
+   */
+  public static Class getImpactsEnumImpl() {

Review Comment:
maybe add `@lucene.internal` as javadocs tag.
Re: [PR] Bump minimum required Java version to 23 [lucene]
uschindler commented on PR #14302: URL: https://github.com/apache/lucene/pull/14302#issuecomment-2689162858 I stopped the Jenkins builds on Policeman Jenkins and will check to update the config for Java 23.
Re: [PR] fix lambdas for java 23 [lucene]
rmuir commented on PR #14308: URL: https://github.com/apache/lucene/pull/14308#issuecomment-2689170929 Thanks, the change is just a bit annoying from a noise perspective. I will merge up main first, to make sure there aren't any new lambdas in recent commits that anger the check.
Re: [PR] Bump minimum required Java version to 23 [lucene]
uschindler commented on PR #14302: URL: https://github.com/apache/lucene/pull/14302#issuecomment-2689172018 I will look into moving the MMapDirectory parts to the main code and only leave the vector stuff in the APIJAR special case.
Re: [PR] Bump minimum required Java version to 23 [lucene]
uschindler commented on PR #14302: URL: https://github.com/apache/lucene/pull/14302#issuecomment-2689174826 We may now also remove SecurityManager and AccessController everywhere in main branch.
Re: [PR] Bump minimum required Java version to 23 [lucene]
madrob commented on code in PR #14302: URL: https://github.com/apache/lucene/pull/14302#discussion_r1974374898 ## lucene/CHANGES.txt: ## @@ -45,7 +45,8 @@ Bug Fixes Other - -(No changes) + +* GITHUB#9: Bump minimum required Java version to 23 Review Comment: This needed to be updated post issue creation
Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]
rmuir commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2688769041 @renatoh Feel free to open another PR, if you have time, to try to improve defaults around this for the next version of Lucene. If I ask for "longest match" I don't expect to have additional "shorter" subwords coming out of the analyzer, so it seems like a good improvement.
[PR] Make Lucene better at skipping long runs of matches. [lucene]
jpountz opened a new pull request, #14312: URL: https://github.com/apache/lucene/pull/14312

This is an attempt to resurrect #12194 in a (hopefully) better way. Now that many queries run with `DenseConjunctionBulkScorer`, which scores windows of doc IDs at a time, it becomes natural to skip clauses that have long runs of matches by checking if they match the whole window.

This introduces the same `DocIdSetIterator#peekNextNonMatchingDocID()` API that PR #12194 suggested, implements it in `DocIdSetIterator#all`, and uses it in `DenseConjunctionBulkScorer` to skip clauses that match the whole window.

For better test coverage, `DenseConjunctionBulkScorer` was refactored to require at least one iterator, which can be a `DocIdSetIterator#all` instance if all docs match.

In follow-ups, we should look into supporting other queries that are likely to have long runs of matches, in particular doc-value range queries on fields that are part of the index sort and take advantage of a doc-value skipper.

Closes #11915
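The window-skipping idea in this PR description can be sketched in plain Java, without Lucene on the classpath. Everything here is a toy model under stated assumptions: `RangeClause` is a hypothetical clause that matches one contiguous run of doc IDs, and its `peekNextNonMatchingDocID(int)` only mimics the intent of the proposed `DocIdSetIterator#peekNextNonMatchingDocID()`. It is not the signature or implementation from the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: not Lucene's DocIdSetIterator or DenseConjunctionBulkScorer.
final class RunSkippingSketch {

  /** Hypothetical clause that matches every doc ID in [runStart, runEnd). */
  static final class RangeClause {
    final int runStart;
    final int runEnd;

    RangeClause(int runStart, int runEnd) {
      this.runStart = runStart;
      this.runEnd = runEnd;
    }

    boolean matches(int doc) {
      return doc >= runStart && doc < runEnd;
    }

    /** Smallest doc ID >= doc that does not match; doc itself if doc is not a match. */
    int peekNextNonMatchingDocID(int doc) {
      return matches(doc) ? runEnd : doc;
    }
  }

  /**
   * Returns the clauses that still need per-doc evaluation in the window
   * [windowMin, windowMax). A clause whose run of matches extends to the end
   * of the window matches every doc in it and can be skipped.
   */
  static List<RangeClause> clausesToEvaluate(
      List<RangeClause> clauses, int windowMin, int windowMax) {
    List<RangeClause> remaining = new ArrayList<>();
    for (RangeClause clause : clauses) {
      if (clause.peekNextNonMatchingDocID(windowMin) < windowMax) {
        remaining.add(clause); // the run ends inside the window: must evaluate
      }
    }
    return remaining;
  }
}
```

If every clause is skipped for a window, the whole window matches, which is the case the PR covers by always requiring at least one iterator, possibly a match-all one.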
Re: [PR] Recommend multi-stage retrieval pipelines in oal.search javadocs. [lucene]
jpountz merged PR #14310: URL: https://github.com/apache/lucene/pull/14310
Re: [PR] Make Lucene better at skipping long runs of matches. [lucene]
jpountz commented on PR #14312: URL: https://github.com/apache/lucene/pull/14312#issuecomment-2689215958 cc @gf2121 who's been reviewing related PRs recently and @iverase for the connection with sparse indexing
[PR] fix lambdas for java 23 [lucene]
rmuir opened a new pull request, #14308: URL: https://github.com/apache/lucene/pull/14308 After the upgrade to Java 23, my editor is flooded with warnings of unused variables from lambdas. Fix them. I also downloaded Eclipse, installed it, checked all possible compiler options, compared the resulting file and ensured our Eclipse compiler file is synced up, so that everything is explicit.
Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]
renatoh commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2688845145 @rmuir Could we reduce it to only two 'valid' behaviors: onlyLongestMatch=true with reuseChars=false, and onlyLongestMatch=false with reuseChars=true? If we think only these two cases make sense, we could actually reduce it to one flag and try to come up with a different name for that single flag. Or should onlyLongestMatch=true with reuseChars=true, today's default behavior, also be an option?
Re: [PR] Expose the ImpactsEnum impl in Lucene101PostingsFormat. [lucene]
jpountz merged PR #14306: URL: https://github.com/apache/lucene/pull/14306
Re: [I] TestScorerUtil.testLikelyImpactsEnum fails [lucene]
jpountz closed issue #14303: TestScorerUtil.testLikelyImpactsEnum fails URL: https://github.com/apache/lucene/issues/14303
Re: [PR] Expose the ImpactsEnum impl in Lucene101PostingsFormat. [lucene]
jpountz commented on PR #14306: URL: https://github.com/apache/lucene/pull/14306#issuecomment-2689030846 I went ahead and merged to stem the stream of failures. Happy to revisit the approach in a follow-up if there are concerns.
Re: [PR] Bump minimum required Java version to 23 [lucene]
reta commented on code in PR #14302: URL: https://github.com/apache/lucene/pull/14302#discussion_r1974531053

## lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java: ##
@@ -476,7 +476,7 @@ protected void doClose() throws IOException {
     }
   }
 };
-try (Closeable finalizer = decRefDeleter) {
+try (var _ = decRefDeleter) {

Review Comment:
I think `_` could be completely omitted here:

```
try (decRefDeleter) {
```

Sorry, late for review.
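The suggestion relies on a language feature available since Java 9: a try-with-resources statement may name an existing effectively final variable instead of declaring a new one. The `var _ = ...` form in the diff uses an unnamed variable, which was finalized in Java 22. The sketch below shows the Java 9 form in isolation; `Tracker` and `closesExistingVariable` are hypothetical names for illustration and are unrelated to `StandardDirectoryReader`.

```java
// Hypothetical, self-contained illustration of try (existingVariable) { ... }.
final class TryWithResourcesSketch {

  /** Minimal resource that records whether close() was called. */
  static final class Tracker implements AutoCloseable {
    boolean closed;

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Closes an already-declared resource without introducing a new variable. */
  static boolean closesExistingVariable() {
    Tracker tracker = new Tracker(); // effectively final local variable
    try (tracker) {                  // Java 9+: reference the variable directly
      // work that needs the resource would go here
    }
    return tracker.closed;           // true: close() ran when the try block exited
  }
}
```

Referencing the variable directly avoids both the unused `finalizer` name from the original code and the need for an unnamed variable.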
Re: [PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]
jpountz commented on PR #14304: URL: https://github.com/apache/lucene/pull/14304#issuecomment-2689117193 Have you been able to run `luceneutil` to get a sense of the indexing and search speedups?
Re: [PR] Reciprocal Rank Fusion (RRF) in TopDocs [lucene]
jpountz commented on PR #13470: URL: https://github.com/apache/lucene/pull/13470#issuecomment-2689114469

> I have a bias for the latter, as I was planning on improving the docs of the oal.search package as a follow-up to provide guidance wrt how to do hybrid search by linking to this RRF helper.

I opened #14310.
Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]
jpountz commented on PR #14280: URL: https://github.com/apache/lucene/pull/14280#issuecomment-2689123306 I merged my other PR, which should supersede this one. Closing.
Re: [PR] Recommend multi-stage retrieval pipelines in oal.search javadocs. [lucene]
jpountz commented on code in PR #14310: URL: https://github.com/apache/lucene/pull/14310#discussion_r1974344510 ## lucene/core/src/java/org/apache/lucene/search/package-info.java: ## @@ -350,6 +350,40 @@ * * * + * Multi-stage retrieval pipelines + * + * The above explains how to influence the score when evaluating all matches of the query. This + * is expensive by design since it applies to all matches of the query, which could be millions. In + * order to apply more sophisticated ranking logic, a good approach consists of having a retrieval + * pipeline that runs a simple candidate retrieval stage that retrieves e.g. 1,000 hits, followed by + * a more sophisticated reranking stage that reranks these 1,000 hits to select the best 100 hits + * among them. Since the number of hits that this retrieval stage needs to operate on is bounded, it + * allows it to be more sophisticated. + * + * Lucene exposes reranking via the {@link org.apache.lucene.search.Rescorer} abstract class, + * which has two main sub-classes: + * + * + * {@link org.apache.lucene.search.QueryRescorer}, to rescore using a query. For instance, the + * query string could be parsed as phrase query using {@link + * org.apache.lucene.util.QueryBuilder#createPhraseQuery} instead of a boolean query in order + * to help boost hits which also match the query string as a phrase. + * {@link org.apache.lucene.search.SortRescorer}, to rescore using a {@link + * org.apache.lucene.search.Sort}. For instance, the best 1,000 hits by BM25 score may be + * sorted by descending popularity in order to compute the final top-100 hits. + * + * + * Top hits fusion + * + * Sometimes, multiple retrieval pipelines may make sense, having their own pros and cons. A + * typical example would be a lexical retrieval pipeline, matching exactly what the user requested, + * and a semantic retrieval pipeline, matching documents that are closest to the user's query from a + * semantic perspective. 
Combining scores is hazardous as different retrieval pipelines often + * produce scores that not only have different ranges, but also different distributions within this + * range. A robust way of combining multiple retrieval pipelines consists of combining the top hits + * that they produce through their ranks rather thank through their scores using reciprocal rank Review Comment: Woops, thanks for catching.
Re: [PR] Recommend multi-stage retrieval pipelines in oal.search javadocs. [lucene]
rmuir commented on code in PR #14310: URL: https://github.com/apache/lucene/pull/14310#discussion_r1974340780 ## lucene/core/src/java/org/apache/lucene/search/package-info.java: ## @@ -350,6 +350,40 @@ * * * + * Multi-stage retrieval pipelines + * + * The above explains how to influence the score when evaluating all matches of the query. This + * is expensive by design since it applies to all matches of the query, which could be millions. In + * order to apply more sophisticated ranking logic, a good approach consists of having a retrieval + * pipeline that runs a simple candidate retrieval stage that retrieves e.g. 1,000 hits, followed by + * a more sophisticated reranking stage that reranks these 1,000 hits to select the best 100 hits + * among them. Since the number of hits that this retrieval stage needs to operate on is bounded, it + * allows it to be more sophisticated. + * + * Lucene exposes reranking via the {@link org.apache.lucene.search.Rescorer} abstract class, + * which has two main sub-classes: + * + * + * {@link org.apache.lucene.search.QueryRescorer}, to rescore using a query. For instance, the + * query string could be parsed as phrase query using {@link + * org.apache.lucene.util.QueryBuilder#createPhraseQuery} instead of a boolean query in order + * to help boost hits which also match the query string as a phrase. + * {@link org.apache.lucene.search.SortRescorer}, to rescore using a {@link + * org.apache.lucene.search.Sort}. For instance, the best 1,000 hits by BM25 score may be + * sorted by descending popularity in order to compute the final top-100 hits. + * + * + * Top hits fusion + * + * Sometimes, multiple retrieval pipelines may make sense, having their own pros and cons. A + * typical example would be a lexical retrieval pipeline, matching exactly what the user requested, + * and a semantic retrieval pipeline, matching documents that are closest to the user's query from a + * semantic perspective. 
Combining scores is hazardous as different retrieval pipelines often + * produce scores that not only have different ranges, but also different distributions within this + * range. A robust way of combining multiple retrieval pipelines consists of combining the top hits + * that they produce through their ranks rather thank through their scores using reciprocal rank Review Comment: ```suggestion * that they produce through their ranks rather than through their scores using reciprocal rank ```
Re: [PR] Recommend multi-stage retrieval pipelines in oal.search javadocs. [lucene]
rmuir commented on code in PR #14310: URL: https://github.com/apache/lucene/pull/14310#discussion_r1974350732 ## lucene/core/src/java/org/apache/lucene/search/package-info.java: ## @@ -350,6 +350,40 @@ * * * + * Multi-stage retrieval pipelines + * + * The above explains how to influence the score when evaluating all matches of the query. This + * is expensive by design since it applies to all matches of the query, which could be millions. In + * order to apply more sophisticated ranking logic, a good approach consists of having a retrieval + * pipeline that runs a simple candidate retrieval stage that retrieves e.g. 1,000 hits, followed by + * a more sophisticated reranking stage that reranks these 1,000 hits to select the best 100 hits + * among them. Since the number of hits that this retrieval stage needs to operate on is bounded, it + * allows it to be more sophisticated. + * + * Lucene exposes reranking via the {@link org.apache.lucene.search.Rescorer} abstract class, + * which has two main sub-classes: + * + * + * {@link org.apache.lucene.search.QueryRescorer}, to rescore using a query. For instance, the + * query string could be parsed as phrase query using {@link + * org.apache.lucene.util.QueryBuilder#createPhraseQuery} instead of a boolean query in order + * to help boost hits which also match the query string as a phrase. + * {@link org.apache.lucene.search.SortRescorer}, to rescore using a {@link + * org.apache.lucene.search.Sort}. For instance, the best 1,000 hits by BM25 score may be + * sorted by descending popularity in order to compute the final top-100 hits. + * + * + * Top hits fusion + * + * Sometimes, multiple retrieval pipelines may make sense, having their own pros and cons. A + * typical example would be a lexical retrieval pipeline, matching exactly what the user requested, + * and a semantic retrieval pipeline, matching documents that are closest to the user's query from a + * semantic perspective. 
Combining scores is hazardous as different retrieval pipelines often + * produce scores that not only have different ranges, but also different distributions within this + * range. A robust way of combining multiple retrieval pipelines consists of combining the top hits + * that they produce through their ranks rather than through their scores using reciprocal rank + * fusion. This is exposed via {@link org.apache.lucene.search.TopDocs#rrf(int, int, TopDocs[])}. Review Comment: ```suggestion * fusion. This is exposed via {@link org.apache.lucene.search.TopDocs#rrf(int topN, int k, TopDocs[] hits)}. ``` It is at least legal to do this, and it can significantly improve readability, since types aren't always enough to understand the parameters when reading. As for the auto-generated label that the `javadoc` tool makes from it, you'd have to test it out. Of course the label can always be specified explicitly, but maybe this is an easier approach.
Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]
jpountz closed pull request #14280: ExceptionInInitializerError in ScorerUtil URL: https://github.com/apache/lucene/pull/14280
Re: [I] Improve documentation for org.apache.lucene.search Sort class [lucene]
jpountz commented on issue #14295: URL: https://github.com/apache/lucene/issues/14295#issuecomment-2689221399 FWIW I recently updated this page with this new link https://github.com/apache/lucene/pull/14251/files#diff-0a8bc8e8ffb40f26815f92ad02188c457ddd7594c4ac06208e8f5376ffed3cfbR213.
Re: [PR] fix lambdas for java 23 [lucene]
rmuir merged PR #14308: URL: https://github.com/apache/lucene/pull/14308
Re: [PR] Recommend multi-stage retrieval pipelines in oal.search javadocs. [lucene]
jpountz commented on code in PR #14310: URL: https://github.com/apache/lucene/pull/14310#discussion_r1974354077 ## lucene/core/src/java/org/apache/lucene/search/package-info.java: ## @@ -350,6 +350,40 @@ * * * + * Multi-stage retrieval pipelines + * + * The above explains how to influence the score when evaluating all matches of the query. This + * is expensive by design since it applies to all matches of the query, which could be millions. In + * order to apply more sophisticated ranking logic, a good approach consists of having a retrieval + * pipeline that runs a simple candidate retrieval stage that retrieves e.g. 1,000 hits, followed by + * a more sophisticated reranking stage that reranks these 1,000 hits to select the best 100 hits + * among them. Since the number of hits that this retrieval stage needs to operate on is bounded, it + * allows it to be more sophisticated. + * + * Lucene exposes reranking via the {@link org.apache.lucene.search.Rescorer} abstract class, + * which has two main sub-classes: + * + * + * {@link org.apache.lucene.search.QueryRescorer}, to rescore using a query. For instance, the + * query string could be parsed as phrase query using {@link + * org.apache.lucene.util.QueryBuilder#createPhraseQuery} instead of a boolean query in order + * to help boost hits which also match the query string as a phrase. + * {@link org.apache.lucene.search.SortRescorer}, to rescore using a {@link + * org.apache.lucene.search.Sort}. For instance, the best 1,000 hits by BM25 score may be + * sorted by descending popularity in order to compute the final top-100 hits. + * + * + * Top hits fusion + * + * Sometimes, multiple retrieval pipelines may make sense, having their own pros and cons. A + * typical example would be a lexical retrieval pipeline, matching exactly what the user requested, + * and a semantic retrieval pipeline, matching documents that are closest to the user's query from a + * semantic perspective. 
Combining scores is hazardous as different retrieval pipelines often + * produce scores that not only have different ranges, but also different distributions within this + * range. A robust way of combining multiple retrieval pipelines consists of combining the top hits + * that they produce through their ranks rather than through their scores using reciprocal rank + * fusion. This is exposed via {@link org.apache.lucene.search.TopDocs#rrf(int, int, TopDocs[])}. Review Comment: Ah, thanks, I didn't even know that one could do this. I just checked the generated javadocs, they look good with these parameter names.
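Editorial note on the javadoc text quoted in this thread: the rank-based fusion it describes can be illustrated with a minimal, self-contained sketch. This is not Lucene's implementation of `TopDocs#rrf`. It assumes the commonly used reciprocal rank fusion formula, where each document scores the sum of `1 / (k + rank)` over the lists it appears in, with ranks starting at 1. The class `RrfSketch`, its `rrf` method, and the use of plain integer doc IDs are hypothetical names chosen for illustration; only the parameter names `topN` and `k` echo the suggestion above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Lucene.
class RrfSketch {

  /**
   * Fuses several ranked lists of doc IDs (best hit first) and returns the
   * topN doc IDs ordered by descending fused score. Assumed formula:
   * score(doc) = sum over lists of 1 / (k + rank), with rank starting at 1.
   */
  static List<Integer> rrf(int topN, int k, List<List<Integer>> rankings) {
    Map<Integer, Double> scores = new HashMap<>();
    for (List<Integer> ranking : rankings) {
      for (int i = 0; i < ranking.size(); i++) {
        int rank = i + 1;
        // Only the rank contributes; the original scores are never consulted.
        scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
      }
    }
    List<Integer> docs = new ArrayList<>(scores.keySet());
    // Sort by descending fused score, breaking ties by ascending doc ID.
    docs.sort((a, b) -> {
      int cmp = Double.compare(scores.get(b), scores.get(a));
      return cmp != 0 ? cmp : Integer.compare(a, b);
    });
    return new ArrayList<>(docs.subList(0, Math.min(topN, docs.size())));
  }
}
```

For example, fusing a lexical ranking `[1, 2, 3]` with a semantic ranking `[3, 1, 4]` using `k = 60` puts doc 1 first (ranks 1 and 2), then doc 3 (ranks 3 and 1), then doc 2. Because only ranks are used, the differing score ranges and distributions of the two pipelines do not matter.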
Re: [PR] Bump minimum required Java version to 23 [lucene]
dweiss commented on code in PR #14302: URL: https://github.com/apache/lucene/pull/14302#discussion_r1973185969 ## lucene/core/src/java/org/apache/lucene/index/IndexReader.java: ## @@ -253,8 +253,10 @@ public final void decRef() throws IOException { final int rc = refCount.decrementAndGet(); if (rc == 0) { closed = true; - try (Closeable finalizer = this::reportCloseToParentReaders; - Closeable finalizer1 = this::notifyReaderClosedListeners) { + try (@SuppressWarnings("unused") + Closeable finalizer = this::reportCloseToParentReaders; + @SuppressWarnings("unused") + Closeable finalizer1 = this::notifyReaderClosedListeners) { Review Comment: Eh. This seems like an issue with ECJ since it's clearly not an unused thing if it's in a try-with-resources...
Re: [PR] Avoid unnecessary evaluations and skipping documents [lucene]
hanbj commented on PR #14301: URL: https://github.com/apache/lucene/pull/14301#issuecomment-2687359672 Thank you for the review. Changes have been added.
Re: [PR] Bump minimum required Java version to 23 [lucene]
dweiss commented on code in PR #14302: URL: https://github.com/apache/lucene/pull/14302#discussion_r1973191532 ## lucene/core/src/java/org/apache/lucene/index/IndexReader.java: ## @@ -253,8 +253,10 @@ public final void decRef() throws IOException { final int rc = refCount.decrementAndGet(); if (rc == 0) { closed = true; - try (Closeable finalizer = this::reportCloseToParentReaders; - Closeable finalizer1 = this::notifyReaderClosedListeners) { + try (@SuppressWarnings("unused") + Closeable finalizer = this::reportCloseToParentReaders; + @SuppressWarnings("unused") + Closeable finalizer1 = this::notifyReaderClosedListeners) { Review Comment: Relevant discussion here, for example. https://bugs.eclipse.org/bugs/show_bug.cgi?id=560733
Re: [PR] Bump minimum required Java version to 23 [lucene]
rmuir commented on code in PR #14302: URL: https://github.com/apache/lucene/pull/14302#discussion_r1973201491 ## lucene/core/src/java/org/apache/lucene/index/IndexReader.java: ## @@ -253,8 +253,10 @@ public final void decRef() throws IOException { final int rc = refCount.decrementAndGet(); if (rc == 0) { closed = true; - try (Closeable finalizer = this::reportCloseToParentReaders; - Closeable finalizer1 = this::notifyReaderClosedListeners) { + try (@SuppressWarnings("unused") + Closeable finalizer = this::reportCloseToParentReaders; + @SuppressWarnings("unused") + Closeable finalizer1 = this::notifyReaderClosedListeners) { Review Comment: I'm getting new warnings for unused lambda variables from the Eclipse language server, with a suggestion to "rename to unnamed variable". These look legit, but are new.
Re: [PR] Bump minimum required Java version to 23 [lucene]
ChrisHegarty commented on code in PR #14302: URL: https://github.com/apache/lucene/pull/14302#discussion_r1973205040 ## lucene/core/src/java/org/apache/lucene/index/IndexReader.java: ## @@ -253,8 +253,10 @@ public final void decRef() throws IOException { final int rc = refCount.decrementAndGet(); if (rc == 0) { closed = true; - try (Closeable finalizer = this::reportCloseToParentReaders; - Closeable finalizer1 = this::notifyReaderClosedListeners) { + try (@SuppressWarnings("unused") + Closeable finalizer = this::reportCloseToParentReaders; + @SuppressWarnings("unused") + Closeable finalizer1 = this::notifyReaderClosedListeners) { Review Comment: Now that we're on >22, we can just use unnamed variables, which has the nice property of keeping the checker happy! [ad54196](https://github.com/apache/lucene/pull/14302/commits/ad54196bd2dd56bbe9b30ff2f0db94258c5ef02a)
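Editorial note: the unnamed-variable approach mentioned in this comment can be sketched as follows. This is not the code from the linked commit; it is a standalone illustration that assumes a JDK where unnamed variables are a final feature (Java 22 or later). The class `UnnamedResources` and its methods are hypothetical names. The resources exist only so that their `close()` methods run when the block exits, so declaring them as `_` avoids an "unused variable" warning without `@SuppressWarnings`.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; requires Java 22+ for the `_` variables.
class UnnamedResources {
  private static final List<String> LOG = new ArrayList<>();

  private static void first() {
    LOG.add("first");
  }

  private static void second() {
    LOG.add("second");
  }

  /** Runs a try-with-resources block whose resources are only used as finalizers. */
  static List<String> run() {
    LOG.clear();
    // Method references are adapted to Closeable; close() invokes them.
    try (Closeable _ = UnnamedResources::first;
        Closeable _ = UnnamedResources::second) {
      LOG.add("body");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return List.copyOf(LOG);
  }
}
```

Resources are closed in the reverse order of their declaration, so `run()` records `body`, then `second`, then `first`.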
Re: [PR] Bump minimum required Java version to 23 [lucene]
rmuir commented on code in PR #14302: URL: https://github.com/apache/lucene/pull/14302#discussion_r1973206235 ## lucene/core/src/java/org/apache/lucene/index/IndexReader.java: ## @@ -253,8 +253,10 @@ public final void decRef() throws IOException { final int rc = refCount.decrementAndGet(); if (rc == 0) { closed = true; - try (Closeable finalizer = this::reportCloseToParentReaders; - Closeable finalizer1 = this::notifyReaderClosedListeners) { + try (@SuppressWarnings("unused") + Closeable finalizer = this::reportCloseToParentReaders; + @SuppressWarnings("unused") + Closeable finalizer1 = this::notifyReaderClosedListeners) { Review Comment: Example of what i'm seeing. I can take care in a followup PR if it does not fail build for now. ``` diff --git a/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java b/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java index a8eca64c962..0f7693eedd1 100644 --- a/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java +++ b/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java @@ -,7 +,7 @@ public class IndexWriter infoStream.message("IW", "now abort pending addIndexes merge"); } merge.setAborted(); - merge.close(false, false, mr -> {}); + merge.close(false, false, _ -> {}); onMergeFinished(merge); }); pendingAddIndexesMerges.clear(); @@ -3350,7 +3350,7 @@ public class IndexWriter handleMergeException(t, merge); } finally { synchronized (IndexWriter.this) { - merge.close(success, false, mr -> {}); + merge.close(success, false, _ -> {}); onMergeFinished(merge); } } @@ -3731,7 +3731,7 @@ public class IndexWriter // necessary files to disk and checkpointed them. pointInTimeMerges = preparePointInTimeMerge( -toCommit, stopAddingMergedSegments::get, MergeTrigger.COMMIT, sci -> {}); +toCommit, stopAddingMergedSegments::get, MergeTrigger.COMMIT, _ -> {}); } } success = true; ``` -- This is an automated message from the Apache Git Service. 
Re: [PR] Bump minimum required Java version to 23 [lucene]
rmuir commented on code in PR #14302: URL: https://github.com/apache/lucene/pull/14302#discussion_r1973219365 ## lucene/core/src/java/org/apache/lucene/index/IndexReader.java: ## @@ -253,8 +253,10 @@ public final void decRef() throws IOException { final int rc = refCount.decrementAndGet(); if (rc == 0) { closed = true; - try (Closeable finalizer = this::reportCloseToParentReaders; - Closeable finalizer1 = this::notifyReaderClosedListeners) { + try (@SuppressWarnings("unused") + Closeable finalizer = this::reportCloseToParentReaders; + @SuppressWarnings("unused") + Closeable finalizer1 = this::notifyReaderClosedListeners) { Review Comment: Maybe it never warned before because it was impossible to fix with Java 21? My version of eclipse.jdt.ls is unchanged.
Re: [PR] Bump minimum required Java version to 23 [lucene]
ChrisHegarty commented on code in PR #14302: URL: https://github.com/apache/lucene/pull/14302#discussion_r1973227461 ## lucene/core/src/java/org/apache/lucene/index/IndexReader.java: ## @@ -253,8 +253,10 @@ public final void decRef() throws IOException { final int rc = refCount.decrementAndGet(); if (rc == 0) { closed = true; - try (Closeable finalizer = this::reportCloseToParentReaders; - Closeable finalizer1 = this::notifyReaderClosedListeners) { + try (@SuppressWarnings("unused") + Closeable finalizer = this::reportCloseToParentReaders; + @SuppressWarnings("unused") + Closeable finalizer1 = this::notifyReaderClosedListeners) { Review Comment: Yeah, that seems the most likely reason.
Re: [PR] Bump minimum required Java version to 23 [lucene]
rmuir commented on PR #14302: URL: https://github.com/apache/lucene/pull/14302#issuecomment-2687423920 The test failure on mac looked like a good ole flaky test to me, so I re-ran it.
Re: [PR] Bump minimum required Java version to 23 [lucene]
rmuir commented on PR #14302: URL: https://github.com/apache/lucene/pull/14302#issuecomment-2687466537 The same test failed again, this time on windows. To me at a glance, this looks to be an unrelated issue caused by #14294 cc @jpountz
Re: [PR] Bump minimum required Java version to 23 [lucene]
ChrisHegarty commented on PR #14302: URL: https://github.com/apache/lucene/pull/14302#issuecomment-2687487855 > The same test failed again, this time on windows. > > To me at a glance, this looks to be unrelated issue caused by #14294 ha! I had to merge main, since I didn't have this test in the branch! I can reproduce it now. Going to file a separate issue and mute the test, for now.
[I] TestScorerUtil.testLikelyImpactsEnum fails [lucene]
ChrisHegarty opened a new issue, #14303: URL: https://github.com/apache/lucene/issues/14303 ### Description ``` Reproduce with: gradlew :lucene:core:test --tests \ "org.apache.lucene.search.TestScorerUtil.testLikelyImpactsEnum" \ -Ptests.jvms=1 \ -Ptests.jvmargs= \ -Ptests.seed=20E0C716D3EB48E7 \ -Ptests.useSecurityManager=true \ -Ptests.gui=true \ -Ptests.file.encoding=UTF-8 \ -Ptests.vectorsize=512 -Ptests.forceintegervectors=true ``` ``` TestScorerUtil > testLikelyImpactsEnum FAILED java.lang.AssertionError: expected same: was not: at __randomizedtesting.SeedInfo.seed([20E0C716D3EB48E7:AC95CC1E9285FC36]:0) at org.junit.Assert.fail(Assert.java:89) at org.junit.Assert.failNotSame(Assert.java:829) at org.junit.Assert.assertSame(Assert.java:772) at org.junit.Assert.assertSame(Assert.java:783) at org.apache.lucene.search.TestScorerUtil.testLikelyImpactsEnum(TestScorerUtil.java:91) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996) at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48) at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45) at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60) at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44) at 
org.junit.rules.RunRules.evaluate(RunRules.java:20) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902) at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44) at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60) at 
org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390) at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850) at java.base/
Re: [I] TestScorerUtil.testLikelyImpactsEnum fails [lucene]
ChrisHegarty commented on issue #14303: URL: https://github.com/apache/lucene/issues/14303#issuecomment-2687502442 The test was introduced by #14294
Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]
uschindler commented on PR #14280: URL: https://github.com/apache/lucene/pull/14280#issuecomment-2688104449 This is better. But best would be to add a getter for the class in our internal package (like we do for other stuff). The usual SharedSecrets approach.
Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]
uschindler commented on PR #14280: URL: https://github.com/apache/lucene/pull/14280#issuecomment-2688112479 But this is fine. We currently only have shared secrets for tests. Making that class public does not hurt.
Re: [PR] Add posTagFormat parameter for OpenNLPPOSFilter [lucene]
epugh commented on PR #14194: URL: https://github.com/apache/lucene/pull/14194#issuecomment-2688311636 @cpoerschke (and anyone else) I haven't done a commit on Lucene in a long time so I want to get another set of eyes on this. And I need to remember what all is in the workflow as well ;-)
Re: [I] Should we auto-adjust top score doc and top field collector manager based on slices? [lucene]
javanna commented on issue #13791: URL: https://github.com/apache/lucene/issues/13791#issuecomment-2688326822 I will go ahead and close this. The supportsConcurrency flag has been removed. The collector manager no longer gives a choice and users don't have to think about it either.
Re: [PR] Support JDK 24 in Panama Vectorization Provider [lucene]
ChrisHegarty merged PR #14300: URL: https://github.com/apache/lucene/pull/14300
Re: [PR] Fix optimization to help inline calls to live docs. [lucene]
jpountz merged PR #14294: URL: https://github.com/apache/lucene/pull/14294
Re: [PR] Use DenseConjunctionBulkScorer for single queries sometimes. [lucene]
jpountz commented on code in PR #14293: URL: https://github.com/apache/lucene/pull/14293#discussion_r1973119540

## lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java: ##
@@ -341,11 +341,10 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti
       if (allDocsMatch) {
         // all docs have a value and all points are within bounds, so everything matches
-        return new ScorerSupplier() {
+        return new ConstantScoreScorerSupplier(score(), scoreMode, reader.maxDoc()) {

Review Comment:
Oops, yes, fixing.
Re: [I] Support JDK 24 [lucene]
ChrisHegarty commented on issue #14184: URL: https://github.com/apache/lucene/issues/14184#issuecomment-2687262361 The JDK 24 RC builds are tested with Lucene very frequently [1], and there are no observable issues. [1] https://jenkins.thetaphi.de/view/Lucene/job/Lucene-main-Linux/
Re: [I] Support JDK 24 [lucene]
ChrisHegarty closed issue #14184: Support JDK 24 URL: https://github.com/apache/lucene/issues/14184
Re: [PR] Use DenseConjunctionBulkScorer for single queries sometimes. [lucene]
jpountz commented on code in PR #14293: URL: https://github.com/apache/lucene/pull/14293#discussion_r1973122492

## lucene/core/src/java/org/apache/lucene/search/TermQuery.java: ##
@@ -165,6 +165,17 @@ public Scorer get(long leadCost) throws IOException {
         }
       }
+      @Override
+      public BulkScorer bulkScorer() throws IOException {
+        if (scoreMode.needsScores() == false) {
+          DocIdSetIterator iterator = get(Long.MAX_VALUE).iterator();
+          int maxDoc = context.reader().maxDoc();
+          return ConstantScoreScorerSupplier.fromIterator(iterator, 0f, scoreMode, maxDoc)

Review Comment:
I think so, but I plan on having a potentially better way of doing it by reviving #12194 as a follow-up and skipping clauses that fully match a window (which would also remove the need for `MatchAllScorerSupplier`).
Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]
jpountz commented on PR #14280: URL: https://github.com/apache/lucene/pull/14280#issuecomment-2687278510 One concern I have with this change is that this code now runs as part of evaluating a query, when users care about query latency. @uschindler You usually have informed opinions on this sort of thing, I wonder if you have thoughts.
Re: [PR] Remove scoreAll() optimization from DefaultBulkScorer. [lucene]
jpountz commented on code in PR #14039: URL: https://github.com/apache/lucene/pull/14039#discussion_r1973145270

## lucene/core/src/java/org/apache/lucene/search/Weight.java: ##
@@ -289,75 +262,108 @@ static int scoreRange(
         }
       }
-      int doc = iterator.docID();
-      if (doc < min) {
-        if (doc == min - 1) {
-          doc = iterator.nextDoc();
+      if (iterator.docID() < min) {
+        if (iterator.docID() == min - 1) {
+          iterator.nextDoc();
         } else {
-          doc = iterator.advance(min);
+          iterator.advance(min);
         }
       }
+      // These various specializations help save some null checks in a hot loop, but as importantly
+      // if not more importantly, they help reduce the polymorphism of calls sites to nextDoc() and
+      // collect() because only a subset of collectors produce a competitive iterator, and the set
+      // of implementing classes for two-phase approximations is smaller than the set of doc id set
+      // iterator implementations.
       if (twoPhase == null && competitiveIterator == null) {
         // Optimize simple iterators with collectors that can't skip
-        while (doc < max) {
-          if (acceptDocs == null || acceptDocs.get(doc)) {
-            collector.collect(doc);
-          }
-          doc = iterator.nextDoc();
-        }
+        scoreIterator(collector, acceptDocs, iterator, max);
+      } else if (competitiveIterator == null) {
+        scoreTwoPhaseIterator(collector, acceptDocs, iterator, twoPhase, max);
+      } else if (twoPhase == null) {
+        scoreCompetitiveIterator(collector, acceptDocs, iterator, competitiveIterator, max);
       } else {
-        while (doc < max) {
-          if (competitiveIterator != null) {
-            assert competitiveIterator.docID() <= doc;
-            if (competitiveIterator.docID() < doc) {
-              competitiveIterator.advance(doc);
-            }
-            if (competitiveIterator.docID() != doc) {
-              doc = iterator.advance(competitiveIterator.docID());
-              continue;
-            }
-          }
+        scoreTwoPhaseOrCompetitiveIterator(
+            collector, acceptDocs, iterator, twoPhase, competitiveIterator, max);
+      }
-          if ((acceptDocs == null || acceptDocs.get(doc))
-              && (twoPhase == null || twoPhase.matches())) {
-            collector.collect(doc);
-          }
-          doc = iterator.nextDoc();
+      return iterator.docID();
+    }
+
+    private static void scoreIterator(
+        LeafCollector collector, Bits acceptDocs, DocIdSetIterator iterator, int max)
+        throws IOException {
+      for (int doc = iterator.docID(); doc < max; doc = iterator.nextDoc()) {
+        if (acceptDocs == null || acceptDocs.get(doc)) {
+          collector.collect(doc);
         }
       }
-
-      return doc;
     }
-    /**
-     * Specialized method to bulk-score all hits; we separate this from {@link #scoreRange} to help
-     * out hotspot. See <a href="https://issues.apache.org/jira/browse/LUCENE-5487">LUCENE-5487</a>
-     */
-    static void scoreAll(
+    private static void scoreTwoPhaseIterator(
         LeafCollector collector,
+        Bits acceptDocs,
         DocIdSetIterator iterator,
         TwoPhaseIterator twoPhase,
-        Bits acceptDocs)
+        int max)
         throws IOException {
-      if (twoPhase == null) {
-        for (int doc = iterator.nextDoc();
-            doc != DocIdSetIterator.NO_MORE_DOCS;
-            doc = iterator.nextDoc()) {
-          if (acceptDocs == null || acceptDocs.get(doc)) {
-            collector.collect(doc);
+      for (int doc = iterator.docID(); doc < max; ) {
+        if ((acceptDocs == null || acceptDocs.get(doc)) && twoPhase.matches()) {
+          collector.collect(doc);
+        }
+
+        doc = iterator.nextDoc();

Review Comment:
Good catch, this is because this was copied from a more complex version of this code (the one that intersects with a collector's competitive iterator) that used to call `advance` on the iterator and then `continue`, so nextDoc() could not be part of the for-loop control statement or the iterator would nextDoc() right after advance(). But this one doesn't need to advance, so it can.
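The loop-shape point in the review comment above can be illustrated with a small self-contained sketch. It does not use Lucene classes: `IntCursor` is a hypothetical stand-in for a doc ID iterator, and `collectSimple` / `collectIntersecting` are made-up names. The second method shows why a loop that calls `advance` and then `continue` has to keep `nextDoc()` out of the for-loop update clause.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopShapes {
  public static final int NO_MORE = Integer.MAX_VALUE;

  /** Hypothetical stand-in for a doc ID iterator over a sorted int array. */
  public static final class IntCursor {
    private final int[] docs;
    private int idx = -1;

    public IntCursor(int... docs) {
      this.docs = docs;
    }

    public int docID() {
      return idx < 0 ? -1 : idx < docs.length ? docs[idx] : NO_MORE;
    }

    public int nextDoc() {
      idx++;
      return docID();
    }

    /** Moves forward to the first doc >= target (stays put if already there). */
    public int advance(int target) {
      while (docID() < target) {
        idx++;
      }
      return docID();
    }
  }

  /** No advance() in the body, so nextDoc() can live in the for-loop update clause. */
  public static List<Integer> collectSimple(IntCursor it, int max) {
    List<Integer> out = new ArrayList<>();
    if (it.docID() == -1) {
      it.nextDoc();
    }
    for (int doc = it.docID(); doc < max; doc = it.nextDoc()) {
      out.add(doc);
    }
    return out;
  }

  /** Intersects with a second cursor; advance() + continue forces an empty update clause. */
  public static List<Integer> collectIntersecting(IntCursor it, IntCursor other, int max) {
    List<Integer> out = new ArrayList<>();
    if (it.docID() == -1) {
      it.nextDoc();
    }
    for (int doc = it.docID(); doc < max; ) {
      if (other.docID() < doc) {
        other.advance(doc);
      }
      if (other.docID() != doc) {
        doc = it.advance(other.docID());
        // With nextDoc() in the update clause, `continue` would step past this doc unchecked.
        continue;
      }
      out.add(doc);
      doc = it.nextDoc();
    }
    return out;
  }
}
```

For example, intersecting the cursors `1, 3, 5, 7, 9` and `3, 4, 9` with this sketch collects `3` and `9`.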
Re: [PR] Use DenseConjunctionBulkScorer for single queries sometimes. [lucene]
jpountz commented on PR #14293: URL: https://github.com/apache/lucene/pull/14293#issuecomment-2687300793 Yes, especially with queries that match long ranges of doc IDs by design, such as those that take advantage of sparse indexing. > For reminding, I think we also need a CHANGES entry :) Good point, added. :)
Re: [I] Evaluate bumping the minimum compile Java version [lucene]
ChrisHegarty commented on issue #14229: URL: https://github.com/apache/lucene/issues/14229#issuecomment-2687327575 I opened [#14302](https://github.com/apache/lucene/pull/14302) for the initial bump to Java 23.
[PR] Bump minimum required Java version to 23 [lucene]
ChrisHegarty opened a new pull request, #14302: URL: https://github.com/apache/lucene/pull/14302 This commit bumps the minimum required Java version to 23. The _main_ branch is accumulating changes for the next major release, Lucene 11.0.0. The intent is to release Lucene 11.0.0 with a minimum of Java 25. We'll continually bump the minimum Java release in the _main_ branch, as and when newer Java versions become available, until we hit Java 25. We do this intentionally so that Lucene developers can take advantage of newer Java features sooner. This change is only intended for the _main_ branch, and will not be backported. relates #14229
Re: [PR] Bump minimum required Java version to 23 [lucene]
ChrisHegarty commented on PR #14302: URL: https://github.com/apache/lucene/pull/14302#issuecomment-2687333833
> ```
> > Task :lucene:core.tests:ecjLintMain FAILED
> source level should be in '1.1'...'1.8','9'...'21' (or '5.0'..'21.0'): 23
> ```
>
> ECJ is failing, perhaps it needs to be updated as well.

++ yeap! I'll take a look for a newer version of ECJ.
Re: [PR] Bump minimum required Java version to 23 [lucene]
dweiss commented on PR #14302: URL: https://github.com/apache/lucene/pull/14302#issuecomment-2687329916
```
> Task :lucene:core.tests:ecjLintMain FAILED
source level should be in '1.1'...'1.8','9'...'21' (or '5.0'..'21.0'): 23
```
ECJ is failing, perhaps it needs to be updated as well.
Re: [PR] Bump minimum required Java version to 23 [lucene]
dweiss commented on PR #14302: URL: https://github.com/apache/lucene/pull/14302#issuecomment-2687333543 All the way up to 3.40.0 in versions.toml:
```
ecj = "3.36.0"
```
Re: [PR] Address completion fields testing gap and truly allow loading FST off heap [lucene]
javanna commented on PR #14270: URL: https://github.com/apache/lucene/pull/14270#issuecomment-2687334196 That's good with me. Shall we clearly document this then as a follow-up, and shall I make the fst load mode static field package private perhaps, so that we at least test both modes?
Re: [PR] Reciprocal Rank Fusion (RRF) in TopDocs [lucene]
javanna commented on code in PR #13470: URL: https://github.com/apache/lucene/pull/13470#discussion_r1973180965

## lucene/core/src/java/org/apache/lucene/search/TopDocs.java: ##
@@ -350,4 +354,89 @@ private static TopDocs mergeAux(
       return new TopFieldDocs(totalHits, hits, sort.getSort());
     }
   }
+
+  private record ShardIndexAndDoc(int shardIndex, int doc) {}
+
+  /**
+   * Reciprocal Rank Fusion method.
+   *
+   * This method combines different search results into a single ranked list by combining their
+   * ranks. This is especially well suited when combining hits computed via different methods, whose
+   * score distributions are hardly comparable.
+   *
+   * @param topN the top N results to be returned
+   * @param k a constant determines how much influence documents in individual rankings have on the
+   *     final result. A higher value gives lower rank documents more influence. k should be greater
+   *     than or equal to 1.
+   * @param hits a list of TopDocs to apply RRF on
+   * @return a TopDocs contains the top N ranked results.
+   */
+  public static TopDocs rrf(int topN, int k, TopDocs[] hits) {
+    if (topN < 1) {
+      throw new IllegalArgumentException("topN must be >= 1, got " + topN);
+    }
+    if (k < 1) {
+      throw new IllegalArgumentException("k must be >= 1, got " + k);
+    }
+
+    boolean shardIndexSet = false;
+    outer:
+    for (TopDocs topDocs : hits) {
+      for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
+        shardIndexSet = scoreDoc.shardIndex != -1;
+        break outer;

Review Comment:
yes, thank you!
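The rank-combination idea described in the Javadoc above can be sketched without any Lucene types. This is a minimal illustration, assuming the common RRF formulation in which each document scores the sum of `1 / (k + rank)` over the lists it appears in, with 1-based ranks; whether the merged PR uses exactly this formula and tie-breaking is an assumption, and `RrfSketch`, the string document IDs and the alphabetical tie-break are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RrfSketch {
  /**
   * Fuses ranked lists of document IDs. Each list is ordered best-first; a document's fused
   * score is the sum of 1 / (k + rank) over every list that contains it (rank starts at 1).
   */
  public static List<String> rrf(int topN, int k, List<List<String>> rankings) {
    if (topN < 1) {
      throw new IllegalArgumentException("topN must be >= 1, got " + topN);
    }
    if (k < 1) {
      throw new IllegalArgumentException("k must be >= 1, got " + k);
    }

    Map<String, Double> scores = new HashMap<>();
    for (List<String> ranking : rankings) {
      for (int i = 0; i < ranking.size(); i++) {
        int rank = i + 1;
        scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
      }
    }

    // Sort by fused score, highest first; break ties by ID so the output is deterministic.
    List<String> ids = new ArrayList<>(scores.keySet());
    ids.sort(
        (a, b) -> {
          int cmp = Double.compare(scores.get(b), scores.get(a));
          return cmp != 0 ? cmp : a.compareTo(b);
        });
    return new ArrayList<>(ids.subList(0, Math.min(topN, ids.size())));
  }
}
```

With `k = 60` and the two rankings `[a, b, c]` and `[b, c, d]`, documents `b` and `c` appear in both lists and so outrank `a`, even though `a` is first in one list. Only ranks are used, never the original scores, which is why the method suits result lists whose score distributions are not comparable.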
Re: [PR] Bump minimum required Java version to 23 [lucene]
ChrisHegarty merged PR #14302: URL: https://github.com/apache/lucene/pull/14302
Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]
uschindler commented on PR #14280: URL: https://github.com/apache/lucene/pull/14280#issuecomment-2687564614 I get crazy when I see this code (before and after). 😜 In general I would rewrite that in a different way to not use a test index to initialize the code. Can't we figure out what the class is in a better way, so the initializer does not need to do interruptible things? About the initialization and the code: There are multiple issues like missing synchronization. It may not be an issue but it's unfortunately an anti-pattern. The problem is also that the optimizer cannot assume that the returned value is constant. Finally: Lucene explicitly says that interrupting a search thread is not supported and may cause other havoc. -1 to merge this. Better to rewrite the code to not do crazy stuff in a static initialization block.
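As background for why fallible work in a static initializer is risky, here is a minimal sketch of the standard JVM behavior. It is unrelated to the actual `ScorerUtil` code: `Fragile` and `touchTwice` are hypothetical names, and the exception simply simulates any failure (an interrupted I/O call, for instance) during class initialization.

```java
public class StaticInitSketch {
  /** Hypothetical holder whose static initializer does work that can fail. */
  static final class Fragile {
    // Not a compile-time constant, so reading it triggers class initialization.
    static final String VALUE = compute();

    private static String compute() {
      throw new IllegalStateException("simulated failure during class initialization");
    }
  }

  /** Returns the simple class names of the errors seen on the first and second access. */
  public static String[] touchTwice() {
    String[] seen = new String[2];
    for (int i = 0; i < 2; i++) {
      try {
        seen[i] = Fragile.VALUE;
      } catch (Throwable t) {
        seen[i] = t.getClass().getSimpleName();
      }
    }
    return seen;
  }
}
```

The first access fails with `ExceptionInInitializerError`; the class is then marked as erroneous, so every later access fails with `NoClassDefFoundError`, even if the original cause was transient. One failed initialization therefore leaves the class unusable for the lifetime of its class loader.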
Re: [I] Bump Lucene 11.0.0 minimum required Java version to 25 [lucene]
ChrisHegarty commented on issue #14229: URL: https://github.com/apache/lucene/issues/14229#issuecomment-2687571909 ... we're on the train now. I repurposed this issue to track the upgrade to Java 25.
Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]
uschindler commented on PR #14280: URL: https://github.com/apache/lucene/pull/14280#issuecomment-2687575659 Generally the issue here could be solved using this JEP which is long awaited: https://openjdk.org/jeps/8209964
[PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]
thecoop opened a new pull request, #14304: URL: https://github.com/apache/lucene/pull/14304 This resolves #13922

JMH shows a ~5x speedup:
```
Benchmark              Mode  Cnt     Score    Error   Units
Quant.quantize        thrpt    5   231.147 ± 13.401  ops/ms
Quant.quantizeVector  thrpt    5  1234.446 ± 49.961  ops/ms
```
Re: [PR] Support load per-iteration replacement of NamedSPI [lucene]
ChrisHegarty commented on PR #14275: URL: https://github.com/apache/lucene/pull/14275#issuecomment-2687787710 @jpountz given the connection of this PR with completion FST, do you have opinions here?
Re: [I] develocity build scans fail to upload sometimes [lucene]
dweiss commented on issue #14305: URL: https://github.com/apache/lucene/issues/14305#issuecomment-2687815179 Example: [image not preserved]
[I] develocity build scans fail to upload sometimes [lucene]
dweiss opened a new issue, #14305: URL: https://github.com/apache/lucene/issues/14305

### Description
Seems like they're expiring before the build is completed. Relevant links:
https://issues.apache.org/jira/browse/INFRA-26057
https://github.com/gradle/actions/blob/main/docs/setup-gradle.md#managing-develocity-access-keys
Re: [I] TestScorerUtil.testLikelyImpactsEnum fails [lucene]
jpountz commented on issue #14303: URL: https://github.com/apache/lucene/issues/14303#issuecomment-2688040417 It looks like this is due to codec randomization getting applied before the static block runs. I opened #14306.
[PR] Make BlockPostingsEnum public. [lucene]
jpountz opened a new pull request, #14306: URL: https://github.com/apache/lucene/pull/14306 This allows access from `ScorerUtil` so that it no longer needs a static block that creates an index to be able to introspect what implementation is used for impacts. Closes #14303
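The general shape of the alternative described above, referring to a now-visible class directly instead of discovering it at runtime, can be sketched as follows. This is only an illustration under stated assumptions: `Postings`, `BlockPostings`, `OtherPostings` and `isLikelyImpl` are hypothetical names, not the actual Lucene classes or the code in the PR.

```java
public class ClassLiteralSketch {
  /** Hypothetical common interface. */
  public interface Postings {}

  /** Hypothetical implementation that is public, so other packages can name it. */
  public static final class BlockPostings implements Postings {}

  /** Hypothetical alternative implementation. */
  public static final class OtherPostings implements Postings {}

  // A plain class literal: initializing this field does no I/O, cannot be interrupted,
  // and does not depend on which implementation happens to be configured at the time.
  private static final Class<?> LIKELY_IMPL = BlockPostings.class;

  /** Cheap identity check against the expected implementation class. */
  public static boolean isLikelyImpl(Postings postings) {
    return postings.getClass() == LIKELY_IMPL;
  }
}
```

Because the field is a constant set from a class literal, its value no longer depends on the order in which other setup code (such as test randomization) runs.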
Re: [PR] ExceptionInInitializerError in ScorerUtil [lucene]
jpountz commented on PR #14280: URL: https://github.com/apache/lucene/pull/14280#issuecomment-2688043954 It looks like another problem with the current code is that codec randomization may randomly run before this block of code, making it unreliable when running tests. I opened #14306 as an alternative.