[PR] Move HitQueue in TopScoreDocCollector to a LongHeap [lucene]

2025-05-26 Thread via GitHub
gf2121 opened a new pull request, #14714: URL: https://github.com/apache/lucene/pull/14714 This tries to encode `ScoreDoc#score` and `ScoreDoc#doc` into a comparable long and use a `LongHeap` instead of `HitQueue`. This appears to help when I increase `topN` to 1000 (https://github.c
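
A minimal sketch of the encoding idea described above, not the code from the pull request. The class `ScoreDocCodec` and its methods are hypothetical, doc IDs are assumed non-negative, and `java.util.PriorityQueue<Long>` stands in for a primitive long heap only to keep the example self-contained. The score goes into the high 32 bits as a sortable int and the inverted doc ID into the low 32 bits, so a larger long means a more competitive hit (higher score, then lower doc ID).

```java
import java.util.PriorityQueue;

/** Hypothetical helper: packs (score, doc) into one long ordered by hit competitiveness. */
final class ScoreDocCodec {
  private ScoreDocCodec() {}

  /** Maps a float to an int whose signed order matches the float order. */
  static int floatToSortableInt(float f) {
    int bits = Float.floatToIntBits(f);
    return bits ^ ((bits >> 31) & 0x7fffffff);
  }

  /** Inverse of floatToSortableInt. */
  static float sortableIntToFloat(int sortable) {
    return Float.intBitsToFloat(sortable ^ ((sortable >> 31) & 0x7fffffff));
  }

  /** Assumes doc >= 0, so the low 32 bits never carry a sign bit. */
  static long encode(float score, int doc) {
    return ((long) floatToSortableInt(score) << 32) | (long) (Integer.MAX_VALUE - doc);
  }

  static float score(long encoded) {
    return sortableIntToFloat((int) (encoded >> 32));
  }

  static int doc(long encoded) {
    return Integer.MAX_VALUE - (int) encoded;
  }

  /** Returns the n most competitive hits, best first; the array index is the doc ID. */
  static long[] topN(float[] scores, int n) {
    if (n <= 0) {
      return new long[0];
    }
    // Min-heap: the head is the least competitive hit currently kept.
    PriorityQueue<Long> heap = new PriorityQueue<>();
    for (int doc = 0; doc < scores.length; doc++) {
      long encoded = encode(scores[doc], doc);
      if (heap.size() < n) {
        heap.add(encoded);
      } else if (encoded > heap.peek()) {
        heap.poll();
        heap.add(encoded);
      }
    }
    long[] out = new long[heap.size()];
    for (int i = out.length - 1; i >= 0; i--) {
      out[i] = heap.poll();
    }
    return out;
  }
}
```

With this layout a single `long` comparison replaces the two-field comparison a `ScoreDoc`-based queue needs, which is presumably where a speedup would come from.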

[I] Potential resource leakage in WordDictionary#loadMainDataFromFile [lucene]

2025-05-26 Thread via GitHub
xcx1r3 opened a new issue, #14719: URL: https://github.com/apache/lucene/issues/14719 ### Description In `org.apache.lucene.analysis.cn.smart.hhmm.WordDictionary#loadMainDataFromFile`, the resource `DataInputStream dctFile` could be leaked upon exception. It can be fixed by using t
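
The preview above is cut off before it names the proposed fix. As a general illustration only, one common way to avoid leaking a `DataInputStream` when an exception is thrown is Java's try-with-resources. The class `DictionaryLoader` and the method `readInts` are hypothetical and do not reproduce `WordDictionary`.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical loader showing a stream that is closed even when reading fails. */
final class DictionaryLoader {
  private DictionaryLoader() {}

  /** Reads count big-endian ints; the stream is closed on both normal and exceptional exit. */
  static int[] readInts(Path file, int count) throws IOException {
    try (DataInputStream in =
        new DataInputStream(new BufferedInputStream(Files.newInputStream(file)))) {
      int[] values = new int[count];
      for (int i = 0; i < count; i++) {
        values[i] = in.readInt(); // throws EOFException if the file is too short
      }
      return values;
    }
  }
}
```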

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2025-05-26 Thread via GitHub
vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2911279145 > if our goal were limited to enabling MAXSIM scoring between two multi-vectors using only exact search (i.e., without KNN indexing for faster retrieval), many of these challenges co
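
For readers unfamiliar with the term, here is a minimal sketch of exact MAXSIM scoring between two multi-vectors, assuming the usual late-interaction definition: for each query vector, take the maximum dot product against all document vectors, then sum those maxima. The class `MaxSim` is hypothetical and is not the API proposed in the pull request, which may define the score differently.

```java
/** Hypothetical exact MAXSIM scorer over two multi-vectors (arrays of equal-length vectors). */
final class MaxSim {
  private MaxSim() {}

  static float dot(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Sum over query vectors of the best dot product against any document vector. */
  static float score(float[][] query, float[][] doc) {
    if (doc.length == 0) {
      throw new IllegalArgumentException("document multi-vector must not be empty");
    }
    float total = 0f;
    for (float[] q : query) {
      float best = Float.NEGATIVE_INFINITY;
      for (float[] d : doc) {
        best = Math.max(best, dot(q, d));
      }
      total += best;
    }
    return total;
  }
}
```

The cost is proportional to the number of query vectors times the number of document vectors per candidate, which is why the exact form is simple to score but expensive to use for retrieval without an index.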

Re: [PR] Add comment about using InverseIntersectVisit and IntersectVisitor. [lucene]

2025-05-26 Thread via GitHub
github-actions[bot] commented on PR #14647: URL: https://github.com/apache/lucene/pull/14647#issuecomment-2910809175 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Shrink FuzzySet public API surface area [lucene]

2025-05-26 Thread via GitHub
github-actions[bot] commented on PR #14615: URL: https://github.com/apache/lucene/pull/14615#issuecomment-2910809201 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] Support Vector Search on GPU with cuVS [lucene]

2025-05-26 Thread via GitHub
weizijun commented on issue #14243: URL: https://github.com/apache/lucene/issues/14243#issuecomment-2911007740 hi, @ChrisHegarty, @chatman, how is the project progressing? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [I] A little optimization about BKDReader [lucene]

2025-05-26 Thread via GitHub
gf2121 commented on issue #14717: URL: https://github.com/apache/lucene/issues/14717#issuecomment-2911206727 Is this a duplicate of https://github.com/apache/lucene/pull/14244?

[PR] Fix TestBooleanMinShouldMatch#testRandomQueries failure. [lucene]

2025-05-26 Thread via GitHub
jpountz opened a new pull request, #14715: URL: https://github.com/apache/lucene/pull/14715 This test generates random boolean queries and ensures that setting a minimum number of matching SHOULD clauses returns a subset of the hits with the same scores. It already tries to work arou

Re: [PR] Better vectorize score computations. [lucene]

2025-05-26 Thread via GitHub
jpountz commented on PR #14704: URL: https://github.com/apache/lucene/pull/14704#issuecomment-2909588910 I was looking at it too as I was worried it may have been caused by recent optimizations, but it looks like an old and rare failure. I opened a tentative test fix at https://github.com/a

Re: [PR] Fix TestBooleanMinShouldMatch#testRandomQueries failure. [lucene]

2025-05-26 Thread via GitHub
github-actions[bot] commented on PR #14715: URL: https://github.com/apache/lucene/pull/14715#issuecomment-2909588277 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] Speed up TermQuery [lucene]

2025-05-26 Thread via GitHub
jpountz commented on code in PR #14709: URL: https://github.com/apache/lucene/pull/14709#discussion_r2107247589 ## lucene/core/src/java/org/apache/lucene/search/TermScorer.java: ## @@ -134,6 +134,10 @@ public void nextDocsAndScores(int upTo, Bits liveDocs, DocAndScoreBuffer buf

Re: [PR] Improve BaseRangeFieldQueryTestCase#verify failure output [lucene]

2025-05-26 Thread via GitHub
timgrein closed pull request #13382: Improve BaseRangeFieldQueryTestCase#verify failure output URL: https://github.com/apache/lucene/pull/13382

Re: [PR] Rewrite LongRange.ValueSourceQuery/MultiValueSourceQuery to FieldExistsQuery on max range [lucene]

2025-05-26 Thread via GitHub
timgrein closed pull request #13383: Rewrite LongRange.ValueSourceQuery/MultiValueSourceQuery to FieldExistsQuery on max range URL: https://github.com/apache/lucene/pull/13383

Re: [PR] Speed up TermQuery [lucene]

2025-05-26 Thread via GitHub
gf2121 merged PR #14709: URL: https://github.com/apache/lucene/pull/14709

Re: [PR] Optimize AbstractKnnVectorQuery#createBitSet with intoBitset [lucene]

2025-05-26 Thread via GitHub
github-actions[bot] commented on PR #14674: URL: https://github.com/apache/lucene/pull/14674#issuecomment-2909819796 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [I] Nightly benchmark regression on 2025.05.01 [lucene]

2025-05-26 Thread via GitHub
mikemccand commented on issue #14630: URL: https://github.com/apache/lucene/issues/14630#issuecomment-2909802834 It looks like this `CONFIG_HZ` change also impacted KNN indexing throughput: https://benchmarks.mikemccandless.com/knnResults.html

Re: [PR] Optimize AbstractKnnVectorQuery#createBitSet with intoBitset [lucene]

2025-05-26 Thread via GitHub
gf2121 merged PR #14674: URL: https://github.com/apache/lucene/pull/14674

Re: [PR] DocIdRunEnd implementation missed in Lucene103PostingsReader [lucene]

2025-05-26 Thread via GitHub
gf2121 merged PR #14693: URL: https://github.com/apache/lucene/pull/14693

[PR] Merge PostingsEnum and ImpactsEnum. [lucene]

2025-05-26 Thread via GitHub
jpountz opened a new pull request, #14716: URL: https://github.com/apache/lucene/pull/14716 When impacts were introduced, I tried to avoid touching `PostingsEnum`, which had a relatively stable API while impacts were rather experimental. Impacts have been quite successful, so I think it's n

Re: [PR] Merge PostingsEnum and ImpactsEnum. [lucene]

2025-05-26 Thread via GitHub
github-actions[bot] commented on PR #14716: URL: https://github.com/apache/lucene/pull/14716#issuecomment-2909949846 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

[I] A little optimization about BKDReader [lucene]

2025-05-26 Thread via GitHub
kkewwei opened a new issue, #14717: URL: https://github.com/apache/lucene/issues/14717 ### Description 1. Reduce the match times for BKD node whose relation=`CELL_CROSSES_QUERY`. When performing range query, if the Relation is `CELL_CROSSES_QUERY`, we need to traverse each value in

[PR] Optimizing visiting BKD tree [lucene]

2025-05-26 Thread via GitHub
kkewwei opened a new pull request, #14718: URL: https://github.com/apache/lucene/pull/14718 ### Description Issue: #14717

Re: [PR] Optimizing visiting BKD tree [lucene]

2025-05-26 Thread via GitHub
github-actions[bot] commented on PR #14718: URL: https://github.com/apache/lucene/pull/14718#issuecomment-2910251551 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] Optimizing visiting BKD tree [lucene]

2025-05-26 Thread via GitHub
github-actions[bot] commented on PR #14718: URL: https://github.com/apache/lucene/pull/14718#issuecomment-2910252984 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] Optimizing visiting BKD tree [lucene]

2025-05-26 Thread via GitHub
github-actions[bot] commented on PR #14718: URL: https://github.com/apache/lucene/pull/14718#issuecomment-2910255086 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [I] A little optimization about BKDReader [lucene]

2025-05-26 Thread via GitHub
iverase commented on issue #14717: URL: https://github.com/apache/lucene/issues/14717#issuecomment-2910292755 This approach only considers one-dimensional data with queries represented by simple ranges. I don't think this will work for high-dimensional data, e.g. LatLonShape and complex

Re: [PR] Use github wf to add module labels for PR based on file changes [lucene]

2025-05-26 Thread via GitHub
dweiss commented on PR #14101: URL: https://github.com/apache/lucene/pull/14101#issuecomment-2910623550 I have a question - I have a fork of Lucene and create a pull request against that fork (not against the primary repo). This labeler action then fails because it - for some reason - lacks

Re: [PR] Speed up TermQuery [lucene]

2025-05-26 Thread via GitHub
gf2121 commented on code in PR #14709: URL: https://github.com/apache/lucene/pull/14709#discussion_r2107337418 ## lucene/core/src/java/org/apache/lucene/search/TermScorer.java: ## @@ -134,6 +134,10 @@ public void nextDocsAndScores(int upTo, Bits liveDocs, DocAndScoreBuffer buff

Re: [PR] Move HitQueue in TopScoreDocCollector to a LongHeap [lucene]

2025-05-26 Thread via GitHub
github-actions[bot] commented on PR #14714: URL: https://github.com/apache/lucene/pull/14714#issuecomment-2908907735 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] Speed up TermQuery [lucene]

2025-05-26 Thread via GitHub
github-actions[bot] commented on PR #14709: URL: https://github.com/apache/lucene/pull/14709#issuecomment-2909718785 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil