Re: [PR] Dynamic pruning with DocValueSkipper [lucene]

2025-06-04 Thread via GitHub
gf2121 commented on code in PR #14672: URL: https://github.com/apache/lucene/pull/14672#discussion_r2128041444 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -328,120 +507,47 @@ private void updateSkipInterval(boolean success) {

Re: [PR] Implement IndexedDISI#docIDRunEnd [lucene]

2025-06-04 Thread via GitHub
gf2121 commented on PR #14753: URL: https://github.com/apache/lucene/pull/14753#issuecomment-2942824477 Could you add a CHANGE entry under 9.3.0? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] Move HitQueue in TopScoreDocCollector to a LongHeap [lucene]

2025-06-04 Thread via GitHub
gf2121 commented on PR #14714: URL: https://github.com/apache/lucene/pull/14714#issuecomment-2942786427 > Since the top-k heap appears to be a bottleneck for some queries, we could look into whether a radix heap would perform better than a binary heap in a follow-up. +1, that would b

Re: [PR] Speed up findNextGEQ by aggressive stepping [lucene]

2025-06-04 Thread via GitHub
github-actions[bot] commented on PR #14735: URL: https://github.com/apache/lucene/pull/14735#issuecomment-2942649543 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Speed up findNextGEQ by aggressive stepping [lucene]

2025-06-04 Thread via GitHub
HUSTERGS commented on PR #14735: URL: https://github.com/apache/lucene/pull/14735#issuecomment-294260 Apologies if reopening this PR caused any inconvenience. For what it's worth, I came up with a branchless way to avoid the double IntVector check. What I'm curious about is that i

Re: [PR] Speed up findNextGEQ by aggressive stepping [lucene]

2025-06-04 Thread via GitHub
github-actions[bot] commented on PR #14735: URL: https://github.com/apache/lucene/pull/14735#issuecomment-2942651109 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Speed up findNextGEQ by aggressive stepping [lucene]

2025-06-04 Thread via GitHub
HUSTERGS commented on PR #14735: URL: https://github.com/apache/lucene/pull/14735#issuecomment-2942382073 For what it's worth, I managed to reduce the double vector check of the previous version. The basic idea is to use the middle value comparison result to produce a mask, so we can check only one
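The mask idea described in the comment above can be illustrated with a scalar Java sketch. This is not the PR's code: the PR works on `IntVector` lanes, while this sketch uses plain `int` arithmetic, and the class and method names (`FindNextGeqSketch`, `findNextGEQ`) are hypothetical. It assumes a block sorted in ascending order with at least two values, and returns the index of the first value >= target by comparing once against the last value of the lower half, then scanning only the selected half.

```java
public final class FindNextGeqSketch {

  /**
   * Hypothetical helper: returns the index of the first element >= target in a
   * sorted block, or buffer.length if every element is smaller.
   * Assumes buffer is sorted ascending and buffer.length >= 2.
   */
  static int findNextGEQ(int[] buffer, int target) {
    int half = buffer.length >>> 1;
    // 1 if the last value of the lower half is < target, else 0. The long
    // subtraction avoids int overflow; its sign bit acts as the "mask".
    int inUpper = (int) (((long) buffer[half - 1] - target) >>> 63);
    // Select the half to scan from the mask instead of an if/else.
    int start = inUpper * half;
    int end = half + inUpper * (buffer.length - half);
    // In a sorted block, the number of values < target in the selected half,
    // plus the size of the skipped lower half (if any), is the answer.
    int count = 0;
    for (int i = start; i < end; i++) {
      count += (int) (((long) buffer[i] - target) >>> 63);
    }
    return start + count;
  }

  public static void main(String[] args) {
    int[] block = {1, 3, 5, 7, 9, 11, 13, 15};
    System.out.println(findNextGEQ(block, 6));  // prints 3
    System.out.println(findNextGEQ(block, 10)); // prints 5
    System.out.println(findNextGEQ(block, 16)); // prints 8
  }
}
```

In a vectorized version, the per-half loop would presumably become a single lane-wise compare, which is where skipping the second half's check would save work.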

Re: [I] QueryRescorer should be able to use the original sort on ties [lucene]

2025-06-04 Thread via GitHub
HoustonPutman commented on issue #14455: URL: https://github.com/apache/lucene/issues/14455#issuecomment-2941734025 Not sure we should continue with this, but this issue is related to https://github.com/apache/lucene/pull/13510 and that should be documented.

[I] Support multiple HNSW graphs backed by the same vectors [lucene]

2025-06-04 Thread via GitHub
kaivalnp opened a new issue, #14758: URL: https://github.com/apache/lucene/issues/14758 ### Description For use cases that search different subsets of vectors in the index, where a non-trivial portion of the vectors overlap across fields. This could be done today by: 1.

Re: [I] IndexOrDocValuesQuery is counted twice when computing `maxClauseCount` [lucene]

2025-06-04 Thread via GitHub
iverase commented on issue #14756: URL: https://github.com/apache/lucene/issues/14756#issuecomment-2940187059 It is actually even worse in the example I have given. LongField generates an IndexSortSortedNumericDocValuesRangeQuery, which will contain the IndexOrDocValuesQuery as a fallback so

Re: [I] Expand TieredMergePolicy deletePctAllowed limits [lucene]

2025-06-04 Thread via GitHub
stefanvodita commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-2940174046 It's been a few years since this issue was created. In the meantime, we've successfully experimented with lower delete percentage thresholds for Amazon Product Search. Going fr

Re: [PR] Fix too many documents collected when only bool-filter condition is present [lucene]

2025-06-04 Thread via GitHub
github-actions[bot] commented on PR #14757: URL: https://github.com/apache/lucene/pull/14757#issuecomment-2940089156 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] Fix too many documents collected when only bool-filter condition is present [lucene]

2025-06-04 Thread via GitHub
kkewwei opened a new pull request, #14757: URL: https://github.com/apache/lucene/pull/14757 ### Description Closes: #14755

[I] IndexOrDocValues query is counted twice when computing `maxClauseCount` [lucene]

2025-06-04 Thread via GitHub
iverase opened a new issue, #14756: URL: https://github.com/apache/lucene/issues/14756 I noticed that if I change a query from a PointRangeQuery to an IndexOrDocValuesQuery, the change might make previously valid boolean queries fail because IndexOrDocValuesQuery counts twice when com

Re: [PR] Implement IndexedDISI#docIDRunEnd [lucene]

2025-06-04 Thread via GitHub
HUSTERGS commented on PR #14753: URL: https://github.com/apache/lucene/pull/14753#issuecomment-2939997927 > Thanks for the contribution! I wonder if we should also implement `SparsexxxDocValues#DocIdRunEnd` in `Lucene90DocValuesProducer` so that this can be actually used in queries like `FieldE

Re: [PR] Implement IndexedDISI#docIDRunEnd [lucene]

2025-06-04 Thread via GitHub
github-actions[bot] commented on PR #14753: URL: https://github.com/apache/lucene/pull/14753#issuecomment-2939992771 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[I] Too many documents collected when only bool-filter condition is present [lucene]

2025-06-04 Thread via GitHub
kkewwei opened a new issue, #14755: URL: https://github.com/apache/lucene/issues/14755 ### Description When only a bool-filter condition is present, the number of documents collected may exceed the `totalHitsThreshold`; we will not use `ConstantScoreScorer` to prune the `DocIdSetIterat

Re: [PR] Clean up query node classes [lucene]

2025-06-04 Thread via GitHub
stefanvodita merged PR #14737: URL: https://github.com/apache/lucene/pull/14737

[I] Let's explore a machine-learned MergePolicy? [lucene]

2025-06-04 Thread via GitHub
mikemccand opened a new issue, #14754: URL: https://github.com/apache/lucene/issues/14754 ### Description [The following is a brainstormy kind of idea ... I have no clue how to actually approach it ... patches/ideas very welcome!] Segment merging is a tricky balance between ind

Re: [PR] Move HitQueue in TopScoreDocCollector to a LongHeap [lucene]

2025-06-04 Thread via GitHub
jpountz commented on PR #14714: URL: https://github.com/apache/lucene/pull/14714#issuecomment-2938964435 Since the top-k heap appears to be a bottleneck for some queries, we could look into whether a radix heap would perform better than a binary heap in a follow-up.