Re: [PR] Stop using `SlowImpactsEnum` for terms whose `docFreq` is less than 128. [lucene]

2024-11-25 Thread via GitHub
jpountz commented on PR #14017: URL: https://github.com/apache/lucene/pull/14017#issuecomment-2497337140 With Combined tasks in the file (call is 3-polymorphic in the baseline, bimorphic in the modified version): ``` TaskQPS baseline StdDevQPS my_m

[PR] Stop using `SlowImpactsEnum` for terms whose `docFreq` is less than 128. [lucene]

2024-11-25 Thread via GitHub
jpountz opened a new pull request, #14017: URL: https://github.com/apache/lucene/pull/14017 We currently use `SlowImpactsEnum` for terms whose `docFreq` is less than 128 because it's convenient as these terms don't have impacts anyway. But a recent slowdown on nightly benchmarks suggests th

Re: [PR] Make CombinedFieldQuery eligible for WAND/MAXSCORE. [lucene]

2024-11-25 Thread via GitHub
jpountz commented on PR #13999: URL: https://github.com/apache/lucene/pull/13999#issuecomment-2497242523 I opened #14017. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[PR] Better encapsulate locking logic in HnswGraphBuilder [lucene]

2024-11-25 Thread via GitHub
viliam-durina opened a new pull request, #14016: URL: https://github.com/apache/lucene/pull/14016 This PR moves the locking logic from `HnswConcurrentMergeBuilder` to `HnswGraphBuilder`, which automatically picks the single-threaded vs. concurrent searcher based on whether a lock is used. T

[I] TestSoftDeletesDirectoryReaderWrapper.testAvoidWrappingReadersWithoutSoftDeletes AssertionError: expected:<5> but was:<3> [lucene]

2024-11-25 Thread via GitHub
ChrisHegarty opened a new issue, #14020: URL: https://github.com/apache/lucene/issues/14020 Fails with: `java.lang.AssertionError: expected:<5> but was:<3>` Reproduces with: ``` ./gradlew test --tests TestSoftDeletesDirectoryReaderWrapper.testAvoidWrappingReadersWithoutSoftDelet

Re: [PR] Update lastDoc in ScoreCachingWrappingScorer [lucene]

2024-11-25 Thread via GitHub
msfroh commented on PR #13987: URL: https://github.com/apache/lucene/pull/13987#issuecomment-2498931094 Closing in favor of https://github.com/apache/lucene/pull/14012

Re: [PR] Update lastDoc in ScoreCachingWrappingScorer [lucene]

2024-11-25 Thread via GitHub
msfroh closed pull request #13987: Update lastDoc in ScoreCachingWrappingScorer URL: https://github.com/apache/lucene/pull/13987

Re: [I] [Discuss] Reducing allocations in HnswUtil::markRooted [lucene]

2024-11-25 Thread via GitHub
viswanathk commented on issue #14002: URL: https://github.com/apache/lucene/issues/14002#issuecomment-2498525370 > Interesting. The "theoretical" max depth of this stack would be the size of this graph. I suppose the stack does get large, which would explain a high no. of Array::grow calls?
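
The quoted point about stack depth and `Array::grow` calls can be illustrated with a small sketch. This is not the code in `HnswUtil::markRooted`; it is a generic iterative depth-first traversal over a hypothetical adjacency array, with a hypothetical `growCounter` parameter that counts how often the explicit stack has to be resized. Because nodes are marked as visited when they are pushed, the stack never holds more entries than there are nodes, so sizing it to the graph up front avoids all resizes at the cost of a larger initial allocation.

```java
import java.util.Arrays;

final class ExplicitStackDfsSketch {

    /**
     * Counts the nodes reachable from root using an explicit int stack.
     * growCounter[0] is incremented each time the stack array is resized.
     * All names here are illustrative and not taken from Lucene.
     */
    static int countReachable(int[][] neighbors, int root, int initialCapacity, int[] growCounter) {
        boolean[] visited = new boolean[neighbors.length];
        int[] stack = new int[Math.max(1, initialCapacity)];
        int size = 0;

        stack[size++] = root;
        visited[root] = true;
        int reached = 0;

        while (size > 0) {
            int node = stack[--size];
            reached++;
            for (int nb : neighbors[node]) {
                if (!visited[nb]) {
                    // Mark on push so each node enters the stack at most once.
                    visited[nb] = true;
                    if (size == stack.length) {
                        // Doubling resize, the kind of call a profiler would attribute to "grow".
                        stack = Arrays.copyOf(stack, stack.length * 2);
                        growCounter[0]++;
                    }
                    stack[size++] = nb;
                }
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        // Star graph: node 0 is connected to nodes 1..7.
        int[][] star = new int[8][];
        star[0] = new int[] {1, 2, 3, 4, 5, 6, 7};
        for (int i = 1; i < 8; i++) {
            star[i] = new int[] {0};
        }
        int[] smallStart = new int[1];
        int[] preSized = new int[1];
        countReachable(star, 0, 1, smallStart);
        countReachable(star, 0, star.length, preSized);
        System.out.println("resizes with capacity 1: " + smallStart[0]);
        System.out.println("resizes with capacity n: " + preSized[0]);
    }
}
```

Whether pre-sizing or reusing a stack across calls is the better trade-off for the real method depends on the graph shapes involved, which this sketch does not model.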

Re: [PR] Simplify logic in ScoreCachingWrappingScorer [lucene]

2024-11-25 Thread via GitHub
msfroh commented on PR #14012: URL: https://github.com/apache/lucene/pull/14012#issuecomment-2499037808 Thanks folks! I made the suggested change and took this PR out of draft. I think this one makes more sense than my previous one over at https://github.com/apache/lucene/pull/13987

[PR] Make WANDScorer compute scores on the fly. [lucene]

2024-11-25 Thread via GitHub
jpountz opened a new pull request, #14021: URL: https://github.com/apache/lucene/pull/14021 Currently, `WANDScorer` considers that a hit is a match if the sum of maximum scores across clauses is more than or equal to the minimum competitive score. We can do better by computing scores of lea
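
The match rule described above can be sketched in a few lines. This is a simplified illustration, not the `WANDScorer` implementation: the class and method names are hypothetical, and clauses are reduced to plain arrays of scores. The first method is the rule as stated in the snippet (sum of per-clause maximum scores compared against the minimum competitive score). The second method is only an assumption about what "computing scores on the fly" could buy, since the snippet is cut off: once some clauses have actual scores, those replace their upper bounds and the combined bound gets tighter.

```java
final class WandMatchSketch {

    /** Rule from the snippet: a hit is a match if the sum of maximum scores reaches the threshold. */
    static boolean matchesByMaxScores(float[] maxScores, float minCompetitiveScore) {
        double sum = 0;
        for (float s : maxScores) {
            sum += s;
        }
        return sum >= minCompetitiveScore;
    }

    /**
     * Hypothetical refinement: clauses that were already scored contribute their
     * actual score, the remaining clauses still contribute their maximum score.
     */
    static boolean matchesWithKnownScores(
            float[] knownScores, float[] remainingMaxScores, float minCompetitiveScore) {
        double sum = 0;
        for (float s : knownScores) {
            sum += s;
        }
        for (float s : remainingMaxScores) {
            sum += s;
        }
        return sum >= minCompetitiveScore;
    }

    public static void main(String[] args) {
        float minCompetitiveScore = 3.5f;
        // Two clauses, each with a maximum score of 2.0.
        boolean loose = matchesByMaxScores(new float[] {2.0f, 2.0f}, minCompetitiveScore);
        // The first clause actually scores 1.0 on this document.
        boolean tight = matchesWithKnownScores(
                new float[] {1.0f}, new float[] {2.0f}, minCompetitiveScore);
        System.out.println("max-score bound accepts: " + loose);
        System.out.println("bound with known score accepts: " + tight);
    }
}
```

In the usage example, the same document passes the looser check but is rejected once one actual score is known, which is the kind of early rejection a tighter bound makes possible.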

Re: [I] How to configure TieredMergePolicy for very low segment count? [lucene]

2024-11-25 Thread via GitHub
jpountz commented on issue #14004: URL: https://github.com/apache/lucene/issues/14004#issuecomment-2497680059 > Or does that 50% check trump the flooring? It does trump the flooring indeed. The reasoning is that even if a user is happy to spend lots of hardware resources on merging, t

Re: [PR] Make WANDScorer compute scores on the fly. [lucene]

2024-11-25 Thread via GitHub
jpountz commented on PR #14021: URL: https://github.com/apache/lucene/pull/14021#issuecomment-2497708293 ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrStopWords 33.45 (6.

[PR] Fix changelog for GITHUB#14011 [lucene]

2024-11-25 Thread via GitHub
viliam-durina opened a new pull request, #14018: URL: https://github.com/apache/lucene/pull/14018 (no comment)

Re: [I] How to configure TieredMergePolicy for very low segment count? [lucene]

2024-11-25 Thread via GitHub
jpountz commented on issue #14004: URL: https://github.com/apache/lucene/issues/14004#issuecomment-2498203861 > But this doesn't seem intrinsic/important -- +1 to allow it to pick a range of segments to merge at once? I ran some quick simulations with maxMergeAtOnce > segsPerTier and

Re: [PR] Stop using `SlowImpactsEnum` for terms whose `docFreq` is less than 128. [lucene]

2024-11-25 Thread via GitHub
jpountz merged PR #14017: URL: https://github.com/apache/lucene/pull/14017

Re: [PR] Improve checksum calculations [lucene]

2024-11-25 Thread via GitHub
jpountz merged PR #13989: URL: https://github.com/apache/lucene/pull/13989

Re: [PR] Improve checksum calculations [lucene]

2024-11-25 Thread via GitHub
jpountz commented on PR #13989: URL: https://github.com/apache/lucene/pull/13989#issuecomment-2498257183 Thank you!

Re: [PR] Update lastDoc in ScoreCachingWrappingScorer [lucene]

2024-11-25 Thread via GitHub
jpountz commented on PR #13987: URL: https://github.com/apache/lucene/pull/13987#issuecomment-2498260663 > In that case, the unit test that I added can be removed This works for me. Sorry for putting you on the wrong track by suggesting that a test is added, it took me a while to unde

Re: [I] AES Encrypted Directory [LUCENE-2228] [lucene]

2024-11-25 Thread via GitHub
aleboulanger commented on issue #3304: URL: https://github.com/apache/lucene/issues/3304#issuecomment-2498305210 Any news, please?

Re: [PR] Simplify logic in ScoreCachingWrappingScorer [lucene]

2024-11-25 Thread via GitHub
jpountz commented on code in PR #14012: URL: https://github.com/apache/lucene/pull/14012#discussion_r1856772113 ## lucene/core/src/java/org/apache/lucene/search/ScoreCachingWrappingScorer.java: ## @@ -31,8 +31,7 @@ */ public final class ScoreCachingWrappingScorer extends Scor

Re: [PR] Make CombinedFieldQuery eligible for WAND/MAXSCORE. [lucene]

2024-11-25 Thread via GitHub
jpountz commented on PR #13999: URL: https://github.com/apache/lucene/pull/13999#issuecomment-2497224592 So it did indeed go from bimorphic to polymorphic: we had 2 iterator impls, `Lucene101PostingsReader$BlockImpactsDocsEnum` and `SlowImpactsEnum`, and we added `DisjunctionDISIApproximation`
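
For readers unfamiliar with the terminology, the sketch below shows the shape of the situation: a single call site (`it.nextDoc()` inside `drain`) that sees two receiver types versus three. The interface and the three implementations are hypothetical stand-ins, not the Lucene classes named above. On HotSpot, a call site that has observed one or two receiver types can typically be inlined behind type checks, while a third type usually forces a real interface dispatch. That is JIT behavior this code does not measure; it only counts the types that reach the call site.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CallSiteSketch {

    /** Hypothetical stand-in for a doc ID iterator. */
    interface DocIterator {
        int nextDoc();
    }

    static final class StepOne implements DocIterator {
        private int doc = -1;
        public int nextDoc() { return ++doc; }
    }

    static final class StepTwo implements DocIterator {
        private int doc = -2;
        public int nextDoc() { doc += 2; return doc; }
    }

    static final class StepThree implements DocIterator {
        private int doc = -3;
        public int nextDoc() { doc += 3; return doc; }
    }

    /** The shared call site: every iterator type passed in here is dispatched through it.nextDoc(). */
    static long drain(DocIterator it, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += it.nextDoc();
        }
        return sum;
    }

    /** Number of distinct concrete classes that would reach the call site in drain(). */
    static int distinctReceiverTypes(List<DocIterator> iterators) {
        Set<Class<?>> types = new HashSet<>();
        for (DocIterator it : iterators) {
            types.add(it.getClass());
        }
        return types.size();
    }

    public static void main(String[] args) {
        List<DocIterator> two = List.of(new StepOne(), new StepTwo());
        List<DocIterator> three = List.of(new StepOne(), new StepTwo(), new StepThree());
        System.out.println("receiver types (bimorphic case): " + distinctReceiverTypes(two));
        System.out.println("receiver types (polymorphic case): " + distinctReceiverTypes(three));
        for (DocIterator it : three) {
            System.out.println(it.getClass().getSimpleName() + " sum: " + drain(it, 3));
        }
    }
}
```

Measuring the actual cost of the third receiver type would need a benchmark harness such as JMH; plain timing loops in `main` are not reliable for this.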

Re: [PR] Add Query for reranking KnnFloatVectorQuery [lucene]

2024-11-25 Thread via GitHub
dungba88 commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2499786440 I have a preliminary benchmark here (top-k=100, fanout=0) using Cohere 768 dataset. ![image](https://github.com/user-attachments/assets/d40fdc53-019b-4515-bff4-a29162d9b9da)

Re: [PR] Add Query for reranking KnnFloatVectorQuery [lucene]

2024-11-25 Thread via GitHub
shatejas commented on code in PR #14009: URL: https://github.com/apache/lucene/pull/14009#discussion_r1857976423 ## lucene/core/src/java/org/apache/lucene/search/RerankKnnFloatVectorQuery.java: ## @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o