Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-22 Thread via GitHub
msokolov commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2430042116 With the most recent commit I saw these luceneutil/knnPerfTest.py results: ## 1. baseline ``` recall latency (ms) nDoc topK fanout maxConn beamWidth quantized ind

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-22 Thread via GitHub
msokolov commented on code in PR #13872: URL: https://github.com/apache/lucene/pull/13872#discussion_r1811235616 ## lucene/core/src/java/org/apache/lucene/codecs/hnsw/DefaultFlatVectorScorer.java: ## @@ -88,34 +88,28 @@ public String toString() { /** RandomVectorScorerSuppl

[PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]

2024-10-22 Thread via GitHub
ljak opened a new pull request, #13944: URL: https://github.com/apache/lucene/pull/13944 Since https://github.com/apache/lucene/pull/110, the disjuncts of a DisjunctionMaxQuery no longer have a defined order, which affects the output of the `toString` method. In isolation, that does not matter
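As a rough illustration of the ordering problem described above: if clauses are held in an unordered collection, one way to get a repeatable string is to sort the clauses' string forms before joining them. The sketch below is an assumption for illustration only. The class `StableClauseString`, its `render` method, and the `" | "` separator are hypothetical, and the snippet does not show how the PR itself achieves stability.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not part of Lucene: renders clause strings in a stable order. */
final class StableClauseString {
  private StableClauseString() {}

  /**
   * Joins the given clause strings in lexicographic order, so the result does not
   * depend on the iteration order of the input collection (for example a HashSet).
   */
  static String render(Collection<String> clauseStrings) {
    // Copy first so the caller's collection is never mutated.
    List<String> sorted = new ArrayList<>(clauseStrings);
    Collections.sort(sorted);
    // The " | " separator and the parentheses are illustrative choices.
    return "(" + String.join(" | ", sorted) + ")";
  }
}
```

Calling `StableClauseString.render` on the same set of clauses yields the same string regardless of how the set happens to iterate.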

[PR] Remove TopScoreDocCollector's dependency on HitsThresholdChecker. [lucene]

2024-10-22 Thread via GitHub
jpountz opened a new pull request, #13943: URL: https://github.com/apache/lucene/pull/13943 `TopScoreDocCollectorManager` has a dependency on `HitsThresholdChecker`, which is essentially a shared counter that is incremented until it reaches the total hits threshold, when the scorer can star
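The "shared counter" idea in the description above can be sketched as follows. This is a minimal illustration under stated assumptions, not Lucene's `HitsThresholdChecker`: the class `SharedHitCounter` and its method names are hypothetical, and what the scorer does once the threshold is reached is left out because the snippet is truncated.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a counter shared by several collectors; not Lucene code. */
final class SharedHitCounter {
  private final AtomicLong hits = new AtomicLong();
  private final long totalHitsThreshold;

  SharedHitCounter(long totalHitsThreshold) {
    if (totalHitsThreshold < 0) {
      throw new IllegalArgumentException("totalHitsThreshold must be >= 0");
    }
    this.totalHitsThreshold = totalHitsThreshold;
  }

  /** Called once per collected hit, possibly from several threads. */
  void incrementHitCount() {
    // Stop incrementing once the threshold is reached to limit contention.
    // Under concurrency the count may overshoot slightly, which is harmless here.
    if (hits.get() < totalHitsThreshold) {
      hits.incrementAndGet();
    }
  }

  /** True once at least totalHitsThreshold hits have been counted. */
  boolean isThresholdReached() {
    return hits.get() >= totalHitsThreshold;
  }

  long hitCount() {
    return hits.get();
  }
}
```

Each collector would call `incrementHitCount()` per hit and consult `isThresholdReached()`; because every call touches the same atomic variable, such a counter adds cross-thread contention to the hot path.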

Re: [PR] Remove TopScoreDocCollector's dependency on HitsThresholdChecker. [lucene]

2024-10-22 Thread via GitHub
jpountz commented on PR #13943: URL: https://github.com/apache/lucene/pull/13943#issuecomment-2429765576 wikibigall with a `searchConcurrency` of 8 suggests that the slowdown is tiny: ``` TaskQPS baseline StdDevQPS my_modified_version StdDev

Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-22 Thread via GitHub
msokolov commented on PR #13910: URL: https://github.com/apache/lucene/pull/13910#issuecomment-2429836870 Yes, maybe we should -- I think it would be a one-liner -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-22 Thread via GitHub
msokolov commented on PR #13910: URL: https://github.com/apache/lucene/pull/13910#issuecomment-2429841476 There is another upgrade path -- if you started with 9.0 and then "upgraded" your index by rewriting it (e.g. with the IndexUpgrader tool) via merge to 9.1-9.7 you could subsequently read the

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-22 Thread via GitHub
msokolov commented on code in PR #13872: URL: https://github.com/apache/lucene/pull/13872#discussion_r1811216599 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/OffHeapQuantizedByteVectorValues.java: ## @@ -127,31 +121,42 @@ public int size() { } @Override - p

Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-22 Thread via GitHub
benwtrent commented on PR #13910: URL: https://github.com/apache/lucene/pull/13910#issuecomment-2429748958 @msokolov could we do a simpler patch for 9.12.1?

Re: [I] Should we avoid allocating a byte[] upfront for binary doc values [lucene]

2024-10-22 Thread via GitHub
iverase closed issue #13929: Should we avoid allocating a byte[] upfront for binary doc values URL: https://github.com/apache/lucene/issues/13929

Re: [I] Should we avoid allocating a byte[] upfront for binary doc values [lucene]

2024-10-22 Thread via GitHub
iverase commented on issue #13929: URL: https://github.com/apache/lucene/issues/13929#issuecomment-2429888740 I really wish our binary doc values didn't imply that you need to have everything on heap in order to read them; it feels wrong. But anyway, I understand it won't happen easil

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-22 Thread via GitHub
msokolov commented on code in PR #13872: URL: https://github.com/apache/lucene/pull/13872#discussion_r1811229378 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorerSupplier.java: ## @@ -112,20 +96,20 @@ static final class Cosi

[PR] Removing the deprecated parameters, -fast, -slow, -crossCheckTermVectors from CheckIndex. [lucene]

2024-10-22 Thread via GitHub
slow-J opened a new pull request, #13942: URL: https://github.com/apache/lucene/pull/13942 Removing the deprecated parameters `-fast`, `-slow`, and `-crossCheckTermVectors` from CheckIndex. Their usage is replaced with `-level`, with respective values of `1`, `3`, and `3`. Follow-up on the depr
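The flag-to-level mapping stated above can be written down as a small lookup, which may help when migrating scripts that still pass the old parameters. This is a sketch only: the class `LegacyCheckIndexFlags` and its methods are hypothetical helpers, not part of CheckIndex, and only the three mappings named in the description are encoded.

```java
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical migration helper; not part of Lucene's CheckIndex. */
final class LegacyCheckIndexFlags {
  // Mapping taken from the PR description: -fast -> 1, -slow -> 3, -crossCheckTermVectors -> 3.
  private static final Map<String, Integer> LEVEL_BY_FLAG =
      Map.of("-fast", 1, "-slow", 3, "-crossCheckTermVectors", 3);

  private LegacyCheckIndexFlags() {}

  /** Returns the -level value for a deprecated flag, or empty if the flag is not one of them. */
  static OptionalInt toLevel(String flag) {
    Integer level = LEVEL_BY_FLAG.get(flag);
    return level == null ? OptionalInt.empty() : OptionalInt.of(level);
  }

  /** Rewrites a deprecated flag as "-level N"; any other argument is returned unchanged. */
  static String rewrite(String arg) {
    OptionalInt level = toLevel(arg);
    return level.isPresent() ? "-level " + level.getAsInt() : arg;
  }
}
```

For example, `LegacyCheckIndexFlags.rewrite("-fast")` returns `-level 1`, while an unrelated argument passes through untouched.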

Re: [PR] Have value and count in LabelAndValue only for TaxonomyFacets [lucene]

2024-10-22 Thread via GitHub
stefanvodita closed pull request #13740: Have value and count in LabelAndValue only for TaxonomyFacets URL: https://github.com/apache/lucene/pull/13740

Re: [I] Make CheckIndex doChecksumsOnly / -fast as default [LUCENE-9984] [lucene]

2024-10-22 Thread via GitHub
slow-J commented on issue #11023: URL: https://github.com/apache/lucene/issues/11023#issuecomment-2428849956 I'll clean up the deprecated CheckIndex params in Lucene 11.

Re: [PR] Make BooleanScorer work on top of Scorers rather than BulkScorers. [lucene]

2024-10-22 Thread via GitHub
jpountz commented on PR #13931: URL: https://github.com/apache/lucene/pull/13931#issuecomment-2429122034 There is a good speedup on nightly benchmarks too: https://benchmarks.mikemccandless.com/CountOrHighHigh.html.

Re: [PR] Speedup OrderIntervalsSource some more [lucene]

2024-10-22 Thread via GitHub
jpountz commented on PR #13937: URL: https://github.com/apache/lucene/pull/13937#issuecomment-2429119642 There is indeed a small speedup to intervals with a low p-value. https://benchmarks.mikemccandless.com/IntervalsOrdered.html I pushed an annotation.

Re: [PR] Reduce the compiled size of the collect() method on `TopScoreDocCollector`. [lucene]

2024-10-22 Thread via GitHub
jpountz merged PR #13939: URL: https://github.com/apache/lucene/pull/13939

Re: [PR] Introduce a heuristic to amortize the per-window overhead in MaxScoreBulkScorer. [lucene]

2024-10-22 Thread via GitHub
jpountz merged PR #13941: URL: https://github.com/apache/lucene/pull/13941

Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-22 Thread via GitHub
msokolov commented on PR #13910: URL: https://github.com/apache/lucene/pull/13910#issuecomment-2429428988 ok something like this: Dear Lucene user community, We recently uncovered a backwards compatibility bug that affects indexes created with version 9.0 containing KNN vector

[PR] Introduce a heuristic to amortize the per-window overhead in MaxScoreBulkScorer. [lucene]

2024-10-22 Thread via GitHub
jpountz opened a new pull request, #13941: URL: https://github.com/apache/lucene/pull/13941 It is sometimes possible for `MaxScoreBulkScorer` to compute windows that don't contain many candidate matches, resulting in more time spent evaluating maximum scores per window than evaluating candi