Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-21 Thread via GitHub
jpountz commented on PR #14365: URL: https://github.com/apache/lucene/pull/14365#issuecomment-2744460409 I pushed an annotation -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-21 Thread via GitHub
jpountz commented on PR #14365: URL: https://github.com/apache/lucene/pull/14365#issuecomment-273990 Hurray! - https://benchmarks.mikemccandless.com/TermDayOfYearSort.html - https://benchmarks.mikemccandless.com/TermDTSort.html

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-19 Thread via GitHub
gf2121 merged PR #14365: URL: https://github.com/apache/lucene/pull/14365

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-19 Thread via GitHub
jpountz commented on code in PR #14365: URL: https://github.com/apache/lucene/pull/14365#discussion_r2004331440 ## lucene/core/src/java/org/apache/lucene/util/DocIdSetBuilder.java: ## @@ -47,6 +47,8 @@ public sealed interface BulkAdder permits FixedBitSetAdder, BufferAdder {

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-19 Thread via GitHub
gf2121 commented on PR #14365: URL: https://github.com/apache/lucene/pull/14365#issuecomment-2737667396 > I remember playing with calling BulkAdder#grow on the estimated number of matching points (to upgrade to a bitset immediately instead of waiting for docs to be collected) a while back a

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-19 Thread via GitHub
jpountz commented on PR #14365: URL: https://github.com/apache/lucene/pull/14365#issuecomment-2736720587 Interesting. I remember playing with calling `BulkAdder#grow` on the estimated number of matching points (to upgrade to a bitset immediately instead of waiting for docs to be collected)
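
To make the "upgrade to a bitset immediately" idea concrete, here is a minimal, self-contained sketch. It is not Lucene's `DocIdSetBuilder`: the class `GrowSketch`, its `threshold` parameter, and the use of `java.util.BitSet` are all assumptions made for illustration. It only shows how announcing an estimated count up front through a `grow`-style call can switch the builder to a bitset before any doc is collected.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical stand-in for a buffer-then-bitset builder; names are illustrative only.
class GrowSketch {
    private final int threshold;      // assumed cut-over point, not Lucene's actual value
    private int[] buffer = new int[0];
    private int size = 0;
    private long reserved = 0;        // total number of docs announced via grow()
    private BitSet bits = null;       // non-null once the builder has upgraded

    GrowSketch(int threshold) {
        this.threshold = threshold;
    }

    // Callers must announce how many docs they are about to add.
    void grow(int numDocs) {
        reserved += numDocs;
        if (bits != null) {
            return;                   // already a bitset, nothing to resize
        }
        if (reserved > threshold) {
            // Upgrade: move any buffered docs into the bitset.
            bits = new BitSet();
            for (int i = 0; i < size; i++) {
                bits.set(buffer[i]);
            }
            buffer = null;
        } else {
            buffer = Arrays.copyOf(buffer, (int) reserved);
        }
    }

    void add(int doc) {
        if (bits != null) {
            bits.set(doc);
        } else {
            buffer[size++] = doc;
        }
    }

    boolean isBitSet() {
        return bits != null;
    }

    public static void main(String[] args) {
        GrowSketch eager = new GrowSketch(4);
        eager.grow(10);               // estimate exceeds the threshold
        System.out.println("upgraded before collecting: " + eager.isBitSet());
    }
}
```

With a large estimate passed to `grow`, the sketch skips the buffering phase entirely; with small incremental calls it upgrades only once the announced total crosses the threshold.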

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-19 Thread via GitHub
gf2121 commented on PR #14365: URL: https://github.com/apache/lucene/pull/14365#issuecomment-2736218538 I ran some benchmarks to find out the major reason: **Baseline**: main branch **Candidate**: collecting docs greater than maxDocVisited into bitset (instead of `DocIdSetBuilder

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-18 Thread via GitHub
gf2121 commented on PR #14365: URL: https://github.com/apache/lucene/pull/14365#issuecomment-2735314110 Thanks for running the benchmark, the speedup is great! > Skipping these doc IDs looks like it hurts vectorization, I played with disabling these if statements locally and get a good s

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-18 Thread via GitHub
jpountz commented on PR #14365: URL: https://github.com/apache/lucene/pull/14365#issuecomment-2734597577 Maybe we should stop only adding doc IDs to the `BulkAdder` if they are greater than the max collected doc so far. Skipping these doc IDs looks like it hurts vectorization, I played with
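
As a rough illustration of the two loop shapes under discussion, the sketch below contrasts a filtered add, which has a data-dependent branch per doc ID, with an unconditional add. It is an assumption-laden toy, not the PR's code: `BulkAddSketch`, `addFiltered`, and `addAll` are hypothetical names, and `java.util.BitSet` stands in for whatever the real `BulkAdder` writes to. It makes no performance claim; it only shows that the unconditional loop yields a superset whose extra entries are exactly the docs at or below the `maxDocVisited` cut-off.

```java
import java.util.BitSet;

// Hypothetical helper class; not part of Lucene.
class BulkAddSketch {

    // Filtered variant: a branch per element decides whether the doc is added.
    static void addFiltered(int[] docs, int count, int maxDocVisited, BitSet bits) {
        for (int i = 0; i < count; i++) {
            if (docs[i] > maxDocVisited) {
                bits.set(docs[i]);
            }
        }
    }

    // Unconditional variant: every doc is added, so the loop body has no branch.
    static void addAll(int[] docs, int count, BitSet bits) {
        for (int i = 0; i < count; i++) {
            bits.set(docs[i]);
        }
    }

    public static void main(String[] args) {
        int[] docs = {5, 1, 9, 3, 7};
        int maxDocVisited = 4;

        BitSet filtered = new BitSet();
        addFiltered(docs, docs.length, maxDocVisited, filtered);

        BitSet all = new BitSet();
        addAll(docs, docs.length, all);

        // Dropping docs <= maxDocVisited afterwards recovers the filtered set.
        all.clear(0, maxDocVisited + 1);
        System.out.println("same set after clearing: " + filtered.equals(all));
    }
}
```

Whether the extra entries are acceptable depends on the consumer; the sketch assumes they can be ignored or cleared later.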

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-18 Thread via GitHub
gf2121 commented on PR #14365: URL: https://github.com/apache/lucene/pull/14365#issuecomment-2733285724 I'm seeing even results on `wikimediumall` ``` Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value

[PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-17 Thread via GitHub
gf2121 opened a new pull request, #14365: URL: https://github.com/apache/lucene/pull/14365 (no comment)

Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]

2025-03-17 Thread via GitHub
jpountz commented on code in PR #14365: URL: https://github.com/apache/lucene/pull/14365#discussion_r1999318407 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -251,6 +252,30 @@ public void visit(int docID, byte[] packedValue) {