Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-04-05 Thread via GitHub
jpountz merged PR #14273: URL: https://github.com/apache/lucene/pull/14273 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-28 Thread via GitHub
gsmiller commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2018769975 ## lucene/core/src/java/org/apache/lucene/util/FixedBitSet.java: ## @@ -204,6 +205,40 @@ public int cardinality() { return Math.toIntExact(tot); } + /** +

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-28 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2018661757 ## lucene/core/src/java/org/apache/lucene/util/FixedBitSet.java: ## @@ -204,6 +205,40 @@ public int cardinality() { return Math.toIntExact(tot); } + /** +

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
gsmiller commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017767190 ## lucene/core/src/java/org/apache/lucene/search/DISIDocIdStream.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
gsmiller commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017721363 ## lucene/core/src/java/org/apache/lucene/util/FixedBitSet.java: ## @@ -204,6 +205,40 @@ public int cardinality() { return Math.toIntExact(tot); } + /** +

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017547274 ## lucene/core/src/java/org/apache/lucene/search/BitSetDocIdStream.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017556183 ## lucene/core/src/java/org/apache/lucene/search/BitSetDocIdStream.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017552582 ## lucene/core/src/java/org/apache/lucene/search/BooleanScorer.java: ## @@ -207,8 +164,32 @@ private void scoreWindowIntoBitSetAndReplay( acceptDocs.applyMask(m

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017546602 ## lucene/core/src/java/org/apache/lucene/search/DISIDocIdStream.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017544601 ## lucene/core/src/java/org/apache/lucene/search/DocIdStream.java: ## @@ -34,12 +33,34 @@ protected DocIdStream() {} * Iterate over doc IDs contained in this strea

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
gsmiller commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017239514 ## lucene/core/src/java/org/apache/lucene/search/DISIDocIdStream.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
jpountz commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2758283399 I played with the geonames dataset, by filtering out docs that don't have a value for the `elevation` field (2.3M docs left), enabling index sorting on the `elevation` field and computin

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-26 Thread via GitHub
jpountz commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2755991200 I'll try to run some simple benchmarks next.

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-26 Thread via GitHub
jpountz commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2755985826 It should be ready for review now. Now that `DocIdStream` has become more sophisticated, I extracted impls to proper classes that could be better tested. This causes some diffs in our bo

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-26 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2014552652 ## lucene/core/src/java/org/apache/lucene/search/DocIdStream.java: ## @@ -34,12 +33,35 @@ protected DocIdStream() {} * Iterate over doc IDs contained in this strea

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-26 Thread via GitHub
jpountz commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2754999433
> If we have a skipper, I think we ought to also be able to use competitive iterators to jump over blocks of docs we know we won't collect based on their values?
This is correct.

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-26 Thread via GitHub
gsmiller commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2754588145
> I like the idea! Looks like we can do similar trick for range facets and long values facets?
I _think_ we could optimize these use-cases even further by potentially skipping ov

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-26 Thread via GitHub
gsmiller commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2014273684 ## lucene/core/src/java/org/apache/lucene/search/DocIdStream.java: ## @@ -34,12 +33,35 @@ protected DocIdStream() {} * Iterate over doc IDs contained in this stre

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-25 Thread via GitHub
jpountz commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2752610585 Quick update: we now have more queries that collect hits using `collect(DocIdStream)`, which makes this optimization more appealing.

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-25 Thread via GitHub
gsmiller commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2751652120 +1 to this optimization. Love the idea!

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-02-24 Thread via GitHub
jpountz commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2678921217
> Looks like we can do similar trick for range facets and long values facets?
This is right.

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-02-24 Thread via GitHub
epotyom commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r1967691866 ## lucene/core/src/java/org/apache/lucene/search/DocIdStream.java: ## @@ -34,12 +33,35 @@ protected DocIdStream() {} * Iterate over doc IDs contained in this strea

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-02-21 Thread via GitHub
jpountz commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2674888106 @epotyom You may be interested in taking a look.

[PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-02-21 Thread via GitHub
jpountz opened a new pull request, #14273: URL: https://github.com/apache/lucene/pull/14273 This attempts to generalize the `IndexSearcher#count` optimization from PR #12415 to histogram facets by introducing specialization for counting the number of matching docs in a range of doc IDs.
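
For readers unfamiliar with the idea of "counting the number of matching docs in a range of doc IDs", here is a minimal sketch, assuming matches are held in a bit set backed by 64-bit words. It is not the code from this PR or from Lucene's `FixedBitSet`. The class name `RangeCount` and the methods `countInRange` and `set` are hypothetical and exist only for illustration. The point is that a whole range can be counted with a few `Long.bitCount` calls instead of visiting each doc ID.

```java
public final class RangeCount {
  private RangeCount() {}

  /** Hypothetical helper: marks doc ID {@code doc} as matching. */
  public static void set(long[] bits, int doc) {
    bits[doc >>> 6] |= 1L << doc; // Java uses only the low 6 bits of the shift distance
  }

  /**
   * Hypothetical helper: counts matching doc IDs in [from, to) without
   * iterating over individual doc IDs.
   */
  public static int countInRange(long[] bits, int from, int to) {
    if (from < 0 || from > to || to > ((long) bits.length << 6)) {
      throw new IllegalArgumentException("invalid range [" + from + ", " + to + ")");
    }
    if (from == to) {
      return 0;
    }
    int firstWord = from >>> 6;
    int lastWord = (to - 1) >>> 6;
    long startMask = -1L << from; // keeps bits at or above 'from' in the first word
    long endMask = -1L >>> -to;   // keeps bits below 'to' in the last word
    if (firstWord == lastWord) {
      return Long.bitCount(bits[firstWord] & startMask & endMask);
    }
    int count = Long.bitCount(bits[firstWord] & startMask);
    for (int i = firstWord + 1; i < lastWord; i++) {
      count += Long.bitCount(bits[i]); // full words: one popcount per 64 doc IDs
    }
    count += Long.bitCount(bits[lastWord] & endMask);
    return count;
  }
}
```

Under the same assumptions, a histogram collector that knows every doc ID in a range falls into one bucket could add `countInRange(bits, from, to)` to that bucket in one step, rather than incrementing it once per doc.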