Re: [PR] Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree [lucene]

2024-08-27 Thread via GitHub
msfroh commented on PR #13521: URL: https://github.com/apache/lucene/pull/13521#issuecomment-2313906920 I tried modifying the loop to process 4 longs per iteration and noticed no difference on my Xeon host, which is unsurprising since there was no difference between 1 and 3. I also t

Re: [PR] Terminate automaton after matched the whole prefix for PrefixQuery. [lucene]

2024-08-27 Thread via GitHub
github-actions[bot] commented on PR #13072: URL: https://github.com/apache/lucene/pull/13072#issuecomment-2313794696 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Avoid SegmentTermsEnumFrame reload block. [lucene]

2024-08-27 Thread via GitHub
github-actions[bot] commented on PR #13253: URL: https://github.com/apache/lucene/pull/13253#issuecomment-2313794562 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Speed up advancing within a block. [lucene]

2024-08-27 Thread via GitHub
gsmiller commented on code in PR #13692: URL: https://github.com/apache/lucene/pull/13692#discussion_r1733650732 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -212,13 +213,74 @@ static void prefixSum(long[] buffer, int count, long

Re: [PR] Use range optimizations for "slow" MultiTermQueries when terms happen to be contiguous [lucene]

2024-08-27 Thread via GitHub
gsmiller commented on code in PR #13693: URL: https://github.com/apache/lucene/pull/13693#discussion_r1733631990 ## lucene/core/src/java/org/apache/lucene/document/SortedSetDocValuesRangeQuery.java: ## @@ -161,66 +156,8 @@ public Scorer get(long leadCost) throws IOException {

Re: [PR] Use range optimizations for "slow" MultiTermQueries when terms happen to be contiguous [lucene]

2024-08-27 Thread via GitHub
gsmiller commented on code in PR #13693: URL: https://github.com/apache/lucene/pull/13693#discussion_r1733632247 ## lucene/core/src/java/org/apache/lucene/document/SortedSetDocValuesRangeQuery.java: ## @@ -236,69 +173,4 @@ public boolean isCacheable(LeafReaderContext ctx) {

[PR] Use range optimizations for "slow" MultiTermQueries when terms happen to be contiguous [lucene]

2024-08-27 Thread via GitHub
gsmiller opened a new pull request, #13693: URL: https://github.com/apache/lucene/pull/13693 ### Description This is a small optimization that treats "slow" multi-term queries (e.g., TermInSet, RegexpQuery, etc.) as ordinal ranges when the query terms create a contiguous range. This
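The contiguity test behind that description can be illustrated with a minimal sketch. It assumes the query terms have already been resolved to strictly increasing term ordinals; the class `ContiguousOrdsSketch` and the method `asRange` are hypothetical names and are not taken from the PR.

```java
/** Hypothetical sketch; not the implementation from PR #13693. */
final class ContiguousOrdsSketch {

  /**
   * Returns {minOrd, maxOrd} when the strictly increasing ordinals form one
   * contiguous run, or null when there is a gap (or no ordinals at all).
   */
  static long[] asRange(long[] sortedOrds) {
    if (sortedOrds.length == 0) {
      return null;
    }
    long min = sortedOrds[0];
    long max = sortedOrds[sortedOrds.length - 1];
    // With distinct sorted values, the run is gap-free exactly when the
    // span between the ends equals the number of values.
    return (max - min + 1 == sortedOrds.length) ? new long[] {min, max} : null;
  }

  public static void main(String[] args) {
    System.out.println(asRange(new long[] {3, 4, 5}) != null); // true: usable as a range
    System.out.println(asRange(new long[] {3, 5, 6}) != null); // false: ordinal 4 is missing
  }
}
```

When the check succeeds, a caller could match every document whose ordinal lies between the two bounds instead of testing each term separately.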

Re: [PR] Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree [lucene]

2024-08-27 Thread via GitHub
msfroh commented on PR #13521: URL: https://github.com/apache/lucene/pull/13521#issuecomment-2313628836 The approach is pretty neat. I'm wondering if `Bit21With3StepsEncoder` does better on aarch64 because of the explicitly unrolled loop? If so, I'm wondering if unrolling to a multip

Re: [PR] Speed up advancing within a block. [lucene]

2024-08-27 Thread via GitHub
jpountz commented on code in PR #13692: URL: https://github.com/apache/lucene/pull/13692#discussion_r1733487895 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -509,15 +547,14 @@ private void refillRemainder() throws IOException {

Re: [PR] Speed up advancing within a block. [lucene]

2024-08-27 Thread via GitHub
jpountz commented on code in PR #13692: URL: https://github.com/apache/lucene/pull/13692#discussion_r1733472210 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -509,15 +547,14 @@ private void refillRemainder() throws IOException {

Re: [PR] Fixed exponent value in explain of SigmoidFunction [lucene]

2024-08-27 Thread via GitHub
owaiskazi19 commented on PR #13691: URL: https://github.com/apache/lucene/pull/13691#issuecomment-231364 > Thank you @owaiskazi19 -- I backported to 9.12 as well. Thanks @mikemccand for the help here. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] Fixed exponent value in explain of SigmoidFunction [lucene]

2024-08-27 Thread via GitHub
mikemccand commented on PR #13691: URL: https://github.com/apache/lucene/pull/13691#issuecomment-2313434759 Thank you @owaiskazi19 -- I backported to 9.12 as well.

Re: [PR] Fixed exponent value in explain of SigmoidFunction [lucene]

2024-08-27 Thread via GitHub
mikemccand merged PR #13691: URL: https://github.com/apache/lucene/pull/13691

Re: [I] Find a way to remove IndexSearcher#search(Query query, CollectorOwner collectorOwner) before 10.0 [lucene]

2024-08-27 Thread via GitHub
gsmiller commented on issue #13671: URL: https://github.com/apache/lucene/issues/13671#issuecomment-2313193985 Cool, sounds like we've got a good plan. I'll just add that I've been noodling on the idea for a while of seeing if we could actually get rid of `DrillSideways` and fold what it do

Re: [PR] HNSW BP reorder tool [lucene]

2024-08-27 Thread via GitHub
msokolov commented on PR #13683: URL: https://github.com/apache/lucene/pull/13683#issuecomment-2313146569 I'd like to get this tool merged before moving on to thinking about different index encodings. To that end I plan to ensure that it works with all the vector similarities, and with byte

Re: [I] Measure whether graph is strongly connected [lucene]

2024-08-27 Thread via GitHub
msokolov commented on issue #13687: URL: https://github.com/apache/lucene/issues/13687#issuecomment-2313000464 In graph theory "strongly connected" means every node in the graph can be reached from every other node. In a bidirectional (undirected) graph it's the same as "connected". But in a
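The definition quoted above can be checked with two traversals: a directed graph is strongly connected exactly when every node is reachable from some start node both in the graph and in its reverse. The sketch below assumes a plain adjacency-list representation; `StrongConnectivitySketch` and its methods are hypothetical names and do not use Lucene's HNSW graph API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of a strong-connectivity check on adjacency lists. */
final class StrongConnectivitySketch {

  /** Breadth-first search from node 0; true if every node was visited. */
  static boolean reachesAllFromZero(List<List<Integer>> adj) {
    boolean[] seen = new boolean[adj.size()];
    Deque<Integer> queue = new ArrayDeque<>();
    seen[0] = true;
    queue.add(0);
    int visited = 1;
    while (!queue.isEmpty()) {
      int node = queue.poll();
      for (int next : adj.get(node)) {
        if (!seen[next]) {
          seen[next] = true;
          visited++;
          queue.add(next);
        }
      }
    }
    return visited == adj.size();
  }

  /** Builds the graph with every edge direction flipped. */
  static List<List<Integer>> reverse(List<List<Integer>> adj) {
    List<List<Integer>> rev = new ArrayList<>();
    for (int i = 0; i < adj.size(); i++) {
      rev.add(new ArrayList<>());
    }
    for (int from = 0; from < adj.size(); from++) {
      for (int to : adj.get(from)) {
        rev.get(to).add(from);
      }
    }
    return rev;
  }

  /** Strongly connected: node 0 reaches all nodes, and all nodes reach node 0. */
  static boolean isStronglyConnected(List<List<Integer>> adj) {
    if (adj.isEmpty()) {
      return true;
    }
    return reachesAllFromZero(adj) && reachesAllFromZero(reverse(adj));
  }

  public static void main(String[] args) {
    // Cycle 0 -> 1 -> 2 -> 0: every node reaches every other node.
    System.out.println(isStronglyConnected(List.of(List.of(1), List.of(2), List.of(0)))); // true
    // Chain 0 -> 1 -> 2: node 2 cannot reach node 0.
    System.out.println(isStronglyConnected(List.of(List.of(1), List.of(2), List.of()))); // false
  }
}
```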

Re: [I] Find a way to remove IndexSearcher#search(Query query, CollectorOwner collectorOwner) before 10.0 [lucene]

2024-08-27 Thread via GitHub
mikemccand commented on issue #13671: URL: https://github.com/apache/lucene/issues/13671#issuecomment-2312947485 I like Option 1, first, as well. We can later explore flavors of 2 and 3. Thanks @epotyom and @gsmiller.

Re: [I] Find a way to remove IndexSearcher#search(Query query, CollectorOwner collectorOwner) before 10.0 [lucene]

2024-08-27 Thread via GitHub
epotyom commented on issue #13671: URL: https://github.com/apache/lucene/issues/13671#issuecomment-2312932669 @gsmiller , I agree. I'll create a PR for the 1st option then in the next couple of days. Thanks!

Re: [PR] Speed up advancing within a block. [lucene]

2024-08-27 Thread via GitHub
mikemccand commented on code in PR #13692: URL: https://github.com/apache/lucene/pull/13692#discussion_r1733130032 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -509,15 +547,14 @@ private void refillRemainder() throws IOException

Re: [I] Measure whether graph is strongly connected [lucene]

2024-08-27 Thread via GitHub
pierwill commented on issue #13687: URL: https://github.com/apache/lucene/issues/13687#issuecomment-2312910112 > But it is still an open question the extent to which this kind of check could potentially improve graph connectivity and thus recall I only understand the broad ideas here.

Re: [PR] Speed up advancing within a block. [lucene]

2024-08-27 Thread via GitHub
jpountz commented on code in PR #13692: URL: https://github.com/apache/lucene/pull/13692#discussion_r1733116688 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -509,15 +547,14 @@ private void refillRemainder() throws IOException {

Re: [I] remove refs to people.apache.org/home.apache.org in build [lucene]

2024-08-27 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2312879477 I also aliased (CNAMEd) [benchmarks.mikemccandless.com](https://benchmarks.mikemccandless.com/) -- GitHub pages makes this simple-ish, yay.

Re: [I] remove refs to people.apache.org/home.apache.org in build [lucene]

2024-08-27 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2312800530 > > FYI: I clicked on a few random links and found a 404 https://mikemccand.github.io/luceneutil/analyzers.html although this page does seem to exist on the current site >

Re: [I] Find a way to remove IndexSearcher#search(Query query, CollectorOwner collectorOwner) before 10.0 [lucene]

2024-08-27 Thread via GitHub
gsmiller commented on issue #13671: URL: https://github.com/apache/lucene/issues/13671#issuecomment-2312763368 Thanks @epotyom for the detailed thoughts! If I'm understanding this correctly, it sounds like "option 1" would let us remove the need for `CollectorOwner` along with the new API i

Re: [I] remove refs to people.apache.org/home.apache.org in build [lucene]

2024-08-27 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2312740634 Phew, OK, I think nightly benchy is now successfully publishing automatically to https://mikemccand.github.io/lucenenightly (using GitHub pages). Last night's run "just worked".

Re: [PR] Speed up advancing within a block. [lucene]

2024-08-27 Thread via GitHub
jpountz commented on code in PR #13692: URL: https://github.com/apache/lucene/pull/13692#discussion_r1732939828 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/AdvanceBenchmark.java: ## @@ -0,0 +1,376 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] Speed up advancing within a block. [lucene]

2024-08-27 Thread via GitHub
madrob commented on code in PR #13692: URL: https://github.com/apache/lucene/pull/13692#discussion_r1732934065 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/AdvanceBenchmark.java: ## @@ -0,0 +1,376 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] Speed up advancing within a block. [lucene]

2024-08-27 Thread via GitHub
jpountz commented on PR #13692: URL: https://github.com/apache/lucene/pull/13692#issuecomment-2312622096 > This is actually quite an impressive speedup! To be honest, I believe that this is the only run where I got more than 10%, other runs were in the 5%-10% range for `CountAndHighHi

Re: [PR] Speed up advancing within a block. [lucene]

2024-08-27 Thread via GitHub
mikemccand commented on PR #13692: URL: https://github.com/apache/lucene/pull/13692#issuecomment-2312618083 > CountAndHighHigh 48.56 (1.5%) 53.62 (1.0%) 10.4% ( 7% - 13%) 0.000 This is actually quite an impressive speedup!

Re: [PR] Speed up advancing within a block. [lucene]

2024-08-27 Thread via GitHub
jpountz commented on code in PR #13692: URL: https://github.com/apache/lucene/pull/13692#discussion_r1732885915 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -509,15 +547,14 @@ private void refillRemainder() throws IOException {

Re: [PR] Speed up advancing within a block. [lucene]

2024-08-27 Thread via GitHub
mikemccand commented on code in PR #13692: URL: https://github.com/apache/lucene/pull/13692#discussion_r1732819218 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -509,15 +547,14 @@ private void refillRemainder() throws IOException

Re: [PR] Add support for intra-segment search concurrency [lucene]

2024-08-27 Thread via GitHub
javanna commented on code in PR #13542: URL: https://github.com/apache/lucene/pull/13542#discussion_r1732759669 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -890,11 +945,70 @@ public static class LeafSlice { * * @lucene.experimental

Re: [PR] Speed up advancing within a block. [lucene]

2024-08-27 Thread via GitHub
jpountz commented on PR #13692: URL: https://github.com/apache/lucene/pull/13692#issuecomment-2311845595 Here is what the `AdvanceBenchmark` reports. The branchless binary search is `binarySearch5`, which performs much faster than a regular binary search but still slower than a linear searc
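For readers unfamiliar with the term, one common shape of a branchless binary search over a 128-entry sorted block is sketched below. This is an assumption-laden illustration, not the `binarySearch5` variant from the benchmark: the class and method names are hypothetical, and whether the JIT actually compiles the conditional expressions to branch-free code is not guaranteed.

```java
/** Hypothetical sketch; not the binarySearch5 code from AdvanceBenchmark. */
final class BranchlessSearchSketch {

  /**
   * Returns the first index whose value is >= target, or 128 if there is none.
   * Assumes values is sorted and has exactly 128 entries. The loop always runs
   * the same seven steps, so the only data dependence is in the added offset.
   */
  static int lowerBound128(int[] values, int target) {
    int base = 0;
    for (int half = 64; half > 0; half >>= 1) {
      // If the last element of the lower half is still too small,
      // the answer lies in the upper half, so skip the lower half.
      base += (values[base + half - 1] < target) ? half : 0;
    }
    // One candidate remains at index base; step past it if it is too small.
    base += (values[base] < target) ? 1 : 0;
    return base;
  }

  public static void main(String[] args) {
    int[] values = new int[128];
    for (int i = 0; i < values.length; i++) {
      values[i] = 3 * i; // sorted block: 0, 3, 6, ...
    }
    System.out.println(lowerBound128(values, 10)); // 4, because values[4] == 12
    System.out.println(lowerBound128(values, 1000)); // 128, no value is large enough
  }
}
```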

[PR] Speed up advancing within a block. [lucene]

2024-08-27 Thread via GitHub
jpountz opened a new pull request, #13692: URL: https://github.com/apache/lucene/pull/13692 Advancing within a block consists of finding the first index within an array of 128 values whose value is greater than or equal to a target. Given the small size, it's not obvious whether it's better to
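The operation described there is a lower-bound search over a sorted block. The two most familiar candidates, a linear scan and a regular binary search, are sketched below for comparison; `AdvanceSketch` and its method names are hypothetical and are not the code from the PR.

```java
/** Hypothetical sketch of advancing within a sorted block of doc IDs. */
final class AdvanceSketch {
  static final int BLOCK_SIZE = 128;

  /** Linear scan: first index with docs[i] >= target, or docs.length if none. */
  static int linearAdvance(int[] docs, int target) {
    for (int i = 0; i < docs.length; i++) {
      if (docs[i] >= target) {
        return i;
      }
    }
    return docs.length;
  }

  /** Regular binary search for the same lower bound. */
  static int binaryAdvance(int[] docs, int target) {
    int lo = 0;
    int hi = docs.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (docs[mid] < target) {
        lo = mid + 1; // everything up to mid is too small
      } else {
        hi = mid; // mid is a candidate; keep looking to its left
      }
    }
    return lo;
  }

  public static void main(String[] args) {
    int[] docs = new int[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; i++) {
      docs[i] = 3 * i; // sorted doc IDs: 0, 3, 6, ...
    }
    System.out.println(linearAdvance(docs, 10)); // 4, because docs[4] == 12
    System.out.println(binaryAdvance(docs, 10)); // 4
  }
}
```

Both methods return the same index; which one is faster on a block this small depends on branch prediction and on how far the target is from the start, which is the kind of question a microbenchmark has to settle.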

[PR] Fixed exponent in explain of SigmoidFunction [lucene]

2024-08-27 Thread via GitHub
owaiskazi19 opened a new pull request, #13691: URL: https://github.com/apache/lucene/pull/13691 ### Description Coming from https://github.com/opensearch-project/OpenSearch/issues/14921. This PR corrects the value of the exponent from pivot in the Sigmoid function.