Re: [I] FSTCompiler's NodeHash should fully duplicate `byte[]` slices from the growing FST [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on issue #12714: URL: https://github.com/apache/lucene/issues/12714#issuecomment-1788492690 An updated statistics based on the new implementation.

RAM Limit | Actual cache size | Cached nodes | FST size
---|---|---|---
No limit | 1.4MB | 62457 | 977KB
1MB |

[PR] speedup arm int functions? [lucene]

2023-10-31 Thread via GitHub
rmuir opened a new pull request, #12743: URL: https://github.com/apache/lucene/pull/12743 I was looking at this curious little guy which seems close to what we want? https://github.com/openjdk/jdk/blob/b3fec6b5f32c338ae1a84dd20bdcbd3d9b7186f3/src/hotspot/cpu/aarch64/aarch64_vector_ad.m4#L137

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1788406285 Ok, it's ready for review. I'll add the CHANGES.txt entry once it's approved. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378381108 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,119 +194,85 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1788347778 Uwe i realize i may be short with my responses, I ask: 0. get your espresso before continuing. 1. please take the time to look at benchmark results in depth. **make** sure you see the

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1788323948 Given this PR has no regressions for anyone and only causes speedups, I will pause it for a few weeks. please take the time to read and digest the results. Uwe, your FMA is slow too

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1788320396 also @uschindler i think you read your own results wrong, you can see the regression with FMA in your case too, same as @dweiss: ``` VectorUtilBenchmark.floatDotProductScalar1

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378285803 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -269,36 +283,58 @@ private boolean nodesEqual(FSTCompiler.UnCompiledNode node, long address)

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1788188467 I solved most of the nocommits (only 1 left). But I ended up using a `List` where each item is a node instead of ByteBlockPool due to the following reasons: - With ByteBlockPool we h

Re: [PR] Remove patching for doc blocks. [lucene]

2023-10-31 Thread via GitHub
slow-J commented on code in PR #12741: URL: https://github.com/apache/lucene/pull/12741#discussion_r1378259917 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/ForDeltaUtil.java: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [I] gradle's incremental compilation can get confused and lead to odd build errors [lucene]

2023-10-31 Thread via GitHub
dweiss commented on issue #12742: URL: https://github.com/apache/lucene/issues/12742#issuecomment-1788129507 I think the reason for an up-to-date status is because we handle module path ourselves ``` // Add modular dependencies and their transitive dependencies to module path.

Re: [I] gradle's incremental compilation can get confused and lead to odd build errors [lucene]

2023-10-31 Thread via GitHub
dweiss commented on issue #12742: URL: https://github.com/apache/lucene/issues/12742#issuecomment-1788119452 It's not incremental compilation - it's something that is related to task inputs and (likely) modular dependencies. Seems like we're missing something that makes the test compilation

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1788100282 Let's remove FMA support in this pull request. I will add a follow-up PR to have a PKG private utility class to query VM options that also logs a warning consistent (like in ram

Re: [I] byte to int in TruncateTokenFilterFactory to TruncateTokenFilter [lucene]

2023-10-31 Thread via GitHub
robro612 commented on issue #12449: URL: https://github.com/apache/lucene/issues/12449#issuecomment-1788067055 Hi @asubbu90 My name is Rohan Jha, I'm a Masters student at UT Austin taking a graduate [Distributed Systems course](https://github.com/vijay03/cs380d-f23). As part of my c

Re: [I] Take advantage of bloom filter when delete terms [lucene]

2023-10-31 Thread via GitHub
robro612 commented on issue #12725: URL: https://github.com/apache/lucene/issues/12725#issuecomment-1788065989 Hi @gf2121 , My name is Rohan Jha, I'm a Masters student at UT Austin taking a graduate [Distributed Systems course](https://github.com/vijay03/cs380d-f23). As part of my co

[I] gradle's incremental compilation can get confused and lead to odd build errors [lucene]

2023-10-31 Thread via GitHub
dweiss opened a new issue, #12742: URL: https://github.com/apache/lucene/issues/12742 ### Description Recent builds failed with: ``` org.apache.lucene.search.uhighlight.TestLengthGoalBreakIterator > testTargetLen FAILED java.lang.NoSuchMethodError: 'org.apache.lucene.util

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1788017728 good point about the C2 detection! If the bigdecimal code gets used it is a real problem. I hit this case on accident while developing this PR and suddenly saw JMH outputting scientific no

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787992450 > My AMD, which is a server CPU (in contrast to Threadripper, which is a workstation CPU), works faster with FMA. How can I enable it with your patch? You disable it without any chance to

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787985763 Anyways there's something problematic: FMA is slow with C2 disabled. In Panama Vector mode we detect this earlier and fallback to scalar variant. But with this patch the scalar

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787973537 Hi, > So I'd like to avoid use of FMA on AMD, where some cpus it causes slowdowns compared to mul+add which allows more parallelism due to architecture. If somebody notic
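The trade-off under discussion can be made concrete with a minimal sketch of the two scalar accumulation styles. This is not the code from the PR; the class and method names are hypothetical, and only `Math.fma` (available since Java 9) is a real JDK call. The fused form rounds once per step, while the separate multiply and add rounds twice, which is why the two can differ in the last bits on non-trivial inputs.

```java
// Hypothetical illustration class, not part of Lucene.
final class DotProductSketch {
  /** Dot product using a separate multiply and add (two roundings per step). */
  static float dotMulAdd(float[] a, float[] b) {
    float acc = 0f;
    for (int i = 0; i < a.length; i++) {
      acc += a[i] * b[i];
    }
    return acc;
  }

  /** Dot product using fused multiply-add (one rounding per step). */
  static float dotFma(float[] a, float[] b) {
    float acc = 0f;
    for (int i = 0; i < a.length; i++) {
      acc = Math.fma(a[i], b[i], acc);
    }
    return acc;
  }
}
```

Which variant is faster depends on the CPU and JIT, as the benchmark results in this thread show, so the sketch makes no performance claim.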

Re: [PR] Remove patching for doc blocks. [lucene]

2023-10-31 Thread via GitHub
slow-J commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1787958483 Thanks for the suggestion, I added `Lucene90RWPostingsFormat` in latest commit and made `Lucene90PostingsFormat` read-only.

Re: [PR] Remove patching for doc blocks. [lucene]

2023-10-31 Thread via GitHub
slow-J commented on code in PR #12741: URL: https://github.com/apache/lucene/pull/12741#discussion_r1378112509 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/PForUtil.java: ## @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Remove patching for doc blocks. [lucene]

2023-10-31 Thread via GitHub
slow-J commented on code in PR #12741: URL: https://github.com/apache/lucene/pull/12741#discussion_r1378112295 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/ForDeltaUtil.java: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] Remove patching for doc blocks. [lucene]

2023-10-31 Thread via GitHub
slow-J commented on code in PR #12741: URL: https://github.com/apache/lucene/pull/12741#discussion_r1378112072 ## lucene/CHANGES.txt: ## @@ -104,6 +104,8 @@ Optimizations * GITHUB#12552: Make FSTPostingsFormat load FSTs off-heap. (Tony X) +* GITHUB#12696: Change Postings ba

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787943673 I don't want to commit changes that slow down some people's computers. This is a way to go about it that is safe and sane. I view this one the same as the intel downclocking issue (where

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787795291 To prevent any surprises, I backported the branch and also the benchmark module to 9.x and ran everything with Java 11.0.13 (on my Intel Laptop): ``` branch_9x (Java 11.0.13)

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787740992 Hi, on my older Ryzen it is faster with FMA enabled (I downgraded your branch and also verified that the system prints "FMA enabled". Here is my full benchmark: ``` main,

Re: [PR] Specialize the 2nd clause of conjunctions. [lucene]

2023-10-31 Thread via GitHub
jpountz merged PR #12713: URL: https://github.com/apache/lucene/pull/12713

Re: [I] Use max BPV encoding in postings if doc buffer size less than ForUtil.BLOCK_SIZE [lucene]

2023-10-31 Thread via GitHub
jpountz commented on issue #12717: URL: https://github.com/apache/lucene/issues/12717#issuecomment-1787646282 Since we're changing the postings format anyway in https://github.com/apache/lucene/pull/12741, I wonder if it would be worth looking into different encodings for these tail posting

Re: [PR] Remove patching for doc blocks. [lucene]

2023-10-31 Thread via GitHub
jpountz commented on code in PR #12741: URL: https://github.com/apache/lucene/pull/12741#discussion_r1377911285 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/ForDeltaUtil.java: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-31 Thread via GitHub
slow-J commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1787521225 Thanks all for the feedback. Will proceed with removing patching only for doc blocks (reverting some of https://github.com/apache/lucene/pull/69) All the changes needed to crea

[PR] Remove patching for doc blocks. [lucene]

2023-10-31 Thread via GitHub
slow-J opened a new pull request, #12741: URL: https://github.com/apache/lucene/pull/12741 We are still keeping PFOR for positions only. This is a partial revert of https://github.com/apache/lucene/pull/69 which brings back ForDeltaUtil. Closes #12696

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787491191 thanks for benchmarking everyone, really helpful. I'm glad @dweiss caught the AMD issue.

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r138426 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -214,7 +222,13 @@ private long hash(long node) throws IOException { * Compares an unfrozen

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
ChrisHegarty commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787445189 > > I see the scalar code vectorize, but not optimally for the target CPU - e.g. `vfmadd231ss %xmm9,%xmm10,%xmm4` on my Rocket Lake. Where as, the vector API compilation emits instr

Re: [PR] Fix NullPointerException in Monitor.getQuery when query is not present [lucene]

2023-10-31 Thread via GitHub
romseygeek commented on PR #12736: URL: https://github.com/apache/lucene/pull/12736#issuecomment-1787418950 Thanks @daviscook477!

[I] Better wire up HNSW concurrent merge config [lucene]

2023-10-31 Thread via GitHub
zhaih opened a new issue, #12740: URL: https://github.com/apache/lucene/issues/12740 ### Description Follow up of #12660, currently the HNSW concurrent merge need 2 parameters to make it work: `numMergeWorker` (num of thread spawn per merge) and an `ExecutorService` that is used to e

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787409152 > I see the scalar code vectorize, but not optimally for the target CPU - e.g. `vfmadd231ss %xmm9,%xmm10,%xmm4` on my Rocket Lake. Where as, the vector API compilation emits instructions t

Re: [PR] Fix NullPointerException in Monitor.getQuery when query is not present [lucene]

2023-10-31 Thread via GitHub
romseygeek merged PR #12736: URL: https://github.com/apache/lucene/pull/12736

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
ChrisHegarty commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787403081 > I tried naively writing the logic like this with a couple N (8, 16, 32,etc) with FMA both off and on to see if I can baby this compiler to vectorize, nope, nothing. I don't think

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-31 Thread via GitHub
zhaih commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1787383067 > For my understanding, how does it play out with merge throttling, could we end up sleeping within tasks that get passed to the configured executor service? I think we rate limit th

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1377721562 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -269,36 +283,58 @@ private boolean nodesEqual(FSTCompiler.UnCompiledNode node, long address

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
mikemccand commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787353298 OK I ran again this command: `./gradlew -p lucene/benchmark-jmh assemble && java -jar lucene/benchmark-jmh/build/benchmarks/lucene-benchmark-jmh-10.0.0-SNAPSHOT.jar float -p size=102

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
mikemccand commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787346877 > Why is vector and scalar identical? Cause it is java 17 or am I missing something. Sorry, yeah, I ran with Java 17. I'm running again with Java 20 now ... > Please ad

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787335532 > Thanks @rmuir! > > Results from the nightly benchy box (128 core Ryzen Threadripper 3990X): > > `main`: > > ``` > Benchmark (s

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
ChrisHegarty commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787326935 Just some more numbers, for sanity. Intel Rocket Lake: main: ``` VectorUtilBenchmark.floatCosineScalar1024 thrpt5 0.844 ± 0.007 ops/us VectorUti

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
mikemccand commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787292187 Thanks @rmuir! Results from the nightly benchy box (128 core Ryzen Threadripper 3990X): `main`: ``` Benchmark (size) Mode Cnt

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787287133 > This should solve @dweiss problem: [f2be84f](https://github.com/apache/lucene/commit/f2be84f822b6c2c7971fa1d434ad82c09174039f) > > It should also improve speed of vectorized c

Re: [PR] Fix NullPointerException in Monitor.getQuery when query is not present [lucene]

2023-10-31 Thread via GitHub
daviscook477 commented on PR #12736: URL: https://github.com/apache/lucene/pull/12736#issuecomment-1787229329 @romseygeek I have added a CHANGES.txt entry under the 9.9.0 release describing the bugfix in this PR.

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787195593 This should solve @dweiss problem: https://github.com/apache/lucene/pull/12737/commits/f2be84f822b6c2c7971fa1d434ad82c09174039f It should also improve speed of vectorized case on AMD

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787190466 Please keep FMA for now and allow us benchmarking.

Re: [I] Explore within-block skipping for postings [lucene]

2023-10-31 Thread via GitHub
jpountz commented on issue #12486: URL: https://github.com/apache/lucene/issues/12486#issuecomment-1787160140 > Lucene is also doing the "accumulate docid deltas into the absolute docid" too in this loop, but I guess Tantivy does this separately somehow? I believe Tantivy does the sam
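The "accumulate docid deltas into the absolute docid" step mentioned in the quote is a running (prefix) sum over a decoded block. A minimal sketch follows, assuming the deltas have already been decoded into an `int[]`; the class and method names are hypothetical and this is not Lucene's actual decoding loop.

```java
// Hypothetical illustration class, not part of Lucene.
final class DocIdDeltas {
  /**
   * Converts a block of doc ID deltas into absolute doc IDs, in place.
   * base is the last doc ID of the previous block.
   */
  static void prefixSum(int[] deltas, int base) {
    int doc = base;
    for (int i = 0; i < deltas.length; i++) {
      doc += deltas[i];   // each delta is the gap to the previous doc ID
      deltas[i] = doc;    // overwrite the delta with the absolute doc ID
    }
  }
}
```

Each element depends on the previous one, which is what makes this loop harder to vectorize than the decoding of the deltas themselves.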

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787121130 we can also detect newer zen (e.g. zen4) by it having 512-bit vectors. so we just need to detect AMD vs Intel. anyway, i'm fine with removing FMA from this PR completely actually, an

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787102205 > You can't get zen version or detect Ryzen CPUs easily. Please do not add cpuinfo parsing as this won't work on alternate platforms like Windows. We can just retrieve another flag t

Re: [PR] Compute multiple float aggregations in one go [lucene]

2023-10-31 Thread via GitHub
stefanvodita commented on code in PR #12547: URL: https://github.com/apache/lucene/pull/12547#discussion_r1377483050 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java: ## @@ -37,33 +37,43 @@ abstract class FloatTaxonomyFacets extends TaxonomyFace

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787077812 I will check on my Zen on policeman Jenkins later. You can't get zen version or detect Ryzen CPUs easily. Please do not add cpuinfo parsing as this won't work on alternate platf

Re: [PR] Compute multiple float aggregations in one go [lucene]

2023-10-31 Thread via GitHub
mikemccand commented on code in PR #12547: URL: https://github.com/apache/lucene/pull/12547#discussion_r1377455225 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java: ## @@ -37,33 +37,43 @@ abstract class FloatTaxonomyFacets extends TaxonomyFacets

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787047135 hmm i think i read @dweiss results in the wrong order... it seems like a fairly big difference? we actually regress scalar dotproduct for his cpu here. And I assume the vector case behaves

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-10-31 Thread via GitHub
mikemccand commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1787033818 Wow, these are interesting results. Thanks @shubhamvishu. It's curious that, at 3/4 rehash ratio, linear probing is quite a bit slower (103.4 s) than quadratic (98.1 s). The
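For readers unfamiliar with the two probing strategies being compared, here is a minimal sketch of the slot sequences each one visits in a power-of-two table. The names are hypothetical, and the quadratic variant shown (adding an increasing step 1, 2, 3, ...) is one common formulation, assumed here for illustration rather than copied from NodeHash.

```java
// Hypothetical illustration class, not part of Lucene.
final class ProbeSequences {
  /** Linear probing: visit start, start+1, start+2, ... (mod table size). */
  static int[] linear(int start, int mask, int steps) {
    int[] out = new int[steps];
    int pos = start & mask;
    for (int i = 0; i < steps; i++) {
      out[i] = pos;
      pos = (pos + 1) & mask;
    }
    return out;
  }

  /** Quadratic probing: the step grows by one on every collision. */
  static int[] quadratic(int start, int mask, int steps) {
    int[] out = new int[steps];
    int pos = start & mask;
    int c = 0;
    for (int i = 0; i < steps; i++) {
      out[i] = pos;
      pos = (pos + (++c)) & mask;
    }
    return out;
  }
}
```

With a table of 8 slots (`mask = 7`) and a start slot of 5, the linear sequence stays in adjacent slots while the quadratic one spreads out, which tends to reduce clustering at high load factors at the cost of cache locality.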

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787029825 Yeah. Please keep it as it gives more correct results.

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787025516 > Nice. I checked with AMD Ryzen Threadripper 3970X. Note it's actually slightly faster when not using FMA... @dweiss thanks for testing, I expect this on AMD. but "slightly" is very

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dweiss commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1377416721 ## lucene/core/src/java/org/apache/lucene/util/fst/ReverseBytesReader.java: ## @@ -17,7 +17,7 @@ package org.apache.lucene.util.fst; /** Reads in reverse from a sin

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-31 Thread via GitHub
jpountz commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1786969801 > Normally the IntNRQ (1D points numeric range query) is very noisy, but maybe this gain is real? p-value seems to think it could be close to real? I'm not sure how it could n

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1377383960 ## lucene/core/src/java/org/apache/lucene/util/fst/ReverseBytesReader.java: ## @@ -17,7 +17,7 @@ package org.apache.lucene.util.fst; /** Reads in reverse from a s

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-31 Thread via GitHub
mikemccand commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1786949852 Thanks for testing @jpountz. I think at some point we also enabled patching for the freq blocks inside `.doc` file too? Normally the `IntNRQ` (1D points numeric rang

Re: [PR] Fix file handle leak in Lucene99ScalarQuantizedVectorsWriter. [lucene]

2023-10-31 Thread via GitHub
jpountz merged PR #12739: URL: https://github.com/apache/lucene/pull/12739

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1377364142 ## lucene/core/src/java/org/apache/lucene/util/fst/ReverseBytesReader.java: ## @@ -17,7 +17,7 @@ package org.apache.lucene.util.fst; /** Reads in reverse from a s

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1377360860 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -269,36 +283,58 @@ private boolean nodesEqual(FSTCompiler.UnCompiledNode node, long address)

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1377358613 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -269,36 +283,58 @@ private boolean nodesEqual(FSTCompiler.UnCompiledNode node, long address)

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1377355083 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -269,36 +283,58 @@ private boolean nodesEqual(FSTCompiler.UnCompiledNode node, long address)

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1377351934 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -749,7 +750,6 @@ public void add(IntsRef input, T output) throws IOException { // f

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1377348991 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -461,9 +457,14 @@ long addNode(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1377320240 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -461,9 +457,14 @@ long addNode(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786880847 Thanks, great. By the way if you run the incubator impl it prints the FMA status as log message. Just the scalar one doesn't.

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
dweiss commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786855076 Here's how JMH copies runner VM's input arguments - ``` jvmArgs.addAll(options.getJvmArgs() .orElseGet(() -> benchmark.getJvmArgs() .orEl

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
dweiss commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786839902 Out of curiosity I checked (by sysouting the status of the constant): -XX:+-UseFMA is passed to forked JVMs, so these incantations of jmh are equivalent: ``` java -XX:-UseFMA -jar

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-31 Thread via GitHub
jpountz commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1786816049 For my understanding, how does it play out with merge throttling, could we end up sleeping within tasks that get passed to the configured executor service?

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
dweiss commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786815134 JMH states that: ``` -jvmArgsUse given JVM arguments. Most options are inherited from the host VM options, [...] ``` and I've alw

[PR] Fix file handle leak in Lucene99ScalarQuantizedVectorsWriter. [lucene]

2023-10-31 Thread via GitHub
jpountz opened a new pull request, #12739: URL: https://github.com/apache/lucene/pull/12739 If `mergeQuantizedByteVectorValues` fails with an exception, the temp output never gets closed. This was found by the test that throws random exceptions.
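The leak described here is the usual "resource opened, then an exception skips the close" pattern. A minimal sketch of a success-flag guard is shown below; it is an illustration under assumptions, not the actual change in the PR. `TrackingOutput` and `runOrClose` are hypothetical names, and unchecked exceptions are used to keep the example short.

```java
import java.io.Closeable;
import java.util.function.Consumer;

// Hypothetical illustration class, not part of Lucene.
final class CloseOnFailure {
  /** Stand-in for a temp output; records whether close() was called. */
  static final class TrackingOutput implements Closeable {
    boolean closed;

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Runs the action; if it throws, closes the output before the exception propagates. */
  static void runOrClose(TrackingOutput out, Consumer<TrackingOutput> action) {
    boolean success = false;
    try {
      action.accept(out);
      success = true;
    } finally {
      if (!success) {
        out.close(); // failure path: do not leak the handle
      }
    }
  }
}
```

On the success path the output stays open so the caller can keep using it; only the failure path closes it.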

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786790886 @dweiss Are you sure the `-useFMA` was applied to the child JVMs of the benchmark? As far as I remember you have to pass the JVM options as a separate parameter for them to be applied to the child JVMs.

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
dweiss commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786776417 Nice. I checked with AMD Ryzen Threadripper 3970X. Note it's actually slightly faster when not using FMA... ``` JDK 17, patch java -jar lucene\benchmark-jmh\build\benchmarks\

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786711148 Did you also test this PR with Java 17? Your benchmark does not say which version it uses.

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-31 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786709832 One thing: we should test this in Lucene 9.x with Java 11, too. I want to make sure we get the same or a similar speedup there. Not that there will be a regression.

[PR] Use value-based LRU cache in NodeHash [lucene]

2023-10-31 Thread via GitHub
dungba88 opened a new pull request, #12738: URL: https://github.com/apache/lucene/pull/12738 ### Description Fix #12714 First attempt to introduce value-based LRU cache in NodeHash. There are some inefficiencies, but the functionalities work.

Re: [I] FSTCompiler's NodeHash should fully duplicate `byte[]` slices from the growing FST [lucene]

2023-10-31 Thread via GitHub
dungba88 commented on issue #12714: URL: https://github.com/apache/lucene/issues/12714#issuecomment-1786608097 I think that makes sense. I attempted to implement the copy bytes (not optimizing though, and there are lots of non-optimal bytes read/write). With the same FST as abo