Re: [PR] Make FST fully read-only and streamline FST constructor [lucene]

2023-11-04 Thread via GitHub
dungba88 closed pull request #12691: Make FST fully read-only and streamline FST constructor URL: https://github.com/apache/lucene/pull/12691 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

[PR] Streamline FST constructors and make it fully read-only [lucene]

2023-11-04 Thread via GitHub
dungba88 opened a new pull request, #12758: URL: https://github.com/apache/lucene/pull/12758 ### Description - Streamline FST constructors by grouping the metadata attributes into FSTMetadata - Make it fully read-only by moving `finish()` and `setEmptyOutput` to FSTCompiler E

Re: [PR] Streamline FST constructors and make it fully read-only [lucene]

2023-11-04 Thread via GitHub
dungba88 commented on code in PR #12758: URL: https://github.com/apache/lucene/pull/12758#discussion_r1382359394 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -411,17 +401,42 @@ public FST(DataInput metaIn, DataInput in, Outputs outputs) throws IOExceptio

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793399807 Thank you @rmuir for doing all the crazy hard work to decode the actual capabilities of the bare metal hiding underneath the layers of abstraction under Panama Vector API @rmuir! I l

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793421197 @rmuir: It would be nice if you could merge this long PR with Github UI and squash it - thanks. I can do it for you if you like.

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793421290 I tested on my now-ancient Zen2 beast3 (nightly benchmark) box (`AMD Ryzen Threadripper 3990X 64-Core Processor`), using JDK 21 (`openjdk full version "21+35"`), with command-line `./

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1382378735 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBlockPoolReverseBytesReader.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[I] Remove `FST.BytesReader#reversed` method? [lucene]

2023-11-04 Thread via GitHub
mikemccand opened a new issue, #12759: URL: https://github.com/apache/lucene/issues/12759 ### Description Spinoff from #12738: this method seems to be dead/pointless code now?

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1382379017 ## lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java: ## @@ -38,6 +38,8 @@ public final class ByteBlockPool implements Accountable { /** Abstract

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1382379237 ## lucene/core/src/test/org/apache/lucene/util/TestByteBlockPool.java: ## @@ -91,6 +92,10 @@ public void testLargeRandomBlocks() throws IOException { random(

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793426843 > I tested on my now-ancient Zen2 beast3 (nightly benchmark) box (`AMD Ryzen Threadripper 3990X 64-Core Processor`), using JDK 21 (`openjdk full version "21+35"`), with command-line `

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793426911 > @rmuir: It would be nice if you could follow the community standard and merge this long PR with Github UI and squash it - thanks. I can do it for you if you like. I am not done he

Re: [PR] Streamline FST constructors and make it fully read-only [lucene]

2023-11-04 Thread via GitHub
dungba88 commented on code in PR #12758: URL: https://github.com/apache/lucene/pull/12758#discussion_r1382383218 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -411,17 +401,42 @@ public FST(DataInput metaIn, DataInput in, Outputs outputs) throws IOExceptio

[I] Improve bytes copy in NodeHash [lucene]

2023-11-04 Thread via GitHub
dungba88 opened a new issue, #12760: URL: https://github.com/apache/lucene/issues/12760 ### Description Spinoff from https://github.com/apache/lucene/pull/12738: there are 2 TODOs about reducing byte copies when copying from FST and when promoting from the fallback table.

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-04 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1382384038 ## lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java: ## @@ -38,6 +38,8 @@ public final class ByteBlockPool implements Accountable { /** Abstract cl

Re: [PR] TestIndexWriterOnVMError.testUnknownError times out (fixes potential IW deadlock on tragic exceptions). [lucene]

2023-11-04 Thread via GitHub
s1monw commented on code in PR #12751: URL: https://github.com/apache/lucene/pull/12751#discussion_r1382391250 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -2560,10 +2560,15 @@ private void rollbackInternalNoCommit() throws IOException {

Re: [I] Add Scalar Quantization codec for Vectors [lucene]

2023-11-04 Thread via GitHub
benwtrent commented on issue #12497: URL: https://github.com/apache/lucene/issues/12497#issuecomment-1793446854 I have done a poor job of linking against the related work for bringing vector lossy-compression. The initial implementation of adding int8 (really, it's int7 because of sig

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793456342 Benchmarks for the intel cpus. There is one place i'd fix, if we could detect sapphire rapids and avoid scalar FMA. But i have no way to detect it based on what new features it has / what
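
For readers unfamiliar with the technique named in this thread's title, the following is a minimal sketch of a scalar float dot product that is unrolled by four and accumulates with `Math.fma`. It is not the code from PR #12737; the class name `FmaDotProduct` and the unroll factor are assumptions chosen for illustration only.

```java
public class FmaDotProduct {

  // Hypothetical helper: scalar dot product with four independent accumulators.
  // Independent accumulators let the CPU overlap the fused multiply-adds
  // instead of waiting on a single dependency chain.
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector lengths differ");
    }
    float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
    int i = 0;
    int upper = a.length & ~3; // largest multiple of 4 not exceeding the length
    for (; i < upper; i += 4) {
      // Math.fma(x, y, z) computes x * y + z with a single rounding step.
      acc0 = Math.fma(a[i], b[i], acc0);
      acc1 = Math.fma(a[i + 1], b[i + 1], acc1);
      acc2 = Math.fma(a[i + 2], b[i + 2], acc2);
      acc3 = Math.fma(a[i + 3], b[i + 3], acc3);
    }
    float result = (acc0 + acc1) + (acc2 + acc3);
    // Handle the remaining zero to three elements.
    for (; i < a.length; i++) {
      result = Math.fma(a[i], b[i], result);
    }
    return result;
  }

  public static void main(String[] args) {
    float[] a = {1f, 2f, 3f, 4f, 5f};
    float[] b = {1f, 1f, 1f, 1f, 1f};
    System.out.println(dotProduct(a, b));
  }
}
```

Whether `Math.fma` is faster than a separate multiply and add depends on the hardware, which is why the thread discusses detecting specific CPUs before choosing the FMA path.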

[PR] Remove usage of deprecated java.util.Locale constructor [lucene]

2023-11-04 Thread via GitHub
ChrisHegarty opened a new pull request, #12761: URL: https://github.com/apache/lucene/pull/12761 This commit replaces usages of the deprecated `java.util.Locale` constructor with `Locale.Builder`. The motivation for this change is to address tech debt identified when experimenting wit
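
As an illustration of the kind of replacement this PR describes, here is a minimal sketch of building a `Locale` through `Locale.Builder` instead of the constructors deprecated since JDK 19. It is not taken from PR #12761; the class name `LocaleBuilderExample` and the helper `build` are hypothetical.

```java
import java.util.Locale;

public class LocaleBuilderExample {

  // Hypothetical helper: builds a Locale without calling the deprecated
  // java.util.Locale constructors. Unlike those constructors, the builder
  // validates its input and throws IllformedLocaleException on bad values.
  static Locale build(String language, String region) {
    return new Locale.Builder()
        .setLanguage(language)
        .setRegion(region)
        .build();
  }

  public static void main(String[] args) {
    Locale locale = build("de", "CH");
    System.out.println(locale.toLanguageTag()); // prints "de-CH"
  }
}
```

One behavioral difference worth keeping in mind when migrating: the deprecated constructors accept arbitrary strings, while the builder rejects ill-formed language or region codes.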

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-04 Thread via GitHub
mikemccand merged PR #12738: URL: https://github.com/apache/lucene/pull/12738

Re: [I] FSTCompiler's NodeHash should fully duplicate `byte[]` slices from the growing FST [lucene]

2023-11-04 Thread via GitHub
mikemccand closed issue #12714: FSTCompiler's NodeHash should fully duplicate `byte[]` slices from the growing FST URL: https://github.com/apache/lucene/issues/12714

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1793476420 I merged to main, thank you @dungba88 for the fast iterations! I could barely keep up just reviewing :) After all this FST dust settles let's remember to add your CHANGES.txt e

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1793477236 Hello @easyice, I'm sorry but I just merged #12738 which caused conflicts here ... could you please rebase and resolve conflicts? I think this change is ready except for that. Thank

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on code in PR #12748: URL: https://github.com/apache/lucene/pull/12748#discussion_r1382419312 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -96,6 +96,13 @@ public enum INPUT_TYPE { */ static final byte ARCS_FOR_DIRECT_ADDRESSING =

Re: [PR] Streamline FST constructors and make it fully read-only [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on PR #12758: URL: https://github.com/apache/lucene/pull/12758#issuecomment-1793483247 > Note: We also might want to remove the constructor with FSTStore completely, and users need to call `init()` themselves? +1

Re: [PR] Streamline FST constructors and make it fully read-only [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on code in PR #12758: URL: https://github.com/apache/lucene/pull/12758#discussion_r1382420527 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -411,17 +401,42 @@ public FST(DataInput metaIn, DataInput in, Outputs outputs) throws IOExceptio

Re: [PR] LUCENE-10195: Improve Gradle build speed [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on PR #414: URL: https://github.com/apache/lucene/pull/414#issuecomment-1793484726 > @mikemccand - If you are interested, [ge.apache.org](https://ge.apache.org/scans?search.rootProjectNames=lucene-root&search.timeZoneId=America%2FChicago) is available to the Lucene proj

Re: [PR] LUCENE-10195: Improve Gradle build speed [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on PR #414: URL: https://github.com/apache/lucene/pull/414#issuecomment-1793485933 > > I don't like how slow our gradle builds are, so if we can make it faster, that'd be awesome. > > Are they? What in particular is slow for you, Mike? There's tons of stuff that

Re: [PR] Make FST fully read-only and streamline FST constructor [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on PR #12691: URL: https://github.com/apache/lucene/pull/12691#issuecomment-1793486552 We closed this PR in favor of #12758?

Re: [I] Should we not enlarge PagedGrowableWriter initial bitPerValue on NodeHash.rehash()? [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on issue #12744: URL: https://github.com/apache/lucene/issues/12744#issuecomment-1793486887 > I think this should be enhancement instead of bug, but I can't edit it. @mikemccand can you help to change the label? Done. Annoying that GH won't let you do that, espec

Re: [I] Should we not enlarge PagedGrowableWriter initial bitPerValue on NodeHash.rehash()? [lucene]

2023-11-04 Thread via GitHub
mikemccand commented on issue #12744: URL: https://github.com/apache/lucene/issues/12744#issuecomment-1793487558 > Does that mean every value, including the ones with low addresses, will use the same bpv as the high-address nodes? PagedGrowableWriter already enlarges the bpv [automatically](h

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793488056 Here are the ARMs. I had to tweak ARM to use FMA more aggressively to fully utilize the gravitons. The problem there is just apple silicon, it is good we did not move forwards with benchma

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793489975 > You may not like my detector, but I think it is quite practical and prevents slow execution. The detector is funny, but it won't detect slow apple silicon if you run Linux on

Re: [PR] Random access term dictionary [lucene]

2023-11-04 Thread via GitHub
nknize commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1382429884 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/bitpacking/BitPacker.java: ## Review Comment: > Since they are of the same size..

Re: [PR] LUCENE-10195: Improve Gradle build speed [lucene]

2023-11-04 Thread via GitHub
Dawid commented on PR #414: URL: https://github.com/apache/lucene/pull/414#issuecomment-1793499490 TLDR; No problemo, Mickey. You can always count on me, just like last Friday when we all get wasted and I had to Uber you home. Take care!

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793512513 > It is good that we have the sysprops to enforce FMA or disable it, overriding default detection if needed. So on apple chips with Linux you can disable it. 👻 exactly. we can't det

Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2023-11-04 Thread via GitHub
stefanvodita commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1793516615 The test could call the modified methods with a random `box` and assert that all vertices of the given polygon are different. We can reuse `hasIdenticalVertices` from #12757.

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793549914 for transparency, this was my testing procedure. I did lots of other things such as poking around and running experiments too, but for the basics of "running benchmark across different ins

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793551757 and yeah, the `avx-turbo` is measuring double precision when it "benches" FMA and we do float precision, i know. but it's code already written and a nice non-java way to get the wanted info

Re: [PR] LUCENE-10195: Improve Gradle build speed [lucene]

2023-11-04 Thread via GitHub
dweiss commented on PR #414: URL: https://github.com/apache/lucene/pull/414#issuecomment-1793560062 This is the problem with github at mentions, @mikemccand - whoever this is that had to drive you home, it wasn't me...

Re: [PR] LUCENE-10195: Improve Gradle build speed [lucene]

2023-11-04 Thread via GitHub
dweiss commented on PR #414: URL: https://github.com/apache/lucene/pull/414#issuecomment-1793561529 Anyway - it's going to be difficult to saturate your CPU with tests alone, especially on a beefy machine. We limit the number of forked test JVMs - this you could tweak - but you'll soon hit

Re: [PR] LUCENE-10195: Improve Gradle build speed [lucene]

2023-11-04 Thread via GitHub
Dawid commented on PR #414: URL: https://github.com/apache/lucene/pull/414#issuecomment-1793567156 > This is the problem with github at mentions, @mikemccand - whoever this is that had to drive you home, it wasn't me... Dawid, please don't treat it as a problem, but as a miracle/opportu

Re: [PR] Streamline FST constructors and make it fully read-only [lucene]

2023-11-04 Thread via GitHub
dungba88 commented on code in PR #12758: URL: https://github.com/apache/lucene/pull/12758#discussion_r1382472928 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -1132,4 +1137,28 @@ public abstract static class BytesReader extends DataInput { /** Returns

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
uschindler closed pull request #12737: Speed up vectorutil float scalar methods, unroll properly, use fma where possible URL: https://github.com/apache/lucene/pull/12737

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793568091 Sorry, pressed wrong button. Reopened.

Re: [PR] Make FST fully read-only and streamline FST constructor [lucene]

2023-11-04 Thread via GitHub
dungba88 commented on PR #12691: URL: https://github.com/apache/lucene/pull/12691#issuecomment-1793570275 Yeah, this PR was originally opened for another purpose: consolidate the FSTStore and BytesStore, and that was already done.

Re: [PR] Streamline FST constructors and make it fully read-only [lucene]

2023-11-04 Thread via GitHub
dungba88 commented on code in PR #12758: URL: https://github.com/apache/lucene/pull/12758#discussion_r1382474566 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -828,6 +829,26 @@ public void add(IntsRef input, T output) throws IOException { lastI

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-04 Thread via GitHub
rmuir merged PR #12737: URL: https://github.com/apache/lucene/pull/12737

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-04 Thread via GitHub
dungba88 commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1793580112 Thank you @mikemccand ! Agree we should have a single changes entry summarizing all the different PRs

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-04 Thread via GitHub
easyice commented on code in PR #12748: URL: https://github.com/apache/lucene/pull/12748#discussion_r1382516071 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -96,6 +96,13 @@ public enum INPUT_TYPE { */ static final byte ARCS_FOR_DIRECT_ADDRESSING = 1