Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-11-01 Thread via GitHub
shubhamvishu commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1790129043 Nice! I like how we discovered a build bug (which had been hidden for some time now, I think) and fixed it. Thanks for solving it, @dweiss! -- This is an automated message from th

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-01 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1790097686 ARM Graviton3E (256-bit SVE) Master (well #12747 for sanity): ``` Benchmark (size) Mode Cnt Score Error Units VectorUtilBenchmark.

Re: [PR] stabilize vectorutil benchmark [lucene]

2023-11-01 Thread via GitHub
rmuir commented on PR #12747: URL: https://github.com/apache/lucene/pull/12747#issuecomment-1790041666 And I know it is annoying that it runs slower. I really tried not to be overkill, but some methods here are super simple and super stable and some are unstable. So I gave all methods minimum 3

[PR] stabilize vectorutil benchmark [lucene]

2023-11-01 Thread via GitHub
rmuir opened a new pull request, #12747: URL: https://github.com/apache/lucene/pull/12747 This benchmark is too noisy across forks which makes comparisons impossible and misleading. Especially vectorized float methods on some machines: it is almost useless. I spent some time to reduc

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-01 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1789905123 OK, sounds good to me, @uschindler. I agree the user should be in control. And I'd like to avoid us adding tons of flags for this stuff unless we need it for testing. That's just my opinio
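
For readers unfamiliar with the "use fma where possible" part of this PR's title, here is a minimal sketch of a scalar dot product written with `Math.fma` (available since Java 9). The class and method names are hypothetical and this is not Lucene's `VectorUtil` code. One reason the choice might be left to the user, as the comment above suggests, is that `Math.fma` can be much slower than a plain multiply-add on hardware without a fused multiply-add instruction.

```java
// Hypothetical sketch; NOT Lucene's VectorUtil implementation.
final class FmaDotSketch {
  // Scalar dot product where each step is a fused multiply-add:
  // acc = a[i] * b[i] + acc, computed with a single rounding.
  static float dot(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector lengths differ");
    }
    float acc = 0f;
    for (int i = 0; i < a.length; i++) {
      acc = Math.fma(a[i], b[i], acc);
    }
    return acc;
  }
}
```

Because the fused operation rounds once instead of twice, results can differ in the last bits from a plain `a[i] * b[i] + acc` loop, which is worth keeping in mind when comparing benchmark variants.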

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378882326 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,132 +214,99 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [PR] Clean up UnCompiledNode.inputCount [lucene]

2023-11-01 Thread via GitHub
dungba88 closed pull request #12735: Clean up UnCompiledNode.inputCount URL: https://github.com/apache/lucene/pull/12735

Re: [PR] Speed up disjunctions by computing estimations of the score of the k-th top hit up-front. [lucene]

2023-11-01 Thread via GitHub
jpountz commented on PR #12526: URL: https://github.com/apache/lucene/pull/12526#issuecomment-1789662266 @mikemccand FYI, I tried adding some interesting boolean queries to the nightly benchmarks at https://github.com/mikemccand/luceneutil/pull/240.

Re: [I] TestDeletionPolicy.testOpenPriorSnapshot failing [lucene]

2023-11-01 Thread via GitHub
jpountz closed issue #12746: TestDeletionPolicy.testOpenPriorSnapshot failing URL: https://github.com/apache/lucene/issues/12746

Re: [I] TestDeletionPolicy.testOpenPriorSnapshot failing [lucene]

2023-11-01 Thread via GitHub
jpountz commented on issue #12746: URL: https://github.com/apache/lucene/issues/12746#issuecomment-1789651692 Thanks for reporting, I pushed a fix at https://github.com/apache/lucene/commit/66324f763fc7fb0d8e7cd6f334e5438f0171c84e.

Re: [I] TestDeletionPolicy.testOpenPriorSnapshot failing [lucene]

2023-11-01 Thread via GitHub
benwtrent commented on issue #12746: URL: https://github.com/apache/lucene/issues/12746#issuecomment-1789639897 I verified this doesn't happen in 9.8.

[I] TestDeletionPolicy.testOpenPriorSnapshot failing [lucene]

2023-11-01 Thread via GitHub
benwtrent opened a new issue, #12746: URL: https://github.com/apache/lucene/issues/12746 ### Description TestDeletionPolicy.testOpenPriorSnapshot fails. Assertion fails on assuming we are on a previous commit with more than one leaf. Fails on main and 9.x. stack

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-01 Thread via GitHub
jpountz commented on code in PR #12729: URL: https://github.com/apache/lucene/pull/12729#discussion_r1379282330 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throws

Re: [I] JavaCompile tasks may be in up-to-date state when modular dependencies have changed leading to odd runtime errors [lucene]

2023-11-01 Thread via GitHub
dweiss commented on issue #12742: URL: https://github.com/apache/lucene/issues/12742#issuecomment-1789587275 I've provided a PR that fixes this. It is a corner case of us providing custom javac parameters (modular classpath). I am surprised this took so long to be discovered and I apologize

[PR] Fix javac task inputs so that they include modular dependencies #12742 [lucene]

2023-11-01 Thread via GitHub
dweiss opened a new pull request, #12745: URL: https://github.com/apache/lucene/pull/12745 Fixes #12742.

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-11-01 Thread via GitHub
dweiss commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789519097 See #12742 - it's a problem with our javac task settings. Running a clean is a workaround. I will provide a fix for this when I get a chance; it's not a critical issue (but it is a bug in

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-01 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1789483634 Hi @rmuir, I am fine with both approaches. Let me just create a new PR with some changes for the Constants.java class and a separate pkg private utility class for looking up JVM ar

Re: [I] byte to int in TruncateTokenFilterFactory to TruncateTokenFilter [lucene]

2023-11-01 Thread via GitHub
asubbu90 commented on issue #12449: URL: https://github.com/apache/lucene/issues/12449#issuecomment-1789370514 Hi @robro612, as you can see, I have already opened PR #12507 for this issue. Do you want more context on this?

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-11-01 Thread via GitHub
gsmiller commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789348091 @benwtrent glad it was a simple fix. Sorry it created churn!

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-11-01 Thread via GitHub
benwtrent commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789266991 @gsmiller you are 100% correct 🤦

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-11-01 Thread via GitHub
gsmiller commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789264349 @benwtrent is this an issue of the code only being partially rebuilt? The signature for `Automata#makeStringUnion` was changed to accept a more general `Iterable` as opposed to a `Colle
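
The partial-rebuild theory in the comment above rests on a general JVM fact: callers are linked against the exact parameter types they were compiled with, so widening a parameter type is source-compatible but not binary-compatible. The sketch below illustrates this with reflection. The class is a hypothetical stand-in (it is not Lucene's `Automata`), and using `Collection` as the old parameter type is an assumption, since the comment is cut off at that point.

```java
// Hypothetical stand-in for an API whose parameter was widened;
// NOT Lucene's Automata class.
final class SignatureChangeSketch {
  // "New" version of the method: accepts any Iterable.
  static int makeStringUnion(Iterable<String> terms) {
    int count = 0;
    for (String ignored : terms) {
      count++;
    }
    return count;
  }

  // Reports whether a method named makeStringUnion with exactly this
  // parameter type exists. Reflection matches parameter types exactly,
  // much like the JVM does when linking an already-compiled call site.
  static boolean hasMethodTaking(Class<?> paramType) {
    try {
      SignatureChangeSketch.class.getDeclaredMethod("makeStringUnion", paramType);
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }
}
```

Source code that passes a `List` still compiles against the new method, but a class file compiled earlier against a `Collection` parameter would look for a method that no longer exists. Under that assumption, stale classes left over from an incremental build would fail at run time until everything is recompiled.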

Re: [PR] speedup arm int functions? [lucene]

2023-11-01 Thread via GitHub
rmuir commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1789229323 Thanks, @ChrisHegarty! I will try to dig into this more. Again, I don't see speedups on my x86, so something has changed (maybe architecture) that's allowing more to happen in parallel on yo

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-11-01 Thread via GitHub
benwtrent commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789218896 Something weird is happening. This commit is causing the following failure: ``` ./gradlew test --tests TestUnifiedHighlighterTermIntervals.testCustomFieldValueSource -Dtests.seed

Re: [I] Use max BPV encoding in postings if doc buffer size less than ForUtil.BLOCK_SIZE [lucene]

2023-11-01 Thread via GitHub
easyice commented on issue #12717: URL: https://github.com/apache/lucene/issues/12717#issuecomment-1789106681 Oh, Group-varint is an interesting encoder. I'd love to try it later in the week.

[I] Should we not enlarge PagedGrowableWriter initial bitPerValue on NodeHash.rehash() [lucene]

2023-11-01 Thread via GitHub
dungba88 opened a new issue, #12744: URL: https://github.com/apache/lucene/issues/12744 ### Description Spawned from https://github.com/apache/lucene/pull/12738 It seems on [rehash](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/NodeHas

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378882326 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,132 +214,99 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378841828 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -328,7 +323,128 @@ private void rehash(long lastNodeAddress) throws IOException { }

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378848941 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -107,28 +121,43 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [PR] speedup arm int functions? [lucene]

2023-11-01 Thread via GitHub
ChrisHegarty commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1789014829 JMH prof output from my Linux x64 Rocket Lake - https://gist.github.com/ChrisHegarty/508bb1857cb50df0d757f711c81fd740

Re: [PR] speedup arm int functions? [lucene]

2023-11-01 Thread via GitHub
ChrisHegarty commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1789007292 > Your x64 is still i5-11400 ? I will look into this more. Yes.

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378809307 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -107,28 +121,43 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [PR] speedup arm int functions? [lucene]

2023-11-01 Thread via GitHub
rmuir commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1788919960 And what is the compiler's excuse this time? :) No floating point here. I should just be able to write a simple loop and get good performance!

Re: [PR] speedup arm int functions? [lucene]

2023-11-01 Thread via GitHub
rmuir commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1788910222 Thank you for the explanation; now it just adds another mystery for me. Your x64 is still i5-11400? I will look into this more. Yes, on the ARM, the assembly didn't look like wha

Re: [PR] speedup arm int functions? [lucene]

2023-11-01 Thread via GitHub
ChrisHegarty commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-179191 I see reasonable speedups on both x64 and ARM, but sadly see no vectorization in the disassembly. The speed seems to come from the 2x instructions/pipelining (?) per strip mined lo
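
As an illustration of the "2x instructions per loop" effect described above, here is a minimal sketch of an integer dot product unrolled by two with independent accumulators. The names are hypothetical and this is not the code from the PR. The idea, stated as an assumption, is that two independent dependency chains give the CPU more work it can overlap per iteration even when the JIT does not vectorize the loop.

```java
// Hypothetical sketch; NOT the code under review in the PR.
final class UnrolledIntDotSketch {
  // Baseline: one accumulator, one multiply-add per iteration.
  static int dotSimple(byte[] a, byte[] b) {
    int acc = 0;
    for (int i = 0; i < a.length; i++) {
      acc += a[i] * b[i];
    }
    return acc;
  }

  // Unrolled by two: acc0 and acc1 do not depend on each other,
  // so their updates can be in flight at the same time.
  static int dotUnrolled2(byte[] a, byte[] b) {
    int acc0 = 0;
    int acc1 = 0;
    int i = 0;
    int bound = a.length & ~1; // largest even number <= length
    for (; i < bound; i += 2) {
      acc0 += a[i] * b[i];
      acc1 += a[i + 1] * b[i + 1];
    }
    for (; i < a.length; i++) { // tail element for odd lengths
      acc0 += a[i] * b[i];
    }
    return acc0 + acc1;
  }
}
```

Since `int` addition is associative even on overflow, both methods return the same value for the same inputs. Whether the unrolled form is actually faster depends on the JIT and the hardware, which is exactly what the thread is trying to pin down.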

Re: [I] Would SIMD powered sort (on top of Panama) be worth it? [lucene]

2023-11-01 Thread via GitHub
mikemccand commented on issue #12399: URL: https://github.com/apache/lucene/issues/12399#issuecomment-1788845861 Some exciting updates here ... Intel continues to improve this "super fast sorting using SIMD instructions" library: https://www.phoronix.com/news/Intel-x86-simd-sort-4.0

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378704913 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,119 +197,85 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378703915 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,119 +197,85 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-11-01 Thread via GitHub
benwtrent commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1788822945 So, I replicated the jvector benchmark (the lucene part) using the new int8 quantization. Note, this is with `0` fan out or extra top-k gathered. Since the benchmark on JV

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
gf2121 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378670256 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,119 +193,85 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; } -

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-01 Thread via GitHub
slow-J commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1788786005 > Thanks @slow-J ! I left some minor comments about additional `90` -> `99` refactoring. Thanks @gf2121 , committed all the suggestions.

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378650643 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,119 +197,85 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1788758864 > Eventually it's moved to fallback, and, maybe it never gets promoted back (single copy), or, maybe it does (+1 copy) There is actually already one copy before this, which is whe
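
The "moved to fallback" and "promoted back" wording above describes a two-generation cache. A minimal sketch of that general idea follows, using plain `HashMap`s. The class name, the size threshold, and the swap policy are assumptions made for illustration, and this is not the `NodeHash` implementation from the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-generation ("primary" + "fallback") cache sketch;
// NOT Lucene's NodeHash.
final class DoubleBarrelCacheSketch<K, V> {
  private final int maxPrimarySize;
  private Map<K, V> primary = new HashMap<>();
  private Map<K, V> fallback = new HashMap<>();

  DoubleBarrelCacheSketch(int maxPrimarySize) {
    this.maxPrimarySize = maxPrimarySize;
  }

  // Looks in primary first, then fallback. A fallback hit is promoted
  // (copied) back into primary, which is the "+1 copy" case.
  V get(K key) {
    V value = primary.get(key);
    if (value == null) {
      value = fallback.get(key);
      if (value != null) {
        put(key, value);
      }
    }
    return value;
  }

  // Inserts into primary. When primary fills up it becomes the new
  // fallback, and the old fallback (the least recently used
  // generation) is dropped as a whole.
  void put(K key, V value) {
    primary.put(key, value);
    if (primary.size() >= maxPrimarySize) {
      fallback = primary;
      primary = new HashMap<>();
    }
  }
}
```

Entries that are never looked up again age out after two generation swaps, while entries that keep getting hits are copied forward, which approximates LRU without per-entry bookkeeping.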

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
mikemccand commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1788744551 > But I ended up using a `List` where each item is a node instead of ByteBlockPool due to the following reasons: Hmm -- this is sizable added RAM overhead per entry. Added arra

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378601593 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,119 +193,85 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378599798 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -214,7 +222,13 @@ private long hash(long node) throws IOException { * Compares an unfroz

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378577965 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -110,25 +110,34 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
gf2121 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378573421 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,119 +193,85 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; } -

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
gf2121 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378571480 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -110,25 +110,34 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-01 Thread via GitHub
gf2121 commented on code in PR #12741: URL: https://github.com/apache/lucene/pull/12741#discussion_r1378509674 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsFormat.java: ## @@ -0,0 +1,518 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und