Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

2023-10-10 Thread via GitHub
gf2121 commented on code in PR #12642: URL: https://github.com/apache/lucene/pull/12642#discussion_r1354116686 ## lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java: ## @@ -316,6 +317,68 @@ public void testBasic() throws Exception { IOUtils.close(searcher.

Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

2023-10-10 Thread via GitHub
gf2121 commented on code in PR #12642: URL: https://github.com/apache/lucene/pull/12642#discussion_r1354116012 ## lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java: ## @@ -316,6 +316,58 @@ public void testBasic() throws Exception { IOUtils.close(searcher.

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-10-10 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1353826324 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,19 +21,18 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [I] Multiple ClassNotFoundExceptions in IntelliJ Fat Jar on ARM64 Java 20 [lucene]

2023-10-10 Thread via GitHub
andrewlalis commented on issue #12307: URL: https://github.com/apache/lucene/issues/12307#issuecomment-1756220873 @uschindler If fat JARs are not supported or recommended with Lucene, what *is* the recommended way to deploy a project incorporating Lucene? I cannot find any resources on this

Re: [PR] Gradle 8.4 [lucene]

2023-10-10 Thread via GitHub
risdenk commented on code in PR #12650: URL: https://github.com/apache/lucene/pull/12650#discussion_r1353286034 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene84/gen_ForUtil.py: ## @@ -15,7 +15,7 @@ # See the License for the specific language governi

[PR] Optimize OnHeapHnswGraph [lucene]

2023-10-10 Thread via GitHub
zhaih opened a new pull request, #12651: URL: https://github.com/apache/lucene/pull/12651 ### Description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

[PR] Gradle 8.4 [lucene]

2023-10-10 Thread via GitHub
risdenk opened a new pull request, #12650: URL: https://github.com/apache/lucene/pull/12650 ### Description Upgrades Gradle from 7.6 to 8.4 - supports building directly with JDK 21 LTS. * https://docs.gradle.org/8.4/release-notes.html Upgrades a few build plugins to support J

[I] TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom fails randomly [lucene]

2023-10-10 Thread via GitHub
benwtrent opened a new issue, #12649: URL: https://github.com/apache/lucene/issues/12649 ### Description This has failed multiple times in continuous testing environment. I have a couple different reproduction lines to try, but none seem to replicate the failure locally. The CI on wh

[I] TestSizeBoundedForceMerge.testByteSizeLimit test failure [lucene]

2023-10-10 Thread via GitHub
benwtrent opened a new issue, #12648: URL: https://github.com/apache/lucene/issues/12648 ### Description Came up in continuous testing. It has failed on 9x and main. It replicates reliably. ``` org.apache.lucene.index.TestSizeBoundedForceMerge > testByteSizeLimit FAILE

Re: [PR] Refactor Lucene95 to allow off heap vector reader reuse [lucene]

2023-10-10 Thread via GitHub
benwtrent merged PR #12629: URL: https://github.com/apache/lucene/pull/12629

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-10 Thread via GitHub
tveasey commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1353096295 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-10 Thread via GitHub
benwtrent commented on PR #12582: URL: https://github.com/apache/lucene/pull/12582#issuecomment-1755980866 To address some of @jpountz's worries around adversarial cases, I tested one. Cohere-Wiki, I created 100 clusters via KMeans and indexed the documents sorted by their respective

Re: [I] ability to run JMH benchmarks from gradle [lucene]

2023-10-10 Thread via GitHub
dweiss commented on issue #12641: URL: https://github.com/apache/lucene/issues/12641#issuecomment-1755962326 Thanks @risdenk ! I'll be taking a look at this on Thursday.

Re: [PR] Improve refresh speed with softdelete enable [lucene]

2023-10-10 Thread via GitHub
jpountz commented on PR #12557: URL: https://github.com/apache/lucene/pull/12557#issuecomment-1755858324 I understand the idea, I'm a bit less happy about special-casing the soft deletes field in our doc values file format. I don't have a better suggestion though...

Re: [PR] Use radix sort to speed up the sorting of terms in TermInSetQuery [lucene]

2023-10-10 Thread via GitHub
gsmiller commented on code in PR #12587: URL: https://github.com/apache/lucene/pull/12587#discussion_r1352940936 ## lucene/core/src/java/org/apache/lucene/util/StringSorter.java: ## @@ -19,7 +19,11 @@ import java.util.Comparator; -abstract class StringSorter extends Sorter

Re: [PR] Use radix sort to speed up the sorting of terms in TermInSetQuery [lucene]

2023-10-10 Thread via GitHub
gsmiller commented on code in PR #12587: URL: https://github.com/apache/lucene/pull/12587#discussion_r1352934399 ## lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java: ## @@ -112,7 +113,23 @@ private static PrefixCodedTerms packTerms(String field, Collection ter

Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

2023-10-10 Thread via GitHub
gsmiller commented on code in PR #12642: URL: https://github.com/apache/lucene/pull/12642#discussion_r1352875779 ## lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java: ## @@ -316,6 +316,58 @@ public void testBasic() throws Exception { IOUtils.close(searche

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-10 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1352871027 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-10 Thread via GitHub
gf2121 commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1755588137 Thanks @rmuir ! I will try.

Re: [I] Sum up bit count with vector API [lucene]

2023-10-10 Thread via GitHub
gf2121 commented on issue #12639: URL: https://github.com/apache/lucene/issues/12639#issuecomment-1755527114 on AVX-512: ``` Benchmark (size) Mode Cnt Score Error Units BitcountBenchmark.bitCountNew 32 thrpt5 62.222 ± 0.034 ops/us

Re: [I] ability to run JMH benchmarks from gradle [lucene]

2023-10-10 Thread via GitHub
risdenk commented on issue #12641: URL: https://github.com/apache/lucene/issues/12641#issuecomment-1755507750 Not sure this is going to help or hurt - but Solr has a benchmark module that does jmh stuff - https://github.com/apache/solr/tree/main/solr/benchmark

[PR] Cleanup flushing logic in DocumentsWriter [lucene]

2023-10-10 Thread via GitHub
s1monw opened a new pull request, #12647: URL: https://github.com/apache/lucene/pull/12647 DocumentsWriter had some duplicate logic for iterating over segments to be flushed. This change simplifies some of the loops and moves common code into one place. This also adds tests to ensure we actual

Re: [I] ability to run JMH benchmarks from gradle [lucene]

2023-10-10 Thread via GitHub
rmuir commented on issue #12641: URL: https://github.com/apache/lucene/issues/12641#issuecomment-1755486678 Thanks @dweiss, I honestly tried my hand at using the plugin, there's just enough going on here that I wasn't able to make progress: * alt-jvm usage (i'm particularly interested in

Re: [I] Sum up bit count with vector API [lucene]

2023-10-10 Thread via GitHub
rmuir commented on issue #12639: URL: https://github.com/apache/lucene/issues/12639#issuecomment-1755455998 @jpountz on aarch64 (128-bit simd) the difference is small: ``` Benchmark (size) Mode CntScore Error Units BitCountBenchmark.bitCountNew 32

Re: [I] IndexWriter should clean up unreferenced files when segment merge fails due to disk full [lucene]

2023-10-10 Thread via GitHub
RS146BIJAY closed issue #12228: IndexWriter should clean up unreferenced files when segment merge fails due to disk full URL: https://github.com/apache/lucene/issues/12228

Re: [I] Sum up bit count with vector API [lucene]

2023-10-10 Thread via GitHub
jpountz commented on issue #12639: URL: https://github.com/apache/lucene/issues/12639#issuecomment-1755404204 I'm also curious if we get different numbers on size=32 (2,048 bits). This is the most interesting number to me since it is the window size of `BooleanScorer`. Likewise `IndexedDISI
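The 2,048-bit figure follows from 32 words of 64 bits each (32 × 64 = 2,048). A minimal scalar sketch, assuming the benchmark's `size` parameter counts `long` words (inferred from this comment, not checked against the benchmark source), sums `Long.bitCount` over such a window. The class and method names are hypothetical; this is the plain loop a vectorized version would be compared against, not the code from issue #12639.

```java
import java.util.Arrays;

// Hypothetical class: a scalar baseline, not the vector-API version discussed in the issue.
final class BitCountSketch {
  // Assumption: "size" counts 64-bit longs, so size=32 covers 32 * 64 = 2,048 bits,
  // which the comment above identifies as the BooleanScorer window size.
  static final int WORDS = 32;

  // Sum the set bits across every word of the window.
  static int bitCount(long[] words) {
    int sum = 0;
    for (long w : words) {
      sum += Long.bitCount(w);
    }
    return sum;
  }

  public static void main(String[] args) {
    long[] window = new long[WORDS];
    Arrays.fill(window, -1L); // every bit set in every word
    System.out.println(WORDS * Long.SIZE); // 2048
    System.out.println(bitCount(window)); // 2048
  }
}
```

With every bit set, the count equals the window size in bits, which is a quick sanity check when comparing a vectorized implementation against this loop.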

Re: [PR] Improve refresh speed with softdelete enable [lucene]

2023-10-10 Thread via GitHub
easyice commented on PR #12557: URL: https://github.com/apache/lucene/pull/12557#issuecomment-1755401582 Update: revert changes about `IndexedDISI#advance` to keep things as simple as possible.

Re: [I] Sum up bit count with vector API [lucene]

2023-10-10 Thread via GitHub
rmuir commented on issue #12639: URL: https://github.com/apache/lucene/issues/12639#issuecomment-1755378013 The compiler messes it up on arm, too: ``` Benchmark (size) Mode Cnt Score Error Units BitCountBenchmark.bitCountNew1024 thrpt5 3.440 ± 0

Re: [I] Sum up bit count with vector API [lucene]

2023-10-10 Thread via GitHub
rmuir commented on issue #12639: URL: https://github.com/apache/lucene/issues/12639#issuecomment-1755366199 I see less of an improvement: ``` Benchmark (size) Mode Cnt Score Error Units BitCountBenchmark.bitCountNew 1024 thrpt5 2.24

Re: [PR] Move addNode to FSTCompiler [lucene]

2023-10-10 Thread via GitHub
romseygeek commented on PR #12646: URL: https://github.com/apache/lucene/pull/12646#issuecomment-1755312341 Thanks @dungba88!

Re: [PR] Move addNode to FSTCompiler [lucene]

2023-10-10 Thread via GitHub
romseygeek merged PR #12646: URL: https://github.com/apache/lucene/pull/12646

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-10 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1755272993 @gf2121 maybe, if you have time, you could run benchmark with `-prof perfasm` and upload the output here? It could solve the mystery. I am curious if it is just a cpu difference,

Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

2023-10-10 Thread via GitHub
gf2121 commented on code in PR #12642: URL: https://github.com/apache/lucene/pull/12642#discussion_r1352364817 ## lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java: ## @@ -316,6 +316,58 @@ public void testBasic() throws Exception { IOUtils.close(searcher.

Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

2023-10-10 Thread via GitHub
jpountz commented on code in PR #12642: URL: https://github.com/apache/lucene/pull/12642#discussion_r1352362240 ## lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java: ## @@ -316,6 +316,58 @@ public void testBasic() throws Exception { IOUtils.close(searcher

Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

2023-10-10 Thread via GitHub
gsmiller commented on code in PR #12642: URL: https://github.com/apache/lucene/pull/12642#discussion_r1352301082 ## lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java: ## @@ -316,6 +316,58 @@ public void testBasic() throws Exception { IOUtils.close(searche

Re: [PR] Early terminate visit BKD leaf when current value greater than upper point in sorted dim. [lucene]

2023-10-10 Thread via GitHub
iverase commented on PR #12528: URL: https://github.com/apache/lucene/pull/12528#issuecomment-1754953206 > Do you mean the off the shelf benchmark in lucene self? One case I am interested in is the geo benchmarks. It is not clear to me if some of those queries (e.g polygon query) can

Re: [PR] Move addNode to FSTCompiler [lucene]

2023-10-10 Thread via GitHub
dungba88 commented on PR #12646: URL: https://github.com/apache/lucene/pull/12646#issuecomment-1754877692 Thanks @romseygeek I have added the entry (under API change section).

Re: [PR] Move addNode to FSTCompiler [lucene]

2023-10-10 Thread via GitHub
romseygeek commented on PR #12646: URL: https://github.com/apache/lucene/pull/12646#issuecomment-1754833603 I'm happy to merge and backport @dungba88. Can you also add an entry to CHANGES.txt in the 9.9 section?

Re: [PR] Early terminate visit BKD leaf when current value greater than upper point in sorted dim. [lucene]

2023-10-10 Thread via GitHub
vsop-479 commented on PR #12528: URL: https://github.com/apache/lucene/pull/12528#issuecomment-1754812860 @iverase Here is a performance test from luceneutil. box-points| baseline | candidate | Diff -- | -- | -- | -- BEST M hits/sec | 101.09 | 103.64 | 2.5% BEST QPS | 102.87 | 10

Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-10 Thread via GitHub
gf2121 commented on PR #12604: URL: https://github.com/apache/lucene/pull/12604#issuecomment-1754790165 @jpountz Thanks for annotating ! I also checked `blunders.io` for more details: * GC pause time: 6.38% -> 5.91% * Allocation Rate: 3.7 GiB/s -> 2.6 GiB/s * much more less `

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-10-10 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1351949356 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,19 +21,18 @@ import java.util.List; import org.apache.lucene.store.DataInput; impor

Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-10 Thread via GitHub
jpountz commented on PR #12604: URL: https://github.com/apache/lucene/pull/12604#issuecomment-1754762202 It looks like there's a bit less [Young GC](http://people.apache.org/~mikemccand/lucenebench/indexing.html) in nightly benchmarks since this change was merged, from 6-8 seconds, to consi

Re: [PR] Move addNode to FSTCompiler [lucene]

2023-10-10 Thread via GitHub
dungba88 commented on PR #12646: URL: https://github.com/apache/lucene/pull/12646#issuecomment-1754754284 > +1 to backport to 9x -- this is nice refactoring that does not change any API and is low risk. I can do this. Wondering if creating a PR to lucene-9_x branch would suffice?

Re: [PR] [WIP] first cut at bounding the NodeHash size during FST compilation [lucene]

2023-10-10 Thread via GitHub
gf2121 commented on PR #12633: URL: https://github.com/apache/lucene/pull/12633#issuecomment-1754735435 > With the PR, you unfortunately cannot easily say "give me a minimal FST at all costs", like you can with main today. You'd have to keep trying larger and larger NodeHash sizes until the

Re: [PR] Move addNode to FSTCompiler [lucene]

2023-10-10 Thread via GitHub
dungba88 commented on code in PR #12646: URL: https://github.com/apache/lucene/pull/12646#discussion_r1351877680 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -640,381 +602,11 @@ public static boolean targetHasArcs(Arc arc) { return arc.target() > 0;

Re: [PR] Refactor ByteBlockPool so it is just a "shift/mask big array" [lucene]

2023-10-10 Thread via GitHub
iverase commented on PR #12625: URL: https://github.com/apache/lucene/pull/12625#issuecomment-1754688030 I ran wikimediumall and it is still a bit noisy: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value

Re: [PR] Move addNode to FSTCompiler [lucene]

2023-10-10 Thread via GitHub
mikemccand commented on PR #12646: URL: https://github.com/apache/lucene/pull/12646#issuecomment-1754655466 > Given how expert this code is and that the relevant methods are all package-private I don't see a problem with backporting this to 9x - what do you think @mikemccand? +1 to b

Re: [PR] Move addNode to FSTCompiler [lucene]

2023-10-10 Thread via GitHub
mikemccand commented on code in PR #12646: URL: https://github.com/apache/lucene/pull/12646#discussion_r1351794651 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -640,381 +602,11 @@ public static boolean targetHasArcs(Arc arc) { return arc.target() > 0;

Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

2023-10-10 Thread via GitHub
jpountz commented on code in PR #12642: URL: https://github.com/apache/lucene/pull/12642#discussion_r1351794343 ## lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java: ## @@ -316,6 +316,58 @@ public void testBasic() throws Exception { IOUtils.close(searcher

Re: [PR] Move addNode to FSTCompiler [lucene]

2023-10-10 Thread via GitHub
romseygeek commented on PR #12646: URL: https://github.com/apache/lucene/pull/12646#issuecomment-1754625497 Thanks for opening @dungba88! This FST building code is very hairy and this is a nice start at cleaning it up. Given how expert this code is and that the relevant methods are al

Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

2023-10-10 Thread via GitHub
gf2121 commented on code in PR #12642: URL: https://github.com/apache/lucene/pull/12642#discussion_r1351726366 ## lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java: ## @@ -1490,7 +1542,22 @@ public List reduce(Collection collectors) { .collect(Coll

Re: [PR] Avoid duplicate array fill in BPIndexReorderer [lucene]

2023-10-10 Thread via GitHub
gf2121 merged PR #12645: URL: https://github.com/apache/lucene/pull/12645

[PR] Move addNode to FSTCompiler [lucene]

2023-10-10 Thread via GitHub
dungba88 opened a new pull request, #12646: URL: https://github.com/apache/lucene/pull/12646 ### Description Currently FSTCompiler and FST has a circular dependencies to each other. FSTCompiler creates an instance of FST, and on adding node, it delegates to `FST.addNode()` and passin