[I] normalize() override provided in Simple example in Analyzer class doc is missing String fieldName parameter [lucene]

2023-10-12 Thread via GitHub
Bluetopia opened a new issue, #12666: URL: https://github.com/apache/lucene/issues/12666 ### Description Using: https://javadoc.io/doc/org.apache.lucene/lucene-core/latest/org/apache/lucene/analysis/Analyzer.html which should hopefully be up to date. The current state of the source f

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-12 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1357345297 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,782 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1357376455 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1357380984 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (AS

[PR] migrate all vectorbench methods to lucene [lucene]

2023-10-12 Thread via GitHub
rmuir opened a new pull request, #12667: URL: https://github.com/apache/lucene/pull/12667 Following up to @dweiss work, this gives us the same benchmarks as https://github.com/rmuir/vectorbench, just without the code duplication and maintenance hassle. Each method is simply invoked w

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-12 Thread via GitHub
rmuir commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1760526957 You can still tell what's happening too due to the log messages. When each benchmark runs, you see a single message: xxxScalar() methods: ``` WARNING: Java vector incubator mod

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-10-12 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1353826324 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,19 +21,18 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-12 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1357264961 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -40,31 +41,29 @@ public final class OnHeapHnswGraph extends HnswGraph implements Account

Re: [PR] [DRAFT] Concurrent HNSW Merge [lucene]

2023-10-12 Thread via GitHub
xjtushilei commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1760636017 Good Job -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

Re: [PR] Create a task executor when executor is not provided [lucene]

2023-10-12 Thread via GitHub
sohami commented on code in PR #12606: URL: https://github.com/apache/lucene/pull/12606#discussion_r1357659905 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -420,13 +418,12 @@ public int count(Query query) throws IOException { } /** - * Ret

Re: [PR] read MSB VLong in new way [lucene]

2023-10-12 Thread via GitHub
gf2121 commented on PR #12661: URL: https://github.com/apache/lucene/pull/12661#issuecomment-1760704047 It seems I can reproduce the slowdown locally with `wikimediumall`, and this patch does not work :( I'll keep digging ``` TaskQPS baseline StdD

Re: [PR] read MSB VLong in new way [lucene]

2023-10-12 Thread via GitHub
gf2121 commented on PR #12661: URL: https://github.com/apache/lucene/pull/12661#issuecomment-1760721571 I ran an experiment that runs luceneutil without `-r`, which means we read old format with [candidate code](https://github.com/apache/lucene/blob/92c13b8e59c5e4cf6de5e49863e0b102d5063987/

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-12 Thread via GitHub
jmazanec15 commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1357750648 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-12 Thread via GitHub
zhaih commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1357779437 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] read MSB VLong in new way [lucene]

2023-10-12 Thread via GitHub
gf2121 commented on PR #12661: URL: https://github.com/apache/lucene/pull/12661#issuecomment-1760938463 2000 PK task repeats per JVM: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value

Re: [PR] Optimize reading data from postings/impacts enums. [lucene]

2023-10-12 Thread via GitHub
jpountz closed pull request #12664: Optimize reading data from postings/impacts enums. URL: https://github.com/apache/lucene/pull/12664

Re: [PR] Optimize reading data from postings/impacts enums. [lucene]

2023-10-12 Thread via GitHub
jpountz commented on PR #12664: URL: https://github.com/apache/lucene/pull/12664#issuecomment-1761003837 I'll split this PR into smaller ones, so that it's easier to understand the impact of each change.

[PR] Fix lazy decoding of frequencies in `BlockImpactsDocsEnum`. [lucene]

2023-10-12 Thread via GitHub
jpountz opened a new pull request, #12668: URL: https://github.com/apache/lucene/pull/12668 The code was written as if frequencies should be lazily decoded, except that when refilling buffers, freqs were decoded eagerly instead of lazily.

Re: [I] `FSTCompiler.Builder` should have an option to stream the FST bytes directly to Directory [lucene]

2023-10-13 Thread via GitHub
dungba88 commented on issue #12543: URL: https://github.com/apache/lucene/issues/12543#issuecomment-1761024337 Copied from the PR: It seems Tantivy separates the building and the traversal of the FST into 2 different entities. The FST Builder will just write the FST to a DataOutput and not al

Re: [PR] Fix lazy decoding of frequencies in `BlockImpactsDocsEnum`. [lucene]

2023-10-13 Thread via GitHub
jpountz commented on PR #12668: URL: https://github.com/apache/lucene/pull/12668#issuecomment-1761022817 Results on `wikibigall`: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value Coun

Re: [PR] Fix lazy decoding of frequencies in `BlockImpactsDocsEnum`. [lucene]

2023-10-13 Thread via GitHub
jpountz commented on PR #12668: URL: https://github.com/apache/lucene/pull/12668#issuecomment-1761035614 For reference this is extracted from #12664.

Re: [PR] read MSB VLong in new way [lucene]

2023-10-13 Thread via GitHub
gf2121 commented on PR #12661: URL: https://github.com/apache/lucene/pull/12661#issuecomment-1761040593 I indeed can not understand how this version of `readMSBVLong` can be slower than `readVLong` when JVM is warmed up enough. I wonder if there could be some higher level reasons causing th

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
uschindler commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761054475 I have not yet checked the benchmark framework at all (earlier PR), but the idea here is cool.

Re: [PR] read MSB VLong in new way [lucene]

2023-10-13 Thread via GitHub
gf2121 commented on PR #12661: URL: https://github.com/apache/lucene/pull/12661#issuecomment-1761076967 Maybe this is the [todo](https://github.com/apache/lucene/blob/06341ffe1dfdd2c263c1ea2fe8da0ba6e719d00f/lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.ja

Re: [PR] Fix lazy decoding of frequencies in `BlockImpactsDocsEnum`. [lucene]

2023-10-13 Thread via GitHub
jpountz merged PR #12668: URL: https://github.com/apache/lucene/pull/12668

[I] Make OrdinalMap maps docID to global ordinal directly? [lucene]

2023-10-13 Thread via GitHub
vsop-479 opened a new issue, #12669: URL: https://github.com/apache/lucene/issues/12669 ### Description OrdinalMap maps segment ordinals to global ordinals. When collecting, we need to read the segment ordinal by docID, then get the global ordinal by segment ordinal. If OrdinalMap maps docID to
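
The two-step lookup described in this issue can be sketched with plain arrays. This is only an illustration under stated assumptions: the class name, method names, and array layout below are hypothetical and are not Lucene's `OrdinalMap` API. It contrasts going docID -> segment ordinal -> global ordinal with a precomputed docID -> global ordinal table.

```java
// Hypothetical sketch; none of these names come from Lucene.
final class GlobalOrdSketch {

  // Two-step lookup: docID -> segment ordinal -> global ordinal.
  static long viaSegmentOrd(int[] docToSegOrd, long[] segOrdToGlobal, int doc) {
    return segOrdToGlobal[docToSegOrd[doc]];
  }

  // Direct table: one global ordinal stored per document.
  static long[] buildDirect(int[] docToSegOrd, long[] segOrdToGlobal) {
    long[] docToGlobal = new long[docToSegOrd.length];
    for (int doc = 0; doc < docToSegOrd.length; doc++) {
      docToGlobal[doc] = segOrdToGlobal[docToSegOrd[doc]];
    }
    return docToGlobal;
  }

  public static void main(String[] args) {
    int[] docToSegOrd = {2, 0, 1, 0, 2};   // 5 docs, 3 distinct values in the segment
    long[] segOrdToGlobal = {4L, 7L, 9L};  // one entry per distinct segment value
    long[] direct = buildDirect(docToSegOrd, segOrdToGlobal);
    System.out.println(viaSegmentOrd(docToSegOrd, segOrdToGlobal, 3) == direct[3]);
    System.out.println(direct.length + " entries vs " + segOrdToGlobal.length);
  }
}
```

In this sketch the direct table saves one array read per document at collection time, but it holds one entry per document rather than one per distinct value.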

[PR] Specialize `BlockImpactsDocsEnum#nextDoc()`. [lucene]

2023-10-13 Thread via GitHub
jpountz opened a new pull request, #12670: URL: https://github.com/apache/lucene/pull/12670 When we initially introduced support for dynamic pruning, we had an implementation of WAND that would almost exclusively use `advance()`. Now that we switched to MAXSCORE and rely much more on `nextD

Re: [PR] Specialize `BlockImpactsDocsEnum#nextDoc()`. [lucene]

2023-10-13 Thread via GitHub
jpountz commented on PR #12670: URL: https://github.com/apache/lucene/pull/12670#issuecomment-1761288055 Results on wikibigall. Both the baseline and the contender have #12668. ``` TaskQPS baseline StdDevQPS my_modified_version StdDev

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
dweiss commented on code in PR #12667: URL: https://github.com/apache/lucene/pull/12667#discussion_r1358126818 ## help/jmh.txt: ## @@ -0,0 +1,15 @@ +benchmarks +== + +See the README.txt folder in lucene/benchmark-jmh for how to run benchmarks using the build system. Re

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-13 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1358143088 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Create a task executor when executor is not provided [lucene]

2023-10-13 Thread via GitHub
javanna commented on code in PR #12606: URL: https://github.com/apache/lucene/pull/12606#discussion_r1358146590 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -420,13 +418,12 @@ public int count(Query query) throws IOException { } /** - * Re

Re: [PR] Add interface VectorValues to be implemented by [Float/Byte]VectorValues [lucene]

2023-10-13 Thread via GitHub
shubhamvishu commented on PR #12636: URL: https://github.com/apache/lucene/pull/12636#issuecomment-1761387046 @benwtrent Makes sense to me. I'll see if I could address those parts of the code in this PR without a new abstraction.

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
rmuir commented on code in PR #12667: URL: https://github.com/apache/lucene/pull/12667#discussion_r1358163111 ## help/jmh.txt: ## @@ -0,0 +1,15 @@ +benchmarks +== + +See the README.txt folder in lucene/benchmark-jmh for how to run benchmarks using the build system. Rev

[PR] Move private static classes or functions out of DoubleValuesSource [lucene]

2023-10-13 Thread via GitHub
shubhamvishu opened a new pull request, #12671: URL: https://github.com/apache/lucene/pull/12671 ### Description This is a followup PR of #12548 Changes in this PR : - Move some of the private static classes or functions etc out of `DoubleValuesSource` class to keep

Re: [PR] Ability to compute vector similarity scores with DoubleValuesSource [lucene]

2023-10-13 Thread via GitHub
shubhamvishu commented on PR #12548: URL: https://github.com/apache/lucene/pull/12548#issuecomment-1761428061 @msokolov @benwtrent Raised a followup PR #12671 for this and also moved some other relevant static classes etc out of DVS(it now looks much better to me). Thanks!

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-13 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1358208212 ## lucene/core/src/java/org/apache/lucene/util/hnsw/InitializedHnswGraphBuilder.java: ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[PR] Speed up TestIndexOrDocValuesQuery. [lucene]

2023-10-13 Thread via GitHub
jpountz opened a new pull request, #12672: URL: https://github.com/apache/lucene/pull/12672 This changes the following: - fewer docs indexed in non-nightly runs, - `QueryUtils#checkFirstSkipTo` uses the `ScorerSupplier` API to convey it will only check one doc, - `QueryUtils#chec

Re: [PR] Speed up TestIndexOrDocValuesQuery. [lucene]

2023-10-13 Thread via GitHub
jpountz commented on PR #12672: URL: https://github.com/apache/lucene/pull/12672#issuecomment-1761457934 On my machine the test goes from ~30s to ~5s. This is probably still too much, but at least these changes should not meaningfully decrease coverage.

Re: [I] Make OrdinalMap maps docID to global ordinal directly? [lucene]

2023-10-13 Thread via GitHub
jpountz commented on issue #12669: URL: https://github.com/apache/lucene/issues/12669#issuecomment-1761462109 > the global ordinals can not keep monotonic, which may harm to memory. The other downside is that you then need to store `maxDoc` global ordinals (one per doc) instead of `va

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
ChrisHegarty commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761486768 This looks great. I'll check out the branch and play with it a little, then report back.

Re: [I] TestSizeBoundedForceMerge.testByteSizeLimit test failure [lucene]

2023-10-13 Thread via GitHub
jpountz commented on issue #12648: URL: https://github.com/apache/lucene/issues/12648#issuecomment-1761523732 I found the issue, this is due to codec randomization. This test fails when assigned the `SimpleText` codec and one of the segment IDs has the `\n` or `\` byte. This forces `SimpleT

[PR] Fix TestSizeBoundedForceMerge. [lucene]

2023-10-13 Thread via GitHub
jpountz opened a new pull request, #12673: URL: https://github.com/apache/lucene/pull/12673 This test sometimes fails because `SimpleText` has a non-deterministic size for its segment info file, due to escape characters. The test now enforces the default codec, and checks that segments have
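
Why escaping makes an encoded size depend on the bytes being written can be sketched as follows. This is a minimal illustration assuming a hypothetical scheme that prefixes `\n` and `\` with a backslash; it is not the `SimpleText` codec's actual encoding, and the class and method names are made up for the example.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical escaping scheme, for illustration only.
final class EscapeSizeSketch {

  // Writes each byte, prefixing newline and backslash bytes with a backslash.
  static byte[] escape(byte[] raw) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte b : raw) {
      if (b == '\n' || b == '\\') {
        out.write('\\'); // extra escape byte makes the output longer
      }
      out.write(b);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] plainId = {1, 2, 3};
    byte[] idWithNewline = {1, (byte) '\n', 3};
    // Same input length, different encoded length.
    System.out.println(escape(plainId).length);
    System.out.println(escape(idWithNewline).length);
  }
}
```

Under this assumed scheme, two random IDs of equal length can produce files of different sizes, which is the kind of variation a size-based assertion cannot tolerate.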

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
rmuir commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761564948 I also removed some of the hacky self-tests the benchmarks were doing (comparing results across different impls). The reason is: when experimenting, you can easily validate the correctness

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
ChrisHegarty commented on code in PR #12667: URL: https://github.com/apache/lucene/pull/12667#discussion_r1358302452 ## lucene/benchmark-jmh/README.txt: ## @@ -15,15 +15,16 @@ java --module-path lucene\benchmark-jmh\build\benchmarks --module org.apache.luc You can pass any J

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-13 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1358319824 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-13 Thread via GitHub
jimczi commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1358281601 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
rmuir commented on code in PR #12667: URL: https://github.com/apache/lucene/pull/12667#discussion_r1358317343 ## lucene/benchmark-jmh/README.txt: ## @@ -15,15 +15,16 @@ java --module-path lucene\benchmark-jmh\build\benchmarks --module org.apache.luc You can pass any JMH opti

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
rmuir commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761582752 I updated the docs, please have another look.

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-13 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1358322632 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,782 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Add javadoc note to LeafCollector#finish [lucene]

2023-10-13 Thread via GitHub
gsmiller merged PR #12643: URL: https://github.com/apache/lucene/pull/12643

Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

2023-10-13 Thread via GitHub
gsmiller merged PR #12642: URL: https://github.com/apache/lucene/pull/12642

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
uschindler commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761613992 Hi Robert, could you remove this exclusion: https://github.com/dweiss/lucene/blob/e2e1b569fba1593ff4c3381f3494f552db2b4c74/lucene/benchmark-jmh/build.gradle#L37-L38 It is

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-13 Thread via GitHub
jimczi commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1358363771 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
uschindler commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761642871 > @dweiss is it enough to have RUNTIME_JAVA_HOME set when using Gradle to run benchmark with also preview versions not yet supported by Gradle? I noticed you need to run it on y

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
rmuir commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761663375 > Hi Robert, could you remove this exclusion: https://github.com/dweiss/lucene/blob/e2e1b569fba1593ff4c3381f3494f552db2b4c74/lucene/benchmark-jmh/build.gradle#L37-L38 > > It is no lo

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
uschindler commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761668041 > > Hi Robert, could you remove this exclusion: https://github.com/dweiss/lucene/blob/e2e1b569fba1593ff4c3381f3494f552db2b4c74/lucene/benchmark-jmh/build.gradle#L37-L38 > > It is no

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
uschindler commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761682258 Sorry for the chaotic comments, the `failOnMissingClasses = false` line can be removed. I checked it with Robert's branch: ``` > Task :lucene:benchmark-jmh:forbiddenApisMai

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
uschindler commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761684636 I just committed the change.

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
dweiss commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761712726 > ./gradlew :lucene:benchmark-jmh:run, which runs with the main gradle or alternative JDK as configured in Gradle. I mentioned this on the original issue - I don't think forking a J

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
uschindler commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761713255 Let me fix...

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
dweiss commented on code in PR #12667: URL: https://github.com/apache/lucene/pull/12667#discussion_r1358421683 ## help/jmh.txt: ## @@ -0,0 +1,15 @@ +benchmarks +== + +See the README.txt folder in lucene/benchmark-jmh for how to run benchmarks using the build system. Re

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
uschindler commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761718313 Fixed. If you start the benchmark now in module mode with the command line provided by "assemble" it works correctly: ``` > Task :lucene:benchmark-jmh:assemble JMH bench

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
dweiss commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761724093 > is it enough to have RUNTIME_JAVA_HOME set when using Gradle to run benchmark with also preview versions not yet supported by Gradle? The code will compile with the provided RUNTI

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
dweiss commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761727033 > The jdk.incubator.vector must be removed from the module-info.java file. This was intentional, otherwise the example benchmark wouldn't compile... But since the vectors-specific c

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
uschindler commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761741178 > > The jdk.incubator.vector must be removed from the module-info.java file. > > This was intentional, otherwise the example benchmark wouldn't compile... But since the vectors-

Re: [PR] Fix TestSizeBoundedForceMerge. [lucene]

2023-10-13 Thread via GitHub
jpountz merged PR #12673: URL: https://github.com/apache/lucene/pull/12673

Re: [I] TestSizeBoundedForceMerge.testByteSizeLimit test failure [lucene]

2023-10-13 Thread via GitHub
jpountz closed issue #12648: TestSizeBoundedForceMerge.testByteSizeLimit test failure URL: https://github.com/apache/lucene/issues/12648

Re: [PR] Refactor ByteBlockPool so it is just a "shift/mask big array" [lucene]

2023-10-13 Thread via GitHub
iverase commented on code in PR #12625: URL: https://github.com/apache/lucene/pull/12625#discussion_r1358473230 ## lucene/core/src/java/org/apache/lucene/index/TermsHashPerField.java: ## @@ -255,6 +255,81 @@ final void writeBytes(int stream, byte[] b, int offset, int len) {

[PR] LUCENE-10241: upgrade to OpenNLP 2.3.0 [lucene]

2023-10-13 Thread via GitHub
cpoerschke opened a new pull request, #12674: URL: https://github.com/apache/lucene/pull/12674 ### Description Building upon the #448 pull request upgrading to 1.9.4 -- this could be follow-up or alternative. #11277

Re: [PR] LUCENE-10241: Updating OpenNLP to 1.9.4. [lucene]

2023-10-13 Thread via GitHub
cpoerschke commented on PR #448: URL: https://github.com/apache/lucene/pull/448#issuecomment-1761792386 > > @jzonthemtn not sure I have the knowledge or chops to do this upgrade... > > I'll push an update! I'm also interested in this -- #12674 so far, work-in-progress.

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
rmuir commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761796246 Yeah, I think it's better not to compile benchmarks with incubator/preview etc? I think benchmarks should just call lucene methods, and lucene code should do the right thing with handling i

[I] MultiSimilarity.MultiSimScorer should sum up scores into a double [lucene]

2023-10-13 Thread via GitHub
jpountz opened a new issue, #12675: URL: https://github.com/apache/lucene/issues/12675 ### Description We generally prefer scores to not depend on the order of clauses, e.g. BM25's multi-term sim scorer, disjunctive and conjunctive queries all sum up scores into a double to reduce ac
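
The order dependence this issue refers to can be reproduced with a few floats. The sketch below is illustrative only: the class and method names are hypothetical and this is not the `MultiSimilarity.MultiSimScorer` code. It sums the same three values in two orders, once with a `float` accumulator and once with a `double` accumulator that is cast back to `float` at the end.

```java
// Hypothetical sketch; not taken from Lucene.
final class ScoreSumSketch {

  // Accumulates in float: each addition rounds to float precision.
  static float sumAsFloat(float[] scores) {
    float sum = 0f;
    for (float s : scores) {
      sum += s;
    }
    return sum;
  }

  // Accumulates in double and rounds to float once at the end.
  static float sumAsDouble(float[] scores) {
    double sum = 0d;
    for (float s : scores) {
      sum += s;
    }
    return (float) sum;
  }

  public static void main(String[] args) {
    float[] largeFirst = {16777216f, 1f, 1f}; // 2^24, where float spacing becomes 2
    float[] largeLast = {1f, 1f, 16777216f};
    System.out.println(sumAsFloat(largeFirst) == sumAsFloat(largeLast));   // false
    System.out.println(sumAsDouble(largeFirst) == sumAsDouble(largeLast)); // true
  }
}
```

With the `float` accumulator, adding 1 to 2^24 is lost to rounding each time, so the large-first order yields 16777216 while the large-last order yields 16777218. The `double` accumulator holds 16777218 exactly in both orders.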

Re: [PR] LUCENE-10241: upgrade to OpenNLP 2.3.0 [lucene]

2023-10-13 Thread via GitHub
cpoerschke commented on code in PR #12674: URL: https://github.com/apache/lucene/pull/12674#discussion_r1358513093 ## lucene/analysis/opennlp/src/test/org/apache/lucene/analysis/opennlp/TestOpenNLPChunkerFilterFactory.java: ## @@ -58,7 +58,7 @@ public class TestOpenNLPChunkerFil

Re: [PR] LUCENE-10241: upgrade to OpenNLP 2.3.0 [lucene]

2023-10-13 Thread via GitHub
cpoerschke commented on code in PR #12674: URL: https://github.com/apache/lucene/pull/12674#discussion_r1358515284 ## lucene/licenses/opennlp-tools-NOTICE.txt: ## @@ -1,6 +1,11 @@ - -Apache OpenNLP Tools -Copyright 2015 The Apache Software Foundation +Apache OpenNLP +Copyright 2

Re: [PR] LUCENE-10241: upgrade to OpenNLP 2.3.0 [lucene]

2023-10-13 Thread via GitHub
cpoerschke commented on code in PR #12674: URL: https://github.com/apache/lucene/pull/12674#discussion_r1358579503 ## lucene/analysis/opennlp/src/tools/test-model-data/README.txt: ## @@ -3,4 +3,4 @@ Training data derived from Reuters corpus in very unscientific way. Tagging do

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
uschindler commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761940947 I ran all benchmarks in module mode (second line of assemble output) on my AVX-256 laptop: Processor: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz, 1992 MHz, 4 core(s), 8 logical

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
uschindler commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761946068 > Yeah, I think it's better not to compile benchmarks with incubator/preview etc? I think benchmarks should just call lucene methods, and lucene code should do the right thing with han

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
dweiss commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1762019283 > The problem was: if you run the benchmark in module mode, the requirement added the module without respecting if the extra runtime parameter was given. Yep, I understand. Did it a

Re: [PR] Speed up TestIndexOrDocValuesQuery. [lucene]

2023-10-13 Thread via GitHub
rmuir commented on PR #12672: URL: https://github.com/apache/lucene/pull/12672#issuecomment-1762048057 Thanks for improving the speed of this test. Last time I ran tests it took over 200 seconds on my computer; I just did not look into it.

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-13 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1762051354 oh @gf2121 I missed that you added this, thank you! I am looking at it.

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-13 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1762055897 @gf2121 @ChrisHegarty you can see the issue from his assembler output with the failed intel optimization: the current code does 2 x 256-bit vpmull on ymm registers, the proposed simplif

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-13 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1762063378 @gf2121 I think we could diagnose it further with https://github.com/travisdowns/avx-turbo

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-13 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1762068277 I compiled the code and ran it easily, just `git clone + make`. You do have to run it as root to get the useful output, I took a risk on my machine: ``` think:avx-turbo[master]$ sudo

[PR] Improve logging for vector support [lucene]

2023-10-13 Thread via GitHub
uschindler opened a new pull request, #12676: URL: https://github.com/apache/lucene/pull/12676 This improves logging a bit: - if Java version is too old, but user explicitly enabled the incubator module - inform user about missing integer support Robert and I hit this when tets

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-13 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1762073195 I guess you will have to probably `modprobe msr` first. I already have the `msr` module loaded for other nefarious purposes. -- This is an automated message from the Apache Git Service.

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-13 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1762084009 And at least the theory makes sense, this integer multiply is definitely "avx512 heavy", so if you have a CPU susceptible to throttling, it is better to do the 256-bit multiplies that we do today. I gu

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
uschindler commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1762118934 > Did it also happen when it was started with -jar ("classpath" mode)? I don't think it should, right? Naaah, that was the most confusing part. -- This is an automated messag

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-13 Thread via GitHub
benwtrent commented on PR #12657: URL: https://github.com/apache/lucene/pull/12657#issuecomment-1762131240 Just being paranoid, I tested and verified that recall is absolutely unchanged between these changes. baseline: ``` 0.500 0.10 10 10 4 50 20

Re: [PR] Improve logging for vector support [lucene]

2023-10-13 Thread via GitHub
uschindler merged PR #12676: URL: https://github.com/apache/lucene/pull/12676 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
uschindler commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1762144670 @rmuir I merged main branch here, so you can retest the better logging of #12676 -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-13 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1358794152 ## lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java: ## @@ -1147,15 +1139,15 @@ void assertVectorsEqual(AbstractMockVectorValues u, AbstractM

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-13 Thread via GitHub
rmuir commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1762210828 The logging works correctly for my use-case (accidentally running java 17). Scalar method has no warnings, Vector method prints: `WARNING: Java vector incubator module was enabled by comma

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-13 Thread via GitHub
uschindler commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1358910134 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +122,22 @@ static VectorizationProvider lookup(boolean te

[PR] Fix unstable test TestVectorSimilarityValuesSource [lucene]

2023-10-13 Thread via GitHub
zhaih opened a new pull request, #12678: URL: https://github.com/apache/lucene/pull/12678 ### Description This fixes a failing test: https://jenkins.thetaphi.de/job/Lucene-main-Windows/13300/ The cause is that the writer is neither committed nor force-merged, but the test is depending on t

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-13 Thread via GitHub
dweiss commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359203222 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +122,22 @@ static VectorizationProvider lookup(boolean testMo

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-13 Thread via GitHub
uschindler commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359204137 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +122,22 @@ static VectorizationProvider lookup(boolean te

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-13 Thread via GitHub
uschindler commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359204492 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +122,22 @@ static VectorizationProvider lookup(boolean te

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-13 Thread via GitHub
uschindler commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359204862 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +122,22 @@ static VectorizationProvider lookup(boolean te
