Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
dweiss commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759046099 So it seems something has changed and the javaCompiler property is now always non-null, causing -X options to be passed to the forked compiler (instead of -J...). I've changed this c

Re: [I] Increase the number of dims for KNN vectors to 2048 [LUCENE-10471] [lucene]

2023-10-12 Thread via GitHub
MarcusSorealheis commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1759050979 I think we should close it for sure. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]

2023-10-12 Thread via GitHub
vsop-479 commented on PR #11888: URL: https://github.com/apache/lucene/pull/11888#issuecomment-1759050886 Appending some performance data. Note that the results vary quite a bit between rounds. # round1 Task QPS baseline StdDev QPS bsearc

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759069434 I don't remember any problem there. The toolchain use for `generateJdkApiJarXX` was added much later and I did not touch that. Maybe check git blame.

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
dweiss commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759073292 I did check that file's history - it used an explicit exclusion for compileMain19Java: ``` if (task.path == ":lucene:core:compileMain19Java") { // uses "usc

Re: [I] Nightly benchmark regression for term dict queries [lucene]

2023-10-12 Thread via GitHub
gf2121 commented on issue #12659: URL: https://github.com/apache/lucene/issues/12659#issuecomment-1759141669 I wrote a JMH benchmark to compare reading `vint` vs `msbvint`. ``` Benchmark (size) (valueBit) Mode Cnt Score Error Units ReadVLongBenchmar

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759182112 Hi, there are some other problems with the regenerate task: toolchains do not download automatically, although enabled. I am digging.

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759187489 Very strange output if I clean the JDK cache in `~/.gradle/jdks`.

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759191108 Task info by `gradlew tasks` is correct: generateJdkApiJar19 - Regenerate the API-only JAR file with public Panama Foreign & Vector API from JDK 19 generateJdkApiJar20 -

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759203200 > I did check that file's history - it used an explicit exclusion for compileMain19Java: > > ``` > if (task.path == ":lucene:core:compileMain19Java") { >

Re: [I] Nightly benchmark regression for term dict queries [lucene]

2023-10-12 Thread via GitHub
gf2121 commented on issue #12659: URL: https://github.com/apache/lucene/issues/12659#issuecomment-1759244758 I tried another way of reading, like: ``` byte b = msbBytes[pos++]; long i = b & 0x7FL; while (b < 0) { b = msbBytes[pos++]; i = (i << 7) | (b & 0x7FL); } ```
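
For readers following the thread, the decode loop quoted above can be exercised on its own. The following is only a minimal sketch: the class name `MsbVLongSketch` and the `writeMsbVLong` helper are hypothetical, and the encoder is an assumption inferred as the mirror image of the quoted loop (most significant 7-bit group first, high bit set on every byte except the last). It is not taken from Lucene's actual writer.

```java
// Hypothetical helper class, not part of Lucene.
final class MsbVLongSketch {

  // Assumed encoding: 7-bit groups, most significant group first,
  // continuation bit (0x80) set on all bytes except the last one.
  static byte[] writeMsbVLong(long v) {
    int groups = 1;
    for (long t = v >>> 7; t != 0; t >>>= 7) {
      groups++;
    }
    byte[] out = new byte[groups];
    for (int k = 0; k < groups; k++) {
      int shift = 7 * (groups - 1 - k);
      int bits = (int) ((v >>> shift) & 0x7FL);
      out[k] = (byte) (k < groups - 1 ? (bits | 0x80) : bits);
    }
    return out;
  }

  // Same loop as in the comment above: a negative byte means the
  // continuation bit is set, so another 7-bit group follows.
  static long readMsbVLong(byte[] msbBytes, int pos) {
    byte b = msbBytes[pos++];
    long i = b & 0x7FL;
    while (b < 0) {
      b = msbBytes[pos++];
      i = (i << 7) | (b & 0x7FL);
    }
    return i;
  }
}
```

Under these assumptions, `readMsbVLong(writeMsbVLong(v), 0)` should return `v`; whether this loop is faster than the existing reader is exactly what the benchmark in the thread is meant to measure.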

[PR] read MSB VLong in new way [lucene]

2023-10-12 Thread via GitHub
gf2121 opened a new pull request, #12661: URL: https://github.com/apache/lucene/pull/12661 ISSUE: https://github.com/apache/lucene/issues/12659

Re: [I] Nightly benchmark regression for term dict queries [lucene]

2023-10-12 Thread via GitHub
gf2121 commented on issue #12659: URL: https://github.com/apache/lucene/issues/12659#issuecomment-1759262343 Another possible reason I'm thinking of is that maybe `readMSBVLong` is not as hot as `readVLong` so the compiler is not optimizing it enough, or the `readbyte` in statistic context may con

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759281805 OK, I found the bug for the incorrect logger output, but that's not the issue here. The fix is easy. I will provide a PR for the regenerator. The reason is that it looks lik

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759287508 This fixes the logging problem: ```patch diff --git a/gradle/generation/extract-jdk-apis.gradle b/gradle/generation/extract-jdk-apis.gradle index 28f56ea6432..020dc05

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759309453 It looks like gradle toolchain download is fully broken in Gradle 8.4. It always says that it cannot find any JDK, no matter which OS (Windows, Linux).

[PR] Fix Gradle toolchain download with Gradle 8.4 (#12655) [lucene]

2023-10-12 Thread via GitHub
uschindler opened a new pull request, #12662: URL: https://github.com/apache/lucene/pull/12662 Fix the toolchain support when using 8.4: - Logging of failures is wrong as in Groovy the for loop variable is not final, so when it is used in a closure which is executed later you get the l

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759316894 Here is the draft PR, it only fixes the logging, but I have no idea why the toolchains are not downloaded. If you clean `~/.gradle/jdks` and then run `gradlew :lucene:core:regene

[PR] Initial take at adding JMH micro-benchmarks [lucene]

2023-10-12 Thread via GitHub
dweiss opened a new pull request, #12663: URL: https://github.com/apache/lucene/pull/12663 As per discussion in: https://github.com/apache/lucene/issues/12641 This patch adds the ability to compile JMH microbenchmarks and run them from within Lucene codebase. I didn't use any plugins

Re: [I] ability to run JMH benchmarks from gradle [lucene]

2023-10-12 Thread via GitHub
dweiss commented on issue #12641: URL: https://github.com/apache/lucene/issues/12641#issuecomment-1759370475 I filed an initial take on this here: https://github.com/apache/lucene/pull/12663 Copied just one of your existing benchmarks, Rob. Try if it fits your needs. I mentioned on

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759372394 I found the issue, in Gradle 8+ you need to explicitly configure toolchain repositories: > [Using automatic toolchain downloading without having a repository configured](

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
dweiss commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759389210 > because the loop variable is not final (like in Java) In groovy closures do have access to their full context (including local variables outside). That code used gstring, whi

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759397039 OK, PR is ready. The plugin needs to be in settings.gradle (it is a so-called "settings" plugin, which works on gradle toplevel without project context). PR: #12662 (please

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759401511 > > because the loop variable is not final (like in Java) > > In groovy closures do have access to their full context (including local variables outside). That code used gs

Re: [PR] Fix Gradle toolchain download with Gradle 8.4 (#12655) [lucene]

2023-10-12 Thread via GitHub
uschindler merged PR #12662: URL: https://github.com/apache/lucene/pull/12662

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
uschindler closed issue #12655: Upgrade to Gradle 8.4 URL: https://github.com/apache/lucene/issues/12655

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1356732427 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1356733186 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1356734118 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Optimize reading data from postings/impacts enums. [lucene]

2023-10-12 Thread via GitHub
jpountz commented on PR #12664: URL: https://github.com/apache/lucene/pull/12664#issuecomment-1759536405 `luceneutil` on `wikibigall` gave good results, better than I expected: ``` Task QPS baseline StdDev QPS my_modified_version StdDev

Re: [PR] Optimize reading data from postings/impacts enums. [lucene]

2023-10-12 Thread via GitHub
jpountz commented on code in PR #12664: URL: https://github.com/apache/lucene/pull/12664#discussion_r1356764107 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsReader.java: ## @@ -1143,14 +1151,16 @@ private void refillDocs() throws IOException {

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
risdenk commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759556593 Thanks @uschindler and @dweiss sorry for the work here :( I'll try to make sure I test all these cases next time.

Re: [PR] Optimize reading data from postings/impacts enums. [lucene]

2023-10-12 Thread via GitHub
jpountz commented on code in PR #12664: URL: https://github.com/apache/lucene/pull/12664#discussion_r1356769495 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListReader.java: ## @@ -113,13 +113,14 @@ public int skipTo(int target) throws IOException { // w

Re: [PR] Initial take at adding JMH micro-benchmarks [lucene]

2023-10-12 Thread via GitHub
rmuir commented on PR #12663: URL: https://github.com/apache/lucene/pull/12663#issuecomment-1759558774 @dweiss thank you for this! I tried it out and it works great. Glad to see the `cosineDistanceVectorUtil()` working in this example which is exactly what I want to do.

Re: [PR] Optimize reading data from postings/impacts enums. [lucene]

2023-10-12 Thread via GitHub
jpountz commented on PR #12664: URL: https://github.com/apache/lucene/pull/12664#issuecomment-1759583212 I tried to re-inline advance() but it still suggested a small slowdown for `HighTermDayOfYearSort` and `CountAndHighMed` too so I suspect that there is something else. There is indeed on

Re: [PR] Optimize reading data from postings/impacts enums. [lucene]

2023-10-12 Thread via GitHub
jpountz commented on PR #12664: URL: https://github.com/apache/lucene/pull/12664#issuecomment-1759592840 Another potential reason for the slowdowns is that they're caused by the changes to `MultiLevelSkipListReader`. Removing the additional checks would require changing the way we encode po

Re: [I] Increase the number of dims for KNN vectors to 2048 [LUCENE-10471] [lucene]

2023-10-12 Thread via GitHub
mayya-sharipova commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1759594047 Yes, thanks for the reminder. Now that the Codec is responsible for managing dims, we can close it.

Re: [I] Increase the number of dims for KNN vectors to 2048 [LUCENE-10471] [lucene]

2023-10-12 Thread via GitHub
mayya-sharipova closed issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471] URL: https://github.com/apache/lucene/issues/11507

Re: [PR] Initial take at adding JMH micro-benchmarks [lucene]

2023-10-12 Thread via GitHub
dweiss commented on PR #12663: URL: https://github.com/apache/lucene/pull/12663#issuecomment-1759605550 Let me review the minor remaining bits - I had to leave the computer before the tests completed.

Re: [PR] Initial take at adding JMH micro-benchmarks [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on PR #12663: URL: https://github.com/apache/lucene/pull/12663#issuecomment-1759607611 @dweiss this is awesome!!! Thank you so much!

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1356818110 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Avoid use docsSeen in BKDWriter [lucene]

2023-10-12 Thread via GitHub
jpountz commented on code in PR #12658: URL: https://github.com/apache/lucene/pull/12658#discussion_r1356844946 ## lucene/core/src/java/org/apache/lucene/codecs/PointsWriter.java: ## @@ -202,7 +202,7 @@ public long size() { @Override public int ge

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
dweiss commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759671663 > By using a each closure the jdkVersion is private to the closure and the build-time evaluation has its own jdkVersion variable (which is a constant). Ah, thanks for clarifyin

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-12 Thread via GitHub
dweiss commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1759674565 No worries, @risdenk - it's a big effort already, there'll always be problems, especially with major version upgrades.

Re: [PR] Explicitly return needStats flag in TermStates [lucene]

2023-10-12 Thread via GitHub
jpountz commented on PR #12638: URL: https://github.com/apache/lucene/pull/12638#issuecomment-1759681148 @yugushihuang Your explanation suggests that a `TermStates` could be created somewhere and then used in a different context. But this is not how it's expected to be used in general, the

Re: [PR] Ability to compute vector similarity scores with DoubleValuesSource [lucene]

2023-10-12 Thread via GitHub
shubhamvishu commented on PR #12548: URL: https://github.com/apache/lucene/pull/12548#issuecomment-1759689286 Thanks for reviewing! @benwtrent I have added a CHANGES.txt entry now, could you help me merge this if all looks good?

Re: [PR] Ability to compute vector similarity scores with DoubleValuesSource [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on PR #12548: URL: https://github.com/apache/lucene/pull/12548#issuecomment-1759701427 @shubhamvishu running CI :). I will see about merging and such soon.

Re: [PR] Initial take at adding JMH micro-benchmarks [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on PR #12663: URL: https://github.com/apache/lucene/pull/12663#issuecomment-1759752950 @dweiss one suggestion I have is adding a `README` for `lucene/benchmark-jmh` on how to run the benchmarks and add new ones. Your description on how to use it in this PR seems adequate

Re: [PR] Initial take at adding JMH micro-benchmarks [lucene]

2023-10-12 Thread via GitHub
dweiss commented on PR #12663: URL: https://github.com/apache/lucene/pull/12663#issuecomment-1759758762 I enabled forbiddenapis to the extent possible, to make Uwe happier. ;) I like those flame-graphs that Solr has, etc., but these can come later. I'll add a simple ```gradlew :helpJmh``` t

Re: [PR] Cleanup flushing logic in DocumentsWriter [lucene]

2023-10-12 Thread via GitHub
s1monw merged PR #12647: URL: https://github.com/apache/lucene/pull/12647

Re: [PR] Initial take at adding JMH micro-benchmarks [lucene]

2023-10-12 Thread via GitHub
dweiss commented on PR #12663: URL: https://github.com/apache/lucene/pull/12663#issuecomment-1759777296 @benwtrent Can you take a look? If there's any wording you think would work better, please do change it. Explaining how JMH works is perhaps beyond the scope of the readme file... ;)

Re: [PR] Initial take at adding JMH micro-benchmarks [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on PR #12663: URL: https://github.com/apache/lucene/pull/12663#issuecomment-1759795868 Looks good @dweiss I added a line with an example for running a single benchmark or suite of benchmarks.

Re: [PR] Initial take at adding JMH micro-benchmarks [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on code in PR #12663: URL: https://github.com/apache/lucene/pull/12663#discussion_r1356970803 ## lucene/benchmark-jmh/README.txt: ## @@ -0,0 +1,21 @@ +The :lucene:benchmark-jmh module contains can be used to compile +and execute JMH (https://github.com/openjd

Re: [PR] Add interface VectorValues to be implemented by [Float/Byte]VectorValues [lucene]

2023-10-12 Thread via GitHub
shubhamvishu commented on PR #12636: URL: https://github.com/apache/lucene/pull/12636#issuecomment-1759814356 @benwtrent Do you mean we earlier had similar implementation or such interface? > we decided to switch it as a common interface required either Which interface are you

Re: [PR] Ability to compute vector similarity scores with DoubleValuesSource [lucene]

2023-10-12 Thread via GitHub
msokolov commented on code in PR #12548: URL: https://github.com/apache/lucene/pull/12548#discussion_r1357027508 ## lucene/core/src/java/org/apache/lucene/search/DoubleValuesSource.java: ## @@ -172,6 +173,52 @@ public LongValuesSource rewrite(IndexSearcher searcher) throws IOEx

Re: [PR] Ability to compute vector similarity scores with DoubleValuesSource [lucene]

2023-10-12 Thread via GitHub
msokolov commented on PR #12548: URL: https://github.com/apache/lucene/pull/12548#issuecomment-1759887696 oh I missed that @benwtrent had approved -- it's OK w/me to merge as-is, just had some idea about organizing the static methods into a different class...

Re: [PR] Avoid use docsSeen in BKDWriter [lucene]

2023-10-12 Thread via GitHub
easyice commented on code in PR #12658: URL: https://github.com/apache/lucene/pull/12658#discussion_r1357032536 ## lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java: ## @@ -1248,7 +1259,8 @@ private void writeIndex( metaOut.writeBytes(maxPackedValue, 0, config.

Re: [PR] Ability to compute vector similarity scores with DoubleValuesSource [lucene]

2023-10-12 Thread via GitHub
shubhamvishu commented on PR #12548: URL: https://github.com/apache/lucene/pull/12548#issuecomment-1759892123 > oh I missed that @benwtrent had approved -- it's OK w/me to merge as-is, just had some idea about organizing the static methods into a different class... Thanks @msokolov !

Re: [PR] Avoid use docsSeen in BKDWriter [lucene]

2023-10-12 Thread via GitHub
easyice commented on code in PR #12658: URL: https://github.com/apache/lucene/pull/12658#discussion_r1357049989 ## lucene/core/src/java/org/apache/lucene/codecs/PointsWriter.java: ## @@ -202,7 +202,7 @@ public long size() { @Override public int ge

Re: [PR] Ability to compute vector similarity scores with DoubleValuesSource [lucene]

2023-10-12 Thread via GitHub
shubhamvishu commented on code in PR #12548: URL: https://github.com/apache/lucene/pull/12548#discussion_r1357050474 ## lucene/core/src/java/org/apache/lucene/search/DoubleValuesSource.java: ## @@ -172,6 +173,52 @@ public LongValuesSource rewrite(IndexSearcher searcher) throws

Re: [I] [DISCUSS] Should there be a threshold-based vector search API? [lucene]

2023-10-12 Thread via GitHub
msokolov commented on issue #12579: URL: https://github.com/apache/lucene/issues/12579#issuecomment-1759925268 Interesting - so this is kind of like a noisy radius search in high dimensions? It makes sense to me intuitively since we don't generally expect searches to have the same number of

Re: [PR] Avoid use docsSeen in BKDWriter [lucene]

2023-10-12 Thread via GitHub
easyice commented on code in PR #12658: URL: https://github.com/apache/lucene/pull/12658#discussion_r1357058015 ## lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java: ## @@ -519,9 +526,8 @@ private Runnable writeFieldNDims( // compute the min/max for this slice

Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

2023-10-12 Thread via GitHub
gsmiller commented on code in PR #12642: URL: https://github.com/apache/lucene/pull/12642#discussion_r1357072892 ## lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java: ## @@ -316,6 +316,58 @@ public void testBasic() throws Exception { IOUtils.close(searche

Re: [PR] Avoid use docsSeen in BKDWriter [lucene]

2023-10-12 Thread via GitHub
easyice commented on PR #12658: URL: https://github.com/apache/lucene/pull/12658#issuecomment-1759980060 > I don't like when merges allocate O(maxDoc) memory so I'm keen to fixing this. The idea looks correct to me, I wonder if we can make it a bit more robust? okay, I will take a lo

[PR] 8.11 deps [lucene-solr]

2023-10-12 Thread via GitHub
risdenk opened a new pull request, #2681: URL: https://github.com/apache/lucene-solr/pull/2681 (no comment)

Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

2023-10-12 Thread via GitHub
gsmiller commented on code in PR #12642: URL: https://github.com/apache/lucene/pull/12642#discussion_r1357145658 ## lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java: ## @@ -316,6 +317,68 @@ public void testBasic() throws Exception { IOUtils.close(searche

Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

2023-10-12 Thread via GitHub
gf2121 commented on code in PR #12642: URL: https://github.com/apache/lucene/pull/12642#discussion_r1357151214 ## lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java: ## @@ -316,6 +316,58 @@ public void testBasic() throws Exception { IOUtils.close(searcher.

Re: [PR] Ability to compute vector similarity scores with DoubleValuesSource [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on PR #12548: URL: https://github.com/apache/lucene/pull/12548#issuecomment-1760058838 These are good ideas @msokolov , @shubhamvishu a follow up PR is most welcome.

Re: [I] Add the ability to compute vector similarity scores with the new ValuesSource API [lucene]

2023-10-12 Thread via GitHub
benwtrent closed issue #12394: Add the ability to compute vector similarity scores with the new ValuesSource API URL: https://github.com/apache/lucene/issues/12394

Re: [PR] Ability to compute vector similarity scores with DoubleValuesSource [lucene]

2023-10-12 Thread via GitHub
benwtrent merged PR #12548: URL: https://github.com/apache/lucene/pull/12548

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-12 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1357182135 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on PR #12657: URL: https://github.com/apache/lucene/pull/12657#issuecomment-1760088558 @zhaih I updated the API a bit. This is more like I was thinking. Having a builder that accepts readers, doc maps, etc. And then can build with the final merge state.

Re: [PR] Ability to compute vector similarity scores with DoubleValuesSource [lucene]

2023-10-12 Thread via GitHub
msokolov commented on code in PR #12548: URL: https://github.com/apache/lucene/pull/12548#discussion_r1357190877 ## lucene/core/src/java/org/apache/lucene/search/DoubleValuesSource.java: ## @@ -172,6 +173,52 @@ public LongValuesSource rewrite(IndexSearcher searcher) throws IOEx

Re: [PR] Add interface VectorValues to be implemented by [Float/Byte]VectorValues [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on PR #12636: URL: https://github.com/apache/lucene/pull/12636#issuecomment-1760104677 > Which interface are you referring to RandomAccessVectorValues? as that uses generics to avoid duplication? During the refactoring for byte/float I did a while back, during

Re: [PR] [DRAFT] Concurrent HNSW Merge [lucene]

2023-10-12 Thread via GitHub
msokolov commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1760118913 This looks like a great start. I worked up a very similar PR but I think I have some concurrency bugs around the update/add of new entry points in the upper graph levels. I might post m

Re: [PR] Add a merge policy wrapper that performs recursive graph bisection on merge. [lucene]

2023-10-12 Thread via GitHub
s1monw commented on code in PR #12622: URL: https://github.com/apache/lucene/pull/12622#discussion_r1357159891 ## lucene/core/src/java/org/apache/lucene/index/MergePolicy.java: ## @@ -288,11 +288,32 @@ final void close( } } -/** Wrap the reader in order to add/

Re: [PR] Add a merge policy wrapper that performs recursive graph bisection on merge. [lucene]

2023-10-12 Thread via GitHub
s1monw commented on PR #12622: URL: https://github.com/apache/lucene/pull/12622#issuecomment-1760126316 just for the record. I think we should record if a segment was written and it contains blocks and make it viral. that might be a good idea anyway.

Re: [PR] Initial take at adding JMH micro-benchmarks [lucene]

2023-10-12 Thread via GitHub
dweiss merged PR #12663: URL: https://github.com/apache/lucene/pull/12663

Re: [I] ability to run JMH benchmarks from gradle [lucene]

2023-10-12 Thread via GitHub
dweiss closed issue #12641: ability to run JMH benchmarks from gradle URL: https://github.com/apache/lucene/issues/12641

Re: [I] ability to run JMH benchmarks from gradle [lucene]

2023-10-12 Thread via GitHub
dweiss commented on issue #12641: URL: https://github.com/apache/lucene/issues/12641#issuecomment-1760149489 I left just a single benchmark in, @rmuir - if you'd like to move all of them, please do so. If something doesn't work, let me know (it should though).

Re: [PR] Initial take at adding JMH micro-benchmarks [lucene]

2023-10-12 Thread via GitHub
dweiss commented on code in PR #12663: URL: https://github.com/apache/lucene/pull/12663#discussion_r1357235935 ## lucene/benchmark-jmh/README.txt: ## @@ -0,0 +1,21 @@ +The :lucene:benchmark-jmh module contains can be used to compile +and execute JMH (https://github.com/openjdk/j

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-12 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1357260551 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -288,7 +294,6 @@ private void selectAndLinkDiverse( // only adding it if it is cl

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-12 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1357261063 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -232,7 +231,6 @@ void searchLevel( graphSeek(graph, level, topCandidateNode);

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-12 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1357264961 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -40,31 +41,29 @@ public final class OnHeapHnswGraph extends HnswGraph implements Account

[I] Enable recursive graph bisection out of the box? [lucene]

2023-10-12 Thread via GitHub
jpountz opened a new issue, #12665: URL: https://github.com/apache/lucene/issues/12665 ### Description It would be nice to enable recursive graph bisection out of the box, so that users don't even have to know that it exists or what it is to enjoy its search-time performance benefits

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-12 Thread via GitHub
zhaih commented on PR #12651: URL: https://github.com/apache/lucene/pull/12651#issuecomment-1760235738 OK, I have incorporated everything I learned from #12660 and added several more assertions to make it safer. Please take a look again when you have time, @msokolov, thanks!

Re: [PR] Ability to compute vector similarity scores with DoubleValuesSource [lucene]

2023-10-12 Thread via GitHub
shubhamvishu commented on code in PR #12548: URL: https://github.com/apache/lucene/pull/12548#discussion_r1357312134 ## lucene/core/src/java/org/apache/lucene/search/DoubleValuesSource.java: ## @@ -172,6 +173,52 @@ public LongValuesSource rewrite(IndexSearcher searcher) throws

Re: [I] ability to run JMH benchmarks from gradle [lucene]

2023-10-12 Thread via GitHub
rmuir commented on issue #12641: URL: https://github.com/apache/lucene/issues/12641#issuecomment-1760262080 @dweiss thank you, I will fix and clean up all the benchmarks and archive https://github.com/rmuir/vectorbench as soon as I can :)

Re: [I] ability to run JMH benchmarks from gradle [lucene]

2023-10-12 Thread via GitHub
rmuir commented on issue #12641: URL: https://github.com/apache/lucene/issues/12641#issuecomment-1760263361 I can also add some stuff from the README there such as how to install the disassembler. It is stuff that you forget easily if you haven't done it in a while.

Re: [PR] [DRAFT] Concurrent HNSW Merge [lucene]

2023-10-12 Thread via GitHub
zhaih commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1760280351 That sounds great, thanks Mike! Please post the PR and I will try to see how to put it together. Meanwhile let's try to get the prerequisite PR reviewed and pushed first? Such that thi

[I] normalize() override provided in Simple example in Analyzer class doc is missing String fieldName parameter [lucene]

2023-10-12 Thread via GitHub
Bluetopia opened a new issue, #12666: URL: https://github.com/apache/lucene/issues/12666 ### Description Using: https://javadoc.io/doc/org.apache.lucene/lucene-core/latest/org/apache/lucene/analysis/Analyzer.html which should hopefully be up to date. The current state of the source f

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-12 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1357345297 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,782 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1357376455 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-12 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1357380984 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (AS

[PR] migrate all vectorbench methods to lucene [lucene]

2023-10-12 Thread via GitHub
rmuir opened a new pull request, #12667: URL: https://github.com/apache/lucene/pull/12667 Following up on @dweiss's work, this gives us the same benchmarks as https://github.com/rmuir/vectorbench, just without the code duplication and maintenance hassle. Each method is simply invoked w

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-12 Thread via GitHub
rmuir commented on PR #12667: URL: https://github.com/apache/lucene/pull/12667#issuecomment-1760526957 You can still tell what's happening too due to the log messages. When each benchmark runs, you see a single message: xxxScalar() methods: ``` WARNING: Java vector incubator mod

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-10-12 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1353826324 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,19 +21,18 @@ import java.util.List; import org.apache.lucene.store.DataInput; import
