Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
dweiss commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758985802 Let me take a look. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

[PR] [DRAFT] Concurrent HNSW Merge [lucene]

2023-10-11 Thread via GitHub
zhaih opened a new pull request, #12660: URL: https://github.com/apache/lucene/pull/12660 ### Description This PR is still a draft and contains all kinds of magic numbers and names. It contains the changes of #12651, so I would like to merge that one first. But I do think the logic is correct

[I] Nightly benchmark regression for term dict queries [lucene]

2023-10-11 Thread via GitHub
gf2121 opened a new issue, #12659: URL: https://github.com/apache/lucene/issues/12659 ### Description I'm seeing a regression in term-dict queries on the nightly benchmark: https://home.apache.org/~mikemccand/lucenebench/2023.10.10.18.03.55.html

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-11 Thread via GitHub
msokolov commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1356041142 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -40,31 +41,29 @@ public final class OnHeapHnswGraph extends HnswGraph implements Acco

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-11 Thread via GitHub
zhaih commented on PR #12651: URL: https://github.com/apache/lucene/pull/12651#issuecomment-1758887879 Yes that's the idea, although I actually made some mistakes here so the merging is not entirely pre-allocated, also something in searching might be broken due to the size() behavior ch

[PR] Avoid use docsSeen in BKDWriter [lucene]

2023-10-11 Thread via GitHub
easyice opened a new pull request, #12658: URL: https://github.com/apache/lucene/pull/12658 The `BKDWriter#docsSeen` sometimes takes up a fairly large proportion of the flame graph in my production environment, though not always. Maybe we can use the docCount already computed in `PointValuesWr

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-11 Thread via GitHub
msokolov commented on PR #12651: URL: https://github.com/apache/lucene/pull/12651#issuecomment-1758885797 I like this! Actually I think when we are merging we can preallocate the entire array so we don't need to resize at all which should greatly simplify making this beast thread-safe (sinc

Re: [PR] Larger default block size for block tree index [lucene]

2023-10-11 Thread via GitHub
gf2121 closed pull request #12656: Larger default block size for block tree index URL: https://github.com/apache/lucene/pull/12656

Re: [PR] Larger default block size for block tree index [lucene]

2023-10-11 Thread via GitHub
gf2121 commented on PR #12656: URL: https://github.com/apache/lucene/pull/12656#issuecomment-1758866543 The improvement does not seem stable; another run: ``` Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758596094 It has to be the interaction with ``` // Any javac compilation tasks tasks.withType(JavaCompile) { dependsOn ":altJvmWarning" options.fork = tru

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758564752 Hrm. Let's see how this affects us.

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758554468 @dweiss already filed https://github.com/gradle/gradle/issues/22746 and it looks like somehow the workaround in `gradle/hacks/turbocharge-jvm-opts.gradle` is broken in Gradle 8.4 w/

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758551645 The cause is `gradle/hacks/turbocharge-jvm-opts.gradle`. If I comment out `apply from: file('gradle/hacks/turbocharge-jvm-opts.gradle')` in the top-level `build.gradle`, things wo

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-11 Thread via GitHub
zhaih commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1355784178 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758547290 snippet from: ``` RUNTIME_JAVA_HOME=/Library/Java/JavaVirtualMachines/openjdk.jdk/Contents/Home ./gradlew check -x test -Pvalidation.git.failOnModified=false --info ```

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758537719 I locally reverted the gradlew changes and that didn't fix anything. So it looks like it's something in the newer Gradle version and how `RUNTIME_JAVA_HOME` is being handled :/

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758523181 `RUNTIME_JAVA_HOME=/Library/Java/JavaVirtualMachines/openjdk.jdk/Contents/Home ./gradlew check -x test` reproduced the same exception for me locally. but for ``` >

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758519245 I have no idea why those are injected only for compilation of those tests. Core test compilation works fine. So there seems to be some difference on the distribution tests.

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758517444 Jenkins uses: RUNTIME_JAVA_HOME and TEST_ARGS environment variables.

Re: [PR] Refactor ByteBlockPool so it is just a "shift/mask big array" [lucene]

2023-10-11 Thread via GitHub
stefanvodita commented on code in PR #12625: URL: https://github.com/apache/lucene/pull/12625#discussion_r1355758154 ## lucene/core/src/java/org/apache/lucene/index/TermsHashPerField.java: ## @@ -255,6 +255,81 @@ final void writeBytes(int stream, byte[] b, int offset, int len)

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758514194 See this build. The Gradle command line and env vars can be found there: https://jenkins.thetaphi.de/job/Lucene-main-Linux/44942/console

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758513642 Hmmm I checked this with `./gradlew check` on both JDK 17 and JDK 21 locally. So it must be something w/ Gradle property being sent in :( - do you have a Jenkins job url? I

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
uschindler commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758510093 There seems to be an issue with compiling the distribution tests: ``` > Task :lucene:core.tests:compileJava FAILED error: invalid flag: -XX:+UseParallelGC Usage: ja

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk closed issue #12655: Upgrade to Gradle 8.4 URL: https://github.com/apache/lucene/issues/12655

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758484315 Merged to branch_9x as: * d1f73214a3c0fbc0f9cb783c8a9ab11ffa73ea36 * b3a9375f2793f3d414bbcfdd659a8a6da485d9a8 * e59c607daf98bd0e0faf259ac9c9cf2e3cff5807 * aa968f96d6c9d80d

Re: [PR] Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
uschindler commented on PR #12650: URL: https://github.com/apache/lucene/pull/12650#issuecomment-1758478781 Thanks! 👍

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1758470856 Merged to main as: * 30d3eba93314fbaa014e885a21d2aa1588433df5 * 2c42b8941aa3247b8e24451af1553993f6a95079 * de3b294be44c2cea51a5e909db05043aa8d15150 * a8fba38f1691199e0813d

Re: [PR] Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk closed pull request #12650: Gradle 8.4 URL: https://github.com/apache/lucene/pull/12650

Re: [PR] Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on PR #12650: URL: https://github.com/apache/lucene/pull/12650#issuecomment-1758470625 Merged to main as: * 30d3eba93314fbaa014e885a21d2aa1588433df5 * 2c42b8941aa3247b8e24451af1553993f6a95079 * de3b294be44c2cea51a5e909db05043aa8d15150 * a8fba38f1691199e0813d71783

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-11 Thread via GitHub
benwtrent commented on PR #12582: URL: https://github.com/apache/lucene/pull/12582#issuecomment-1758391298 I realize this is a large PR. I am happy to go into more details in anything that doesn't make sense (probably indicates the code isn't very readable or there should be better comments

[PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-11 Thread via GitHub
benwtrent opened a new pull request, #12657: URL: https://github.com/apache/lucene/pull/12657 While working on the quantization codec & thinking about how merging will evolve, it became clearer that having merging attached directly to the vector writer is weird. I extracted it out to

[PR] Larger default block size for block tree index [lucene]

2023-10-11 Thread via GitHub
gf2121 opened a new pull request, #12656: URL: https://github.com/apache/lucene/pull/12656 I tried adjusting the block size to [min:35, max:68] and ran perf tasks on `wikimediumall`. Surprisingly, I see almost every term-related task get faster, including PKLookup, which was expected to be slow

Re: [PR] Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on PR #12650: URL: https://github.com/apache/lucene/pull/12650#issuecomment-1757986422 Agreed @uschindler I created https://github.com/apache/lucene/issues/12655 so I can prefix each commit w/ the issue identifier.

Re: [I] Upgrade to Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on issue #12655: URL: https://github.com/apache/lucene/issues/12655#issuecomment-1757985173 Handled by https://github.com/apache/lucene/pull/12650

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-11 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1355224614 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [I] TestIndexWriterOnVMError.testUnknownError timesout [lucene]

2023-10-11 Thread via GitHub
benwtrent commented on issue #12654: URL: https://github.com/apache/lucene/issues/12654#issuecomment-1757926532 ``` final int numIterations = TEST_NIGHTLY ? atLeast(100) : atLeast(5); ``` Is one such place where `nightly` matters from what I can tell. I verified the `139`

Re: [I] TestIndexWriterOnVMError.testUnknownError timesout [lucene]

2023-10-11 Thread via GitHub
benwtrent commented on issue #12654: URL: https://github.com/apache/lucene/issues/12654#issuecomment-1757910972 Added some println statements to ``` public void eval(MockDirectoryWrapper dir) throws IOException { if (r.nextInt(3000) == 0) { if (callStackC

[I] TestIndexWriterOnVMError.testUnknownError timesout [lucene]

2023-10-11 Thread via GitHub
benwtrent opened a new issue, #12654: URL: https://github.com/apache/lucene/issues/12654 ### Description CI indicated the test suite timed out. So, I ran the reproduction line locally and had to kill the test running after 5 minutes. I seriously doubt this test should take long

Re: [I] TestSizeBoundedForceMerge.testByteSizeLimit test failure [lucene]

2023-10-11 Thread via GitHub
benwtrent commented on issue #12648: URL: https://github.com/apache/lucene/issues/12648#issuecomment-1757820922 The replication has a `min` segment size of `695` in bytes. Running the test multiple times shows that it generally has a higher segment size of at least `699` when the test passe

Re: [PR] SOLR-17025: Upgrade Jetty to 9.4.53.v20231009 [lucene-solr]

2023-10-11 Thread via GitHub
risdenk merged PR #2680: URL: https://github.com/apache/lucene-solr/pull/2680

Re: [PR] Improve refresh speed with softdelete enable [lucene]

2023-10-11 Thread via GitHub
easyice commented on PR #12557: URL: https://github.com/apache/lucene/pull/12557#issuecomment-1757655210 Thanks @jpountz, it isn't perfect. There seems to be no better way to avoid probing data and know early on that all values are the same. I am just trying to improve, but I haven't thou

Re: [PR] Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on PR #12650: URL: https://github.com/apache/lucene/pull/12650#issuecomment-1757647698 I pushed changes based on @dweiss suggestion of > separate all build-related changes from the google format upgrade. Then apply build changes (one patch), apply google format upgra

Re: [PR] Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on code in PR #12650: URL: https://github.com/apache/lucene/pull/12650#discussion_r1354938574 ## gradlew: ## @@ -1,7 +1,7 @@ -#!/usr/bin/env sh Review Comment: addressed this in the reformatted changes - dcb33e6b1842d22ca3d10b0fb6ed269a66c67828

Re: [I] Reproducible TestDrillSideways failure [lucene]

2023-10-11 Thread via GitHub
benwtrent commented on issue #12418: URL: https://github.com/apache/lucene/issues/12418#issuecomment-1757640051 I verified that, on all three of the failing seeds, setting `.setMergePolicy(NoMergePolicy.INSTANCE);` makes the test pass. I don't know what that means and if we should consider that

Re: [PR] Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
risdenk commented on PR #12650: URL: https://github.com/apache/lucene/pull/12650#issuecomment-1757618130 Agree with cleaning up the commit history. I'll do that shortly.

Re: [I] Reproducible TestDrillSideways failure [lucene]

2023-10-11 Thread via GitHub
benwtrent commented on issue #12601: URL: https://github.com/apache/lucene/issues/12601#issuecomment-1757601139 Closing as a duplicate of https://github.com/apache/lucene/issues/12418

Re: [I] Reproducible TestDrillSideways failure [lucene]

2023-10-11 Thread via GitHub
benwtrent closed issue #12601: Reproducible TestDrillSideways failure URL: https://github.com/apache/lucene/issues/12601

Re: [I] Reproducible TestDrillSideways failure [lucene]

2023-10-11 Thread via GitHub
benwtrent commented on issue #12418: URL: https://github.com/apache/lucene/issues/12418#issuecomment-1757600283 Another seed to reproduce: ``` ./gradlew test --tests TestDrillSideways.testRandom -Dtests.seed=668BF9B25DA9EE8A ``` If I change the indexwriting config to:

Re: [PR] Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
uschindler commented on PR #12650: URL: https://github.com/apache/lucene/pull/12650#issuecomment-1757465742 > Oh, impressive work. I'm surprised it's worked because I know there are incompatibilities between palantir's consistent version plugin and newer spotless releases. > > Overal

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-10-11 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1354640682 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,19 +21,18 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
dweiss commented on code in PR #12650: URL: https://github.com/apache/lucene/pull/12650#discussion_r1354635205 ## gradlew: ## @@ -1,7 +1,7 @@ -#!/usr/bin/env sh Review Comment: this was intentional (/usr/bin/env).

Re: [PR] Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
dweiss commented on PR #12650: URL: https://github.com/apache/lucene/pull/12650#issuecomment-1757346355 Just to be clear - the above is not _strictly_ necessary but I think it'd make for a much saner commit history and ability to review what you did to the build files, otherwise it gets obs

Re: [PR] Gradle 8.4 [lucene]

2023-10-11 Thread via GitHub
dweiss commented on PR #12650: URL: https://github.com/apache/lucene/pull/12650#issuecomment-1757307273 Oh, impressive work. I'm surprised it's worked because I know there are incompatibilities between palantir's consistent version plugin and newer spotless releases. Overall, it look

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-11 Thread via GitHub
shubhamvishu commented on code in PR #12653: URL: https://github.com/apache/lucene/pull/12653#discussion_r1354595316 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java: ## @@ -63,6 +63,9 @@ public abstract class MultiLevelSkipListWriter { /** for e

[PR] Use MergeSorter in StableStringSorter [lucene]

2023-10-11 Thread via GitHub
gf2121 opened a new pull request, #12652: URL: https://github.com/apache/lucene/pull/12652 In #12623, we introduced a `MergeSorter` to take advantage of extra memory to speed up sorting. This PR enables `StableStringSorter` to also benefit from this optimization.

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-11 Thread via GitHub
gf2121 commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1757126242 Got it! :) [profile.log](https://github.com/apache/lucene/files/12866842/profile.log)

Re: [PR] Early terminate visit BKD leaf when current value greater than upper point in sorted dim. [lucene]

2023-10-11 Thread via GitHub
vsop-479 commented on PR #12528: URL: https://github.com/apache/lucene/pull/12528#issuecomment-1757016549 @iverase Here is the performance data for the geo cases. There is some slowdown due to the extra check. query | metric | baseline | candidate | Diff -- | -- | -- | -- | -- pol