Re: [I] segmentInfos.replace() doesn't set userData [lucene]

2023-10-15 Thread via GitHub
Shibi-bala commented on issue #12637: URL: https://github.com/apache/lucene/issues/12637#issuecomment-1763742706 @msfroh this impacts the ability to snapshot since you can't read old `userData`. Check out the test in my PR: https://github.com/apache/lucene/pull/12626 -- This is an automa
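Here is a toy model of why the missing `userData` matters for snapshots. `ToySegmentInfos` and its fields are hypothetical. They only mimic the idea of a commit that holds both a segment list and a `userData` map. This is not Lucene's `SegmentInfos`, and the actual fix is in the linked PR and may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for a commit point; not Lucene's SegmentInfos. */
class ToySegmentInfos {
  final List<String> segments = new ArrayList<>();
  final Map<String, String> userData = new HashMap<>();

  /** Buggy variant: only the segment list is copied, so the old commit's userData is lost. */
  void replaceSegmentsOnly(ToySegmentInfos other) {
    segments.clear();
    segments.addAll(other.segments);
  }

  /** Fixed variant: copy the segment list and the commit userData together. */
  void replace(ToySegmentInfos other) {
    replaceSegmentsOnly(other);
    userData.clear();
    userData.putAll(other.userData);
  }
}
```

With the buggy variant, restoring from a snapshot keeps whatever `userData` the current commit had. With the fixed variant, the snapshot's `userData` comes along with its segments.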

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on PR #12651: URL: https://github.com/apache/lucene/pull/12651#issuecomment-1763700781 I reran the benchmark and still get similar perf and the same recall. (Just to make sure the later edits have not messed things up.)

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1360085712 ## lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java: ## @@ -454,77 +454,6 @@ public void testSearchWithSelectiveAcceptOrds() throws IOException {

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1360085215 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -158,50 +185,82 @@ public int entryNode() { return entryNode; } + /** + * WA

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1360084970 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -284,6 +285,13 @@ int graphNextNeighbor(HnswGraph graph) throws IOException { retu

Re: [PR] [DRAFT] Concurrent HNSW Merge [lucene]

2023-10-15 Thread via GitHub
zhaih commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1763695626 Thanks @msokolov, I'll take a look soon. > so ideally we'd compare total time to merge using single-threaded vs using this. The most fair comparison here I posted might be the

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-10-15 Thread via GitHub
MarcusSorealheis commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1763688716 Hi @Shibi-bala and great to see you here. Let's sync up this week and maybe we can help move this PR forward. It's a good catch, so thank you.

Re: [I] Make OrdinalMap maps docID to global ordinal directly? [lucene]

2023-10-15 Thread via GitHub
vsop-479 closed issue #12669: Make OrdinalMap maps docID to global ordinal directly? URL: https://github.com/apache/lucene/issues/12669

Re: [I] Make OrdinalMap maps docID to global ordinal directly? [lucene]

2023-10-15 Thread via GitHub
vsop-479 commented on issue #12669: URL: https://github.com/apache/lucene/issues/12669#issuecomment-1763633876 > need to store maxDoc global ordinals (one per doc) instead of valueCount global ordinals. @jpountz Thanks for the reminder. I will close this issue.
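Here is a small sketch of the cost @jpountz points out. It uses plain `int[]` arrays instead of the packed structures Lucene actually uses, and `OrdinalLookupSketch` and its methods are hypothetical. Going through segment ordinals needs a segment-to-global table with one entry per distinct value (`valueCount`). A direct docID-to-global table needs one entry per document (`maxDoc`).

```java
/** Toy comparison of two ways to reach a global ordinal for a doc; not Lucene's OrdinalMap. */
class OrdinalLookupSketch {
  /** Indirect lookup: segOrdToGlobal has valueCount entries. */
  static int viaSegmentOrd(int[] docToSegOrd, int[] segOrdToGlobal, int doc) {
    return segOrdToGlobal[docToSegOrd[doc]];
  }

  /** Direct table: one global ordinal per document, so maxDoc entries. */
  static int[] buildDocToGlobal(int[] docToSegOrd, int[] segOrdToGlobal) {
    int[] docToGlobal = new int[docToSegOrd.length];
    for (int doc = 0; doc < docToSegOrd.length; doc++) {
      docToGlobal[doc] = segOrdToGlobal[docToSegOrd[doc]];
    }
    return docToGlobal;
  }
}
```

For a field with few distinct values and many documents, `valueCount` is far smaller than `maxDoc`. In that case the direct table costs much more memory than the per-segment ordinal map.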

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359985681 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -158,50 +185,82 @@ public int entryNode() { return entryNode; } + /** + * WA

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359984597 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -99,27 +122,31 @@ public void addNode(int level, int node) { entryNode = node;

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359984155 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -74,23 +88,32 @@ public final class OnHeapHnswGraph extends HnswGraph implements Account

[PR] Fix jacoco coverage tests (add createClassLoader to replicator permissions) [lucene]

2023-10-15 Thread via GitHub
dweiss opened a new pull request, #12684: URL: https://github.com/apache/lucene/pull/12684 Code coverage tests have been failing with an exception thrown from jacoco's premain: ``` Caused by: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "createCla

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-15 Thread via GitHub
msokolov commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1359929577 ## lucene/core/src/java/org/apache/lucene/util/hnsw/InitializedHnswGraphBuilder.java: ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] [DRAFT] Concurrent HNSW Merge [lucene]

2023-10-15 Thread via GitHub
msokolov commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1763471838 OK, I posted https://github.com/apache/lucene/pull/12683. I'm curious about the performance measurements. We want to know how much lock contention there is so ideally we'd compare total
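@msokolov asks for a comparison of total merge time, single-threaded versus concurrent. A minimal timing harness for that might look like the following. It is a generic sketch and assumes the workload can be written as a thread-safe per-item task. `MergeTimingSketch` and `timeRun` are hypothetical names, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

/** Times the same total workload on a given number of threads; hypothetical helper. */
class MergeTimingSketch {
  /** Runs task.accept(i) for every i in [0, n) across `threads` workers; returns elapsed nanos. */
  static long timeRun(int n, int threads, IntConsumer task) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      long start = System.nanoTime();
      List<Future<?>> futures = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        final int worker = t;
        futures.add(pool.submit(() -> {
          // Strided split: worker t handles i = t, t + threads, t + 2 * threads, ...
          for (int i = worker; i < n; i += threads) {
            task.accept(i);
          }
        }));
      }
      for (Future<?> f : futures) {
        try {
          f.get(); // wait for completion and surface any worker failure
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new RuntimeException(e);
        } catch (ExecutionException e) {
          throw new RuntimeException(e.getCause());
        }
      }
      return System.nanoTime() - start;
    } finally {
      pool.shutdown();
    }
  }
}
```

Calling `timeRun(n, 1, insert)` and `timeRun(n, 8, insert)` with the same thread-safe `insert` gives the two totals. If the multi-threaded total barely improves, lock contention is one likely cause, though not the only one.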

[PR] [BROKEN, for reference only] concurrent hnsw [lucene]

2023-10-15 Thread via GitHub
msokolov opened a new pull request, #12683: URL: https://github.com/apache/lucene/pull/12683 ### Description Adds row locks to OnHeapHnswGraph. It seems to be broken in the way it adds new nodes to upper levels, such that they become discoverable before they have neighbors? I
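The description implies a fix direction: make a node discoverable only after it has neighbors. This can be sketched with per-node "row" locks. It is a toy single-level graph, not Lucene's `OnHeapHnswGraph`. `RowLockedGraphSketch`, `addNode`, and the `published` set are hypothetical names. Real HNSW insertion also has to choose neighbors by similarity and handle multiple levels.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy graph with one read/write lock per node ("row locks"); hypothetical, not OnHeapHnswGraph. */
class RowLockedGraphSketch {
  private final List<List<Integer>> neighbors = new ArrayList<>();
  private final List<ReentrantReadWriteLock> rowLocks = new ArrayList<>();
  // Nodes a searcher may start from; a node is added here only after its row is filled.
  private final Set<Integer> published = ConcurrentHashMap.newKeySet();

  RowLockedGraphSketch(int maxNodes) {
    for (int i = 0; i < maxNodes; i++) {
      neighbors.add(new ArrayList<>());
      rowLocks.add(new ReentrantReadWriteLock());
    }
  }

  void addNode(int node, int[] initialNeighbors) {
    // 1. Fill the new node's own row while holding only its write lock.
    withWriteLock(node, () -> {
      for (int nbr : initialNeighbors) {
        neighbors.get(node).add(nbr);
      }
    });
    // 2. Add reverse links one row at a time; holding one lock at a time avoids lock-order deadlocks.
    for (int nbr : initialNeighbors) {
      withWriteLock(nbr, () -> neighbors.get(nbr).add(node));
    }
    // 3. Publish last, so the node is never discoverable before its neighbor list is filled.
    published.add(node);
  }

  private void withWriteLock(int node, Runnable action) {
    ReentrantReadWriteLock.WriteLock lock = rowLocks.get(node).writeLock();
    lock.lock();
    try {
      action.run();
    } finally {
      lock.unlock();
    }
  }

  boolean isPublished(int node) {
    return published.contains(node);
  }

  /** Returns a copy of the node's neighbors, read under its read lock. */
  List<Integer> neighborsOf(int node) {
    ReentrantReadWriteLock.ReadLock lock = rowLocks.get(node).readLock();
    lock.lock();
    try {
      return new ArrayList<>(neighbors.get(node));
    } finally {
      lock.unlock();
    }
  }
}
```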

Re: [PR] add tests for vectorutils integer boundaries [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12634: URL: https://github.com/apache/lucene/pull/12634#issuecomment-1763463645 I backported this one to 9.x.

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1763463569 I backported this one to 9.x.

Re: [PR] simple cleanups to vector code [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12680: URL: https://github.com/apache/lucene/pull/12680#issuecomment-1763463280 I backported this one to 9.x. As I cleaned up commits, this one was committed in 9.x as part of #12681

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763462800 I backported this one to 9.x. For that I selected the separate commits, cherry-picked and squashed them with my tortoise GUI (sorry!).

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763454286 I can take care of backporting, also your previous commits. For maintenance it would be good, as the code now differs dramatically between main and 9.x, which makes merging hard.

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763426882 > > > ok, all done. "asfgit" can do proper merge commit :) thanks @uschindler ! > > > > > > The only downside is that backporting gets harder. So squashed PRs are easier to

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
msokolov commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359884135 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -284,6 +285,13 @@ int graphNextNeighbor(HnswGraph graph) throws IOException { r

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763402811 > > ok, all done. "asfgit" can do proper merge commit :) thanks @uschindler ! > > The only downside is that backporting gets harder. So squashed PRs are easier to handle in that rega

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763402251 I haven't been backporting recent vector changes as they are fairly aggressive. That's just my thoughts. If we want to do that, we have to do it properly and backport test, benchmarks, loggi

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763399661 > ok, all done. "asfgit" can do proper merge commit :) thanks @uschindler ! The only downside is that backporting gets harder. So squashed PRs are easier to handle in that regar

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763386185 Will you backport?

Re: [PR] Avoid use docsSeen in BKDWriter [lucene]

2023-10-15 Thread via GitHub
easyice commented on code in PR #12658: URL: https://github.com/apache/lucene/pull/12658#discussion_r1357058015 ## lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java: ## @@ -519,9 +526,8 @@ private Runnable writeFieldNDims( // compute the min/max for this slice
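The hunk context refers to computing the min/max for a slice of points. The following is a generic illustration, not the BKDWriter code: per-dimension min/max over packed byte values, computed with unsigned comparisons. `SliceMinMaxSketch` and the `byte[][]` layout are assumptions. Each point is one packed value of `numDims * bytesPerDim` bytes.

```java
import java.util.Arrays;

/** Generic per-dimension min/max over a slice of packed points; hypothetical, not BKDWriter. */
class SliceMinMaxSketch {
  /** Fills minPacked/maxPacked for points [from, to); assumes from < to. */
  static void minMax(byte[][] points, int from, int to, int numDims, int bytesPerDim,
                     byte[] minPacked, byte[] maxPacked) {
    int packedLen = numDims * bytesPerDim;
    System.arraycopy(points[from], 0, minPacked, 0, packedLen);
    System.arraycopy(points[from], 0, maxPacked, 0, packedLen);
    for (int i = from + 1; i < to; i++) {
      for (int dim = 0; dim < numDims; dim++) {
        int off = dim * bytesPerDim;
        int end = off + bytesPerDim;
        // Points are compared as unsigned bytes, one dimension at a time.
        if (Arrays.compareUnsigned(points[i], off, end, minPacked, off, end) < 0) {
          System.arraycopy(points[i], off, minPacked, off, bytesPerDim);
        }
        if (Arrays.compareUnsigned(points[i], off, end, maxPacked, off, end) > 0) {
          System.arraycopy(points[i], off, maxPacked, off, bytesPerDim);
        }
      }
    }
  }
}
```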

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763380049 ok, all done. "asfgit" can do proper merge commit :) thanks @uschindler !

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
asfgit merged PR #12681: URL: https://github.com/apache/lucene/pull/12681

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763378774 > At least we should have the option to select both variants! Yeah, that's strange. It doesn't make sense to remove the option from the merge button. I will use "git push" to work around

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763378449 > and hope to avoid more conflicts (it is a conflict nightmare...) This is why I tend to make one PR after the other. So I don't want to start another test cleanup one before th

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763378070 OK, I just wanted to do "enough" on the tests to make some progress. If we just add _128, _256, _512 variants without addressing the testing, then we create a potential problem. I w

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763377432 > I understand the git issue now. Why do we configure the merge button as "squash"? I'd like to default it to "merge" because I value my sanity! At least we should have the opti

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763373537 I understand the git issue now. Why do we configure the merge button as "squash"? I'd like to default it to "merge" because I value my sanity!

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763372114 wow, merge conflicts, really? git is terrible sometimes. literally didn't merge anything that isn't in this branch already.

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763372042 > I don't mind if sysprop code moves, as long as you can ensure no classloading deadlocks No classloading deadlock possible. We have a Holder for that already: https://github.c
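The Holder mentioned here is an instance of the general initialization-on-demand holder idiom. A generic sketch of the idiom follows. `VectorSizeConfig` and the `example.vectorBitSize` property are hypothetical, not the Lucene code linked above. The JVM initializes `Holder` lazily, on the first read of `Holder.VALUE`. Class initialization runs at most once under the JVM's own initialization lock, so the property read needs no extra synchronization, and the outer class's static initializer stays trivial. Keeping cross-class work out of static initializers is a common way to reduce the risk of class-initialization deadlocks. Whether that is sufficient depends on what the initializer touches.

```java
/** Illustration of the initialization-on-demand holder idiom; names are hypothetical. */
final class VectorSizeConfig {
  private VectorSizeConfig() {}

  // Initialized only on first access to Holder.VALUE; the JVM runs this exactly once.
  private static final class Holder {
    static final int VALUE = readValue();
  }

  private static int readValue() {
    // Hypothetical system property, defaulting to 256 when unset.
    return Integer.getInteger("example.vectorBitSize", 256);
  }

  static int get() {
    return Holder.VALUE;
  }
}
```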

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763371551 I don't mind if sysprop code moves, as long as you can ensure no classloading deadlocks

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763371401 Otherwise this is a great improvement for testing, we just need to fine-tune it! ❤️

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763370765 In addition to that, maybe we should change TestVectorUtilSupport to spawn 3 child JVMs and not set the property at the top level? The problem is currently any of those: - If we e
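Here is one way to read the child-JVM suggestion, as a minimal sketch. It builds one command line per vector width, and each command passes the setting only to its own child JVM, so the parent test JVM never sets it. `ChildJvmCommands`, the `example.vectorBitSize` property, and the main class name are hypothetical. Lucene's test framework may handle this differently.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Builds one child-JVM command per vector bit size; property and class names are hypothetical. */
class ChildJvmCommands {
  static List<List<String>> commandsFor(int[] bitSizes, String mainClass) {
    String java = Path.of(System.getProperty("java.home"), "bin", "java").toString();
    String classpath = System.getProperty("java.class.path");
    List<List<String>> commands = new ArrayList<>();
    for (int bits : bitSizes) {
      // The setting is scoped to this child JVM only; the parent never sets it globally.
      commands.add(List.of(java, "-Dexample.vectorBitSize=" + bits, "-cp", classpath, mainClass));
    }
    return commands;
  }
}
```

Each command could then be started with `new ProcessBuilder(cmd).inheritIO().start()`, and the parent test would fail if `waitFor()` returns a non-zero exit code.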

Re: [PR] simple cleanups to vector code [lucene]

2023-10-15 Thread via GitHub
rmuir merged PR #12680: URL: https://github.com/apache/lucene/pull/12680

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-15 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1359865451 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763361313 > > This also makes this test reproducible from random seed regardless of the hardware, as `SPECIES_PREFERRED` is not used at all in tests. From a test perspective, it is like a forbi

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
ChrisHegarty commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763359841 > for me, when investigating a modification, this works easily enough: > > ``` > $ for bits in 128 256 512; do ./gradlew -p lucene/core test --tests TestVectorUtilSupport -

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
ChrisHegarty commented on code in PR #12681: URL: https://github.com/apache/lucene/pull/12681#discussion_r1359857545 ## lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -16,165 +16,155 @@ */ package org.apache.lucene.internal

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-15 Thread via GitHub
ChrisHegarty commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359852957 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +124,21 @@ static VectorizationProvider lookup(boolean

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-15 Thread via GitHub
uschindler merged PR #12677: URL: https://github.com/apache/lucene/pull/12677

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-15 Thread via GitHub
uschindler commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359852363 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +124,21 @@ static VectorizationProvider lookup(boolean te

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-15 Thread via GitHub
ChrisHegarty commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359850988 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +124,21 @@ static VectorizationProvider lookup(boolean

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763331631 > This also makes this test reproducible from random seed regardless of the hardware, as `SPECIES_PREFERRED` is not used at all in tests. From a test perspective, it is like a forbidd
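A small sketch can illustrate the reproducibility point. It derives the vector width from the test's random seed instead of from the hardware, which is what `SPECIES_PREFERRED` reflects. The same seed then exercises the same code path on any machine. `SeededVectorSize` and the 128/256/512 choice list are assumptions for illustration. The sketch avoids `jdk.incubator.vector` so it compiles without extra modules.

```java
import java.util.Random;

/** Derives a vector bit size from a seed rather than the host CPU; hypothetical helper. */
class SeededVectorSize {
  private static final int[] SIZES = {128, 256, 512};

  static int pick(long seed) {
    // Same seed gives the same size on every machine; hardware capabilities are never consulted.
    return SIZES[new Random(seed).nextInt(SIZES.length)];
  }
}
```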