Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-14 Thread via GitHub
dweiss commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359206557 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +122,22 @@ static VectorizationProvider lookup(boolean testMo

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-14 Thread via GitHub
uschindler commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359206745 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +122,22 @@ static VectorizationProvider lookup(boolean te

Re: [I] Multiple ClassNotFoundExceptions in IntelliJ Fat Jar on ARM64 Java 20 [lucene]

2023-10-14 Thread via GitHub
uschindler commented on issue #12307: URL: https://github.com/apache/lucene/issues/12307#issuecomment-1762818363 > @uschindler If fat JARs are not supported or recommended with Lucene, what _is_ the recommended way to deploy a project incorporating Lucene? I cannot find any resources on thi

Re: [I] Multiple ClassNotFoundExceptions in IntelliJ Fat Jar on ARM64 Java 20 [lucene]

2023-10-14 Thread via GitHub
uschindler commented on issue #12307: URL: https://github.com/apache/lucene/issues/12307#issuecomment-1762822051 If you want to create a classical classpath application that can be started with `java -jar application.jar` the correct way is to *NOT* package everything into a fat `applicatio

Re: [I] [DISCUSS] Should there be a threshold-based vector search API? [lucene]

2023-10-14 Thread via GitHub
kaivalnp commented on issue #12579: URL: https://github.com/apache/lucene/issues/12579#issuecomment-1762822602 Thanks @msokolov, this nicely summarizes what I'm trying to say! > https://typesense.org/docs/0.25.0/api/vector-search.html#distance-threshold I took a look here: and [

[PR] Add support for radius-based vector searches [lucene]

2023-10-14 Thread via GitHub
kaivalnp opened a new pull request, #12679: URL: https://github.com/apache/lucene/pull/12679 ### Description Background in #12579 Add support for getting "all vectors within a radius" as opposed to getting the "topK closest vectors" in the current system ### Consideratio
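
The difference between "all vectors within a radius" and "topK closest vectors" can be illustrated with a brute-force scan. This is only a sketch under stated assumptions: the class and method names are hypothetical, Euclidean distance is assumed, and it does not reflect the graph-based approach or the API proposed in the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Brute-force illustration only; every name here is hypothetical, not a Lucene API. */
final class RadiusSearchSketch {

  /** Squared Euclidean distance between two vectors of equal length. */
  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Returns the index of every vector whose distance to the query is at most radius. */
  static List<Integer> withinRadius(float[][] vectors, float[] query, float radius) {
    List<Integer> hits = new ArrayList<>();
    float limit = radius * radius; // compare squared values to avoid a sqrt per vector
    for (int i = 0; i < vectors.length; i++) {
      if (squaredDistance(vectors[i], query) <= limit) {
        hits.add(i);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    float[][] vectors = {{0f, 0f}, {1f, 0f}, {3f, 4f}};
    // Distances to the origin are 0, 1 and 5, so only the first two qualify.
    System.out.println(withinRadius(vectors, new float[] {0f, 0f}, 1.5f)); // [0, 1]
  }
}
```

Unlike a topK search, the number of results here is not known in advance: it depends entirely on how many vectors fall inside the radius.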

Re: [PR] Move private static classes or functions out of DoubleValuesSource [lucene]

2023-10-14 Thread via GitHub
gsmiller commented on code in PR #12671: URL: https://github.com/apache/lucene/pull/12671#discussion_r1359402306 ## lucene/core/src/java/org/apache/lucene/search/VectorSimilarityValuesSource.java: ## @@ -32,6 +33,52 @@ public VectorSimilarityValuesSource(String fieldName) {

Re: [I] [DISCUSS] Should there be a threshold-based vector search API? [lucene]

2023-10-14 Thread via GitHub
benwtrent commented on issue #12579: URL: https://github.com/apache/lucene/issues/12579#issuecomment-1762966693 @kaivalnp yes, `KnnCollector` should be used for something like this :). Glad it's useful! One of the tricky things I can see is that it's possible that the bottom layer entr

Re: [I] [DISCUSS] Should there be a threshold-based vector search API? [lucene]

2023-10-14 Thread via GitHub
benwtrent commented on issue #12579: URL: https://github.com/apache/lucene/issues/12579#issuecomment-1762970584 @kaivalnp one other thing to think about is https://weaviate.io/blog/weaviate-1-20-release#autocut I wonder if we could do something similar by dynamically adjusting the "t

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-14 Thread via GitHub
rmuir merged PR #12667: URL: https://github.com/apache/lucene/pull/12667 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apach

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1762984112 I'm gonna merge this but we should continue to explore the intel case. Not sure what we can do there though.

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-14 Thread via GitHub
rmuir merged PR #12632: URL: https://github.com/apache/lucene/pull/12632

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-14 Thread via GitHub
benwtrent commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359458954 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -163,45 +185,66 @@ public NodesIterator getNodesOnLevel(int level) { if (level =

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-14 Thread via GitHub
benwtrent commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1762993804 Thank y'all so much for digging into this @rmuir @gf2121 @ChrisHegarty @uschindler ! Maybe one day Panama Vector will mature to allow us to do nicer things with `byte` compari

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-14 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1359477741 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,782 @@ +/* + * Licensed to the Apache Softwa

Re: [I] segmentInfos.replace() doesn't set userData [lucene]

2023-10-14 Thread via GitHub
msfroh commented on issue #12637: URL: https://github.com/apache/lucene/issues/12637#issuecomment-1763041651 I was curious about this one, and whether it is a bug or intentional. I noticed that the `IndexWriter` constructor that calls `SegmentInfos.replace()` has a comment saying:

[PR] simple cleanups to vector code [lucene]

2023-10-14 Thread via GitHub
rmuir opened a new pull request, #12680: URL: https://github.com/apache/lucene/pull/12680 Now that we have integrated benchmarks, it is easier to take care of this code. This is a pretty straightforward change: * split out vectorized loops to avoid huge methods (especially integer

Re: [I] Make `byte[]` vector comparisons faster! (if possible) [lucene]

2023-10-14 Thread via GitHub
rmuir commented on issue #12621: URL: https://github.com/apache/lucene/issues/12621#issuecomment-1763056704 From my analysis, the code being generated is correct. I recommend exploring half-float instead for better performance and space tradeoffs.

Re: [I] Make `byte[]` vector comparisons faster! (if possible) [lucene]

2023-10-14 Thread via GitHub
rmuir closed issue #12621: Make `byte[]` vector comparisons faster! (if possible) URL: https://github.com/apache/lucene/issues/12621

Re: [I] Exception rising while using QueryTimeout [lucene]

2023-10-14 Thread via GitHub
msfroh commented on issue #12032: URL: https://github.com/apache/lucene/issues/12032#issuecomment-1763058013 I was looking into this, and the fundamental problem seems to be that the underlying drillsideways scoring implementations (`doQueryFirstScoring`, `doDrillDownAdvanceScoring`, and `d

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1763058511 @benwtrent it isn't a panama thing. these functions are 32-bit (they return `int` and `float`). There is no hope for these getting faster, I just hope you understand that.

Re: [PR] Fix unstable test TestVectorSimilarityValuesSource [lucene]

2023-10-14 Thread via GitHub
zhaih merged PR #12678: URL: https://github.com/apache/lucene/pull/12678

Re: [PR] Move private static classes or functions out of DoubleValuesSource [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on PR #12671: URL: https://github.com/apache/lucene/pull/12671#issuecomment-1763076574 Thanks @gsmiller for the review! My motivation behind this refactoring was [this comment](https://github.com/apache/lucene/pull/12548#discussion_r1357027508) from Mike which indica

Re: [PR] Move private static classes or functions out of DoubleValuesSource [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on code in PR #12671: URL: https://github.com/apache/lucene/pull/12671#discussion_r1359519018 ## lucene/core/src/java/org/apache/lucene/search/DoubleValuesSource.java: ## @@ -43,6 +40,9 @@ * {@link #fromScorer(Scorable)} and passing the resulting DoubleV

Re: [I] segmentInfos.replace() doesn't set userData [lucene]

2023-10-14 Thread via GitHub
Shibi-bala commented on issue #12637: URL: https://github.com/apache/lucene/issues/12637#issuecomment-1763086904 Yeah exactly. I'd say `userData` isn't metadata so it should get replaced as well.

Re: [PR] simple cleanups to vector code [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12680: URL: https://github.com/apache/lucene/pull/12680#issuecomment-1763090784 The cosine() ones are cleaned up now too. I don't see a perf issue with the array: I guess this whole shebang relies on escape analysis anyway.

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-14 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359568778 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -40,31 +41,39 @@ public final class OnHeapHnswGraph extends HnswGraph implements Account

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-14 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359572233 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -163,45 +185,66 @@ public NodesIterator getNodesOnLevel(int level) { if (level == 0)

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-14 Thread via GitHub
zhaih commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1359590587 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-14 Thread via GitHub
zhaih commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1359606481 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1359606449 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1763170985 Thanks for adding this @kaivalnp! The idea makes sense to me, looking forward to the benchmarks results. I left some minor comments. Sharing some thoughts below : 1. Is it ri

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1359624510 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## Review Comment: Let's add some tests for these going forward?

[PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir opened a new pull request, #12681: URL: https://github.com/apache/lucene/pull/12681 This builds on https://github.com/apache/lucene/pull/12680 so please review that one first to make it easier. The advantage there is we split out vector kernels into smaller manageable methods, making

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763173911 Here's the diff of just the commit for this change: https://github.com/apache/lucene/pull/12681/commits/3ec9c26d672262762f4213c827699bf735409eeb

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763180967 > We have to think about testing. I don't want to rely upon various hardware for correctness. I think there's a way to alter the code so that we can test the correctness of everything

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763190528 > This can be done in the same way like the "testMode" flag, we should just extend it to cover more cases. You could also pass an override for the bit size instead of true/false. >

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763199813 I tried it out, making species `final` instead of `static final`. performance completely falls apart, slower than scalar impl even. it is a non-option... We should keep everything here sta

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763212967 @uschindler I did the 'fast integer vectors' override differently, and configured the build to randomize the vector size used for testing. So it still does the same thing it was doin

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763217937 for me, when investigating a modification, this works easily enough: ```console $ for bits in 128 256 512; do ./gradlew -p lucene/core test --tests TestVectorUtilSupport -Dtests.

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763220499 This also makes this test reproducible from random seed regardless of the hardware, as `SPECIES_PREFERRED` is not used at all in tests. From a test perspective, it is like a forbidden-api.

[PR] Scorer's should sum up scores into a double [lucene]

2023-10-14 Thread via GitHub
shubhamvishu opened a new pull request, #12682: URL: https://github.com/apache/lucene/pull/12682 ### Description Addresses #12675. Along with `MultiSimilarity.MultiSimScorer`, I found some other candidate scorer implementations for this fix.
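
Why summing scores into a `double` matters can be shown with plain arithmetic. The following is a minimal sketch, not the code from the PR: the class and method names are hypothetical, and the input values are chosen only to make the rounding visible.

```java
/** Illustration only; these names are hypothetical and not taken from Lucene. */
final class ScoreSumSketch {

  /** Accumulates in float: every addition rounds to a 24-bit significand. */
  static float sumAsFloat(float[] scores) {
    float sum = 0f;
    for (float s : scores) {
      sum += s;
    }
    return sum;
  }

  /** Accumulates in double and narrows to float once at the end. */
  static float sumAsDouble(float[] scores) {
    double sum = 0d;
    for (float s : scores) {
      sum += s;
    }
    return (float) sum;
  }

  public static void main(String[] args) {
    // 2^24 is the point where float can no longer represent every integer.
    float[] scores = {16_777_216f, 1f, 1f};
    System.out.println(sumAsFloat(scores));  // 1.6777216E7, both small scores are lost
    System.out.println(sumAsDouble(scores)); // 1.6777218E7
  }
}
```

With a `float` accumulator each `+ 1f` is rounded away individually, while the `double` accumulator keeps both contributions and rounds only once when the final result is narrowed.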

Re: [I] MultiSimilarity.MultiSimScorer should sum up scores into a double [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on issue #12675: URL: https://github.com/apache/lucene/issues/12675#issuecomment-1763281638 @jpountz I have raised a PR #12682 with the fix to `MultiSimilarity.MultiSimScorer` and some other candidate scorers I could find with a similar issue.

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763331631 > This also makes this test reproducible from random seed regardless of the hardware, as `SPECIES_PREFERRED` is not used at all in tests. From a test perspective, it is like a forbidd

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-15 Thread via GitHub
ChrisHegarty commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359850988 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +124,21 @@ static VectorizationProvider lookup(boolean

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-15 Thread via GitHub
uschindler commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359852363 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +124,21 @@ static VectorizationProvider lookup(boolean te

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-15 Thread via GitHub
uschindler merged PR #12677: URL: https://github.com/apache/lucene/pull/12677

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-15 Thread via GitHub
ChrisHegarty commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359852957 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +124,21 @@ static VectorizationProvider lookup(boolean

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
ChrisHegarty commented on code in PR #12681: URL: https://github.com/apache/lucene/pull/12681#discussion_r1359857545 ## lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -16,165 +16,155 @@ */ package org.apache.lucene.internal

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
ChrisHegarty commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763359841 > for me, when investigating a modification, this works easily enough: > > ``` > $ for bits in 128 256 512; do ./gradlew -p lucene/core test --tests TestVectorUtilSupport -

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763361313 > > This also makes this test reproducible from random seed regardless of the hardware, as `SPECIES_PREFERRED` is not used at all in tests. From a test perspective, it is like a forbi

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-15 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1359865451 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] simple cleanups to vector code [lucene]

2023-10-15 Thread via GitHub
rmuir merged PR #12680: URL: https://github.com/apache/lucene/pull/12680

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763370765 In addition to that, maybe we should change TestVectorUtilSupport to spawn 3 child JVMs and don't set the property top-level? The problem is currently any of those: - If we e

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763371401 Otherwise this is a great improvement for testing, we just need to fine-tune it! ❤️

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763371551 I don't mind if sysprop code moves, as long as you can ensure no classloading deadlocks

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763372042 > I don't mind if sysprop code moves, as long as you can ensure no classloading deadlocks No classloading deadlock possible. We have a Holder for that already: https://github.c
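
The "Holder" mentioned here presumably refers to the initialization-on-demand holder idiom. A minimal sketch of that idiom follows, assuming a hypothetical class and a hypothetical system property name `sketch.vector.bits`; it is not the Lucene code behind the truncated link.

```java
/** Illustration of the holder idiom; class and property names are hypothetical. */
final class HolderSketch {

  private HolderSketch() {}

  /**
   * The nested class is not initialized until bits() first touches it. The JVM
   * runs its static initializer exactly once, with its own synchronization.
   */
  private static final class Holder {
    // Reads the system property once; falls back to 128 when it is not set.
    static final int BITS = Integer.getInteger("sketch.vector.bits", 128);
  }

  /** Triggers Holder initialization on first call and returns the cached value. */
  static int bits() {
    return Holder.BITS;
  }

  public static void main(String[] args) {
    System.out.println(bits());
  }
}
```

Because the property is read inside the nested class, loading `HolderSketch` itself does not touch it; the read happens only when `bits()` is first called.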

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763372114 wow, merge conflicts, really? git is terrible sometimes. literally didn't merge anything that isn't in this branch already.

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763373537 I understand the git issue now. Why do we configure the merge button as "squash"? I'd like to default it to "merge" because I value my sanity!

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763377432 > I understand the git issue now. Why do we configure the merge button as "squash"? I'd like to default it to "merge" because I value my sanity! At least we should have the opti

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763378070 OK, i just wanted to do "enough" on the tests to make some progress. If we just add _128, _256, _512 variants without addressing the testing, then we create a potential problem. I w

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763378449 > and hope to avoid more conflicts (it is a conflict nightmare...) This is why I tend to make one PR after the other. So I don't want to start another test cleanup one before th

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763378774 > At least we should have the option to select both variants! Yeah, that's strange. It doesn't make sense to remove the option from the merge button. I will use "git push" to workaround

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
asfgit merged PR #12681: URL: https://github.com/apache/lucene/pull/12681

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763380049 ok, all done. "asfgit" can do proper merge commit :) thanks @uschindler !

Re: [PR] Avoid use docsSeen in BKDWriter [lucene]

2023-10-15 Thread via GitHub
easyice commented on code in PR #12658: URL: https://github.com/apache/lucene/pull/12658#discussion_r1357058015 ## lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java: ## @@ -519,9 +526,8 @@ private Runnable writeFieldNDims( // compute the min/max for this slice

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763386185 Will you backport?

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763399661 > ok, all done. "asfgit" can do proper merge commit :) thanks @uschindler ! The only downside is that backporting gets harder. So squashed PRs are easier to handle in that regar

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763402251 I haven't been backporting recent vector changes as they are fairly aggressive. That's just my thoughts. If we want to do that, we have to do it proper and backport test, benchmarks, loggi

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763402811 > > ok, all done. "asfgit" can do proper merge commit :) thanks @uschindler ! > > The only downside is that backporting gets harder. So squashed PRs are easier to handle in that rega

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
msokolov commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359884135 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -284,6 +285,13 @@ int graphNextNeighbor(HnswGraph graph) throws IOException { r

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763426882 > > > ok, all done. "asfgit" can do proper merge commit :) thanks @uschindler ! > > > > > > The only downside is that backporting gets harder. So squashed PRs are easier to

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763454286 I can take care of backporting, also your previous commits. For maintenance it would be good as the code differs dramatically now between main and 9.x and makes merging hard.

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763462800 I backported this one to 9.x. For that I selected the separate commits, cherrypicked and squashed them with my tortoise GUI (sorry!).

Re: [PR] simple cleanups to vector code [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12680: URL: https://github.com/apache/lucene/pull/12680#issuecomment-1763463280 I backported this one to 9.x. As I cleaned up commits, this one was committed in 9.x as part of #12681

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1763463569 I backported this one to 9.x.

Re: [PR] add tests for vectorutils integer boundaries [lucene]

2023-10-15 Thread via GitHub
uschindler commented on PR #12634: URL: https://github.com/apache/lucene/pull/12634#issuecomment-1763463645 I backported this one to 9.x.

[PR] [BROKEN, for reference only] concurrent hnsw [lucene]

2023-10-15 Thread via GitHub
msokolov opened a new pull request, #12683: URL: https://github.com/apache/lucene/pull/12683 ### Description adds row locks to OnHeapHnswGraph seems to be broken in the way it adds new nodes to upper levels such that they become discoverable before they have neighbors? I

Re: [PR] [DRAFT] Concurrent HNSW Merge [lucene]

2023-10-15 Thread via GitHub
msokolov commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1763471838 OK, I posted https://github.com/apache/lucene/pull/12683. I'm curious about the performance measurements. We want to know how much lock contention there is so ideally we'd compare total

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-15 Thread via GitHub
msokolov commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1359929577 ## lucene/core/src/java/org/apache/lucene/util/hnsw/InitializedHnswGraphBuilder.java: ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[PR] Fix jacoco coverage tests (add createClassLoader to replicator permissions) [lucene]

2023-10-15 Thread via GitHub
dweiss opened a new pull request, #12684: URL: https://github.com/apache/lucene/pull/12684 Code coverage tests have been failing with an exception thrown from jacoco's premain: ``` Caused by: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "createCla

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359984155 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -74,23 +88,32 @@ public final class OnHeapHnswGraph extends HnswGraph implements Account

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359984597 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -99,27 +122,31 @@ public void addNode(int level, int node) { entryNode = node;

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359985681 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -158,50 +185,82 @@ public int entryNode() { return entryNode; } + /** + * WA

Re: [I] Make OrdinalMap maps docID to global ordinal directly? [lucene]

2023-10-15 Thread via GitHub
vsop-479 commented on issue #12669: URL: https://github.com/apache/lucene/issues/12669#issuecomment-1763633876 > need to store maxDoc global ordinals (one per doc) instead of valueCount global ordinals. @jpountz Thanks for the reminder. I will close this issue.

Re: [I] Make OrdinalMap maps docID to global ordinal directly? [lucene]

2023-10-15 Thread via GitHub
vsop-479 closed issue #12669: Make OrdinalMap maps docID to global ordinal directly? URL: https://github.com/apache/lucene/issues/12669

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-10-15 Thread via GitHub
MarcusSorealheis commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1763688716 Hi @Shibi-bala and great to see you here. Let's sync up this week and maybe we can help move this PR forward. It's a good catch, so thank you.

Re: [PR] [DRAFT] Concurrent HNSW Merge [lucene]

2023-10-15 Thread via GitHub
zhaih commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1763695626 Thanks @msokolov, I'll take a look soon. > so ideally we'd compare total time to merge using single-threaded vs using this. The most fair comparison here I posted might be the

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1360084970 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -284,6 +285,13 @@ int graphNextNeighbor(HnswGraph graph) throws IOException { retu

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1360085215 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -158,50 +185,82 @@ public int entryNode() { return entryNode; } + /** + * WA

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1360085712 ## lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java: ## @@ -454,77 +454,6 @@ public void testSearchWithSelectiveAcceptOrds() throws IOException {

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-15 Thread via GitHub
zhaih commented on PR #12651: URL: https://github.com/apache/lucene/pull/12651#issuecomment-1763700781 I reran the benchmark and still get similar perf and the same recall. (Just to make sure the later edits have not messed things up.)

Re: [I] segmentInfos.replace() doesn't set userData [lucene]

2023-10-15 Thread via GitHub
Shibi-bala commented on issue #12637: URL: https://github.com/apache/lucene/issues/12637#issuecomment-1763742706 @msfroh this impacts the ability to snapshot since you can't read old `userData`. Check out the test in my PR: https://github.com/apache/lucene/pull/12626

Re: [PR] Fix jacoco coverage tests (add createClassLoader to replicator permissions) [lucene]

2023-10-16 Thread via GitHub
dweiss merged PR #12684: URL: https://github.com/apache/lucene/pull/12684

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-16 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1360502250 ## lucene/core/src/java/org/apache/lucene/util/hnsw/InitializedHnswGraphBuilder.java: ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] Record if block API has been used in SegmentsInfo [lucene]

2023-10-16 Thread via GitHub
s1monw commented on code in PR #12685: URL: https://github.com/apache/lucene/pull/12685#discussion_r1360623450 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -3368,9 +3368,15 @@ public void addIndexesReaderMerge(MergePolicy.OneMerge merge) throws IOExce

Re: [PR] Record if block API has been used in SegmentsInfo [lucene]

2023-10-16 Thread via GitHub
s1monw commented on code in PR #12685: URL: https://github.com/apache/lucene/pull/12685#discussion_r1360627901 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99SegmentInfoFormat.java: ## @@ -0,0 +1,236 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-10-16 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1360701669 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,19 +21,18 @@ import java.util.List; import org.apache.lucene.store.DataInput; impor

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-10-16 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1360715823 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,19 +21,18 @@ import java.util.List; import org.apache.lucene.store.DataInput; impor
