Re: [I] MultiSimilarity.MultiSimScorer should sum up scores into a double [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on issue #12675: URL: https://github.com/apache/lucene/issues/12675#issuecomment-1763281638 @jpountz I have raised a PR #12682 with the fix to `MultiSimilarity.MultiSimScorer` and some other candidate scorers I could find with similar issue. -- This is an automated

[PR] Scorer's should sum up scores into a double [lucene]

2023-10-14 Thread via GitHub
shubhamvishu opened a new pull request, #12682: URL: https://github.com/apache/lucene/pull/12682 ### Description Addresses #12675 . Along with `MultiSimilarity.MultiSimScorer` found some others candidate scorer implementations for this fix.
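
For readers skimming the archive, the point behind "sum up scores into a double" can be illustrated with a small standalone sketch. This is not the code from the PR: the class `ScoreSumSketch` and both method names are hypothetical, and the input values are chosen only to make float rounding visible. The assumption is that several `float` sub-scores are added together and the total is returned as a `float`.

```java
public final class ScoreSumSketch {

  // Accumulates in a float: every intermediate sum is rounded to float precision.
  public static float sumAsFloat(float[] scores) {
    float sum = 0f;
    for (float s : scores) {
      sum += s;
    }
    return sum;
  }

  // Accumulates in a double and rounds to float only once, at the end.
  public static float sumAsDouble(float[] scores) {
    double sum = 0d;
    for (float s : scores) {
      sum += s;
    }
    return (float) sum;
  }

  public static void main(String[] args) {
    // 2^24 is the point where float can no longer represent every integer.
    float[] scores = {16777216f, 1f, 1f};
    System.out.println(sumAsFloat(scores));  // the two small scores are lost to rounding
    System.out.println(sumAsDouble(scores)); // the two small scores survive
  }
}
```

With these inputs the float accumulator stays at 16777216 while the double accumulator yields 16777218, so the order and magnitude of sub-scores matter less when the running total is kept in a `double`.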

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763220499 This also makes this test reproducible from random seed regardless of the hardware, as `SPECIES_PREFERRED` is not used at all in tests. From a test perspective, it is like a forbidden-api.

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763217937 for me, when investigating a modification, this works easily enough: ```console $ for bits in 128 256 512; do ./gradlew -p lucene/core test --tests TestVectorUtilSupport -Dtests.

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763212967 @uschindler I did the 'fast integer vectors' override differently, and configured the build to randomize the vector size used for testing. So it still does the same thing it was doin

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763199813 I tried it out, making species `final` instead of `static final`. performance completely falls apart, slower than scalar impl even. it is a non-option... We should keep everything here sta

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763190528 > This can be done in the same way like the "testMode" flag, we should just extend it to cover more cases. You could also pass an override for the bit size instead of true/false. >

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
uschindler commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763180967 > We have to think about testing. I don't want to rely upon various hardware for correctness. I think there's a way to alter the code so that we can test the correctness of everything

Re: [PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12681: URL: https://github.com/apache/lucene/pull/12681#issuecomment-1763173911 Here's the diff of just the commit for this change: https://github.com/apache/lucene/pull/12681/commits/3ec9c26d672262762f4213c827699bf735409eeb

[PR] speedup all binary functions on avx256, speedup binary square on avx512 [lucene]

2023-10-14 Thread via GitHub
rmuir opened a new pull request, #12681: URL: https://github.com/apache/lucene/pull/12681 This builds on https://github.com/apache/lucene/pull/12680 so please review that one first to make it easier. The advantage there is we split out vector kernels into smaller manageable methods, making

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1359624510 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## Review Comment: Lets add some tests for these going forward?

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1763170985 Thanks for adding this @kaivalnp! The idea makes sense to me, looking forward to the benchmarks results. I left some minor comments. Sharing some thoughts below : 1. Is it ri

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1359606449 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-14 Thread via GitHub
zhaih commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1359606481 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-14 Thread via GitHub
zhaih commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1359590587 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-14 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359572233 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -163,45 +185,66 @@ public NodesIterator getNodesOnLevel(int level) { if (level == 0)

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-14 Thread via GitHub
zhaih commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359568778 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -40,31 +41,39 @@ public final class OnHeapHnswGraph extends HnswGraph implements Account

Re: [PR] simple cleanups to vector code [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12680: URL: https://github.com/apache/lucene/pull/12680#issuecomment-1763090784 cosine() ones cleaned up now too. I don't see perf issue with the array: guess this whole shebang relies on escape analysis anyway.

Re: [I] segmentInfos.replace() doesn't set userData [lucene]

2023-10-14 Thread via GitHub
Shibi-bala commented on issue #12637: URL: https://github.com/apache/lucene/issues/12637#issuecomment-1763086904 Yeah exactly. I'd say `userData` isn't metadata so it should get replaced as well.

Re: [PR] Move private static classes or functions out of DoubleValuesSource [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on code in PR #12671: URL: https://github.com/apache/lucene/pull/12671#discussion_r1359519018 ## lucene/core/src/java/org/apache/lucene/search/DoubleValuesSource.java: ## @@ -43,6 +40,9 @@ * {@link #fromScorer(Scorable)} and passing the resulting DoubleV

Re: [PR] Move private static classes or functions out of DoubleValuesSource [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on PR #12671: URL: https://github.com/apache/lucene/pull/12671#issuecomment-1763076574 Thanks @gsmiller for the review! My motivation behind this refactoring was [this comment](https://github.com/apache/lucene/pull/12548#discussion_r1357027508) from Mike which indica

Re: [PR] Fix unstable test TestVectorSimilarityValuesSource [lucene]

2023-10-14 Thread via GitHub
zhaih merged PR #12678: URL: https://github.com/apache/lucene/pull/12678

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1763058511 @benwtrent it isn't a panama thing. these functions are 32-bit (they return `int` and `float`). There is no hope for these getting faster, I just hope you understand that.

Re: [I] Exception rising while using QueryTimeout [lucene]

2023-10-14 Thread via GitHub
msfroh commented on issue #12032: URL: https://github.com/apache/lucene/issues/12032#issuecomment-1763058013 I was looking into this, and the fundamental problem seems to be that the underlying drillsideways scoring implementations (`doQueryFirstScoring`, `doDrillDownAdvanceScoring`, and `d

Re: [I] Make `byte[]` vector comparisons faster! (if possible) [lucene]

2023-10-14 Thread via GitHub
rmuir closed issue #12621: Make `byte[]` vector comparisons faster! (if possible) URL: https://github.com/apache/lucene/issues/12621

Re: [I] Make `byte[]` vector comparisons faster! (if possible) [lucene]

2023-10-14 Thread via GitHub
rmuir commented on issue #12621: URL: https://github.com/apache/lucene/issues/12621#issuecomment-1763056704 From my analysis, code being generated is correct. recommend to explore half-float instead for better performance and space tradeoffs.

[PR] simple cleanups to vector code [lucene]

2023-10-14 Thread via GitHub
rmuir opened a new pull request, #12680: URL: https://github.com/apache/lucene/pull/12680 Now that we have integrated benchmarks, it is easier to take care of this code. This is pretty straightforward change: * split out vectorized loops to avoid huge methods (especially integer

Re: [I] segmentInfos.replace() doesn't set userData [lucene]

2023-10-14 Thread via GitHub
msfroh commented on issue #12637: URL: https://github.com/apache/lucene/issues/12637#issuecomment-1763041651 I was curious about this one, and whether it is a bug or intentional. I noticed that the `IndexWriter` constructor that calls `SegmentInfos.replace()` has a comment saying:

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-14 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1359477741 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,782 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-14 Thread via GitHub
benwtrent commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1762993804 Thank y'all so much for digging into this @rmuir @gf2121 @ChrisHegarty @uschindler ! Maybe one day Panama Vector will mature into allow us to do nicer things with `byte` compari

Re: [PR] Optimize OnHeapHnswGraph's data structure [lucene]

2023-10-14 Thread via GitHub
benwtrent commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359458954 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -163,45 +185,66 @@ public NodesIterator getNodesOnLevel(int level) { if (level =

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-14 Thread via GitHub
rmuir merged PR #12632: URL: https://github.com/apache/lucene/pull/12632

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-14 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1762984112 I'm gonna merge this but we should continue to explore the intel case. Not sure what we can do there though.

Re: [PR] migrate all vectorbench methods to lucene [lucene]

2023-10-14 Thread via GitHub
rmuir merged PR #12667: URL: https://github.com/apache/lucene/pull/12667

Re: [I] [DISCUSS] Should there be a threshold-based vector search API? [lucene]

2023-10-14 Thread via GitHub
benwtrent commented on issue #12579: URL: https://github.com/apache/lucene/issues/12579#issuecomment-1762970584 @kaivalnp one other thing to think about is https://weaviate.io/blog/weaviate-1-20-release#autocut I wonder if we could do something similar by dynamically adjusting the "t

Re: [I] [DISCUSS] Should there be a threshold-based vector search API? [lucene]

2023-10-14 Thread via GitHub
benwtrent commented on issue #12579: URL: https://github.com/apache/lucene/issues/12579#issuecomment-1762966693 @kaivalnp yes, `KnnCollector` should be used for something like this :). Glad its useful! One of the tricky things I can see is that its possible that the bottom layer entr

Re: [PR] Move private static classes or functions out of DoubleValuesSource [lucene]

2023-10-14 Thread via GitHub
gsmiller commented on code in PR #12671: URL: https://github.com/apache/lucene/pull/12671#discussion_r1359402306 ## lucene/core/src/java/org/apache/lucene/search/VectorSimilarityValuesSource.java: ## @@ -32,6 +33,52 @@ public VectorSimilarityValuesSource(String fieldName) {

[PR] Add support for radius-based vector searches [lucene]

2023-10-14 Thread via GitHub
kaivalnp opened a new pull request, #12679: URL: https://github.com/apache/lucene/pull/12679 ### Description Background in #12579 Add support for getting "all vectors within a radius" as opposed to getting the "topK closest vectors" in the current system ### Consideratio
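
To make the distinction in this PR description concrete, here is a brute-force sketch contrasting "all vectors within a radius" with "topK closest vectors". It is only an illustration under stated assumptions: the class `RadiusSearchSketch`, its method names, and the use of a dot-product similarity with a minimum-similarity threshold are all hypothetical, and nothing here reflects the PR's actual API or Lucene's HNSW-based search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class RadiusSearchSketch {

  // Dot-product similarity; higher means closer (assumed for this sketch).
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Radius-style search: returns every index whose similarity meets the threshold.
  // The result size depends on the data, not on a fixed k.
  public static List<Integer> withinRadius(float[][] vectors, float[] query, float minSimilarity) {
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < vectors.length; i++) {
      if (dot(vectors[i], query) >= minSimilarity) {
        hits.add(i);
      }
    }
    return hits;
  }

  // Top-k search: returns the k most similar indices, however similar they are.
  public static List<Integer> topK(float[][] vectors, float[] query, int k) {
    Integer[] order = new Integer[vectors.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = i;
    }
    // Sort indices by descending similarity to the query.
    Arrays.sort(order, (x, y) -> Float.compare(dot(vectors[y], query), dot(vectors[x], query)));
    return new ArrayList<>(Arrays.asList(order).subList(0, Math.min(k, order.length)));
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0.9f, 0.1f}, {0f, 1f}};
    float[] query = {1f, 0f};
    System.out.println(withinRadius(vectors, query, 0.5f)); // indices above the threshold
    System.out.println(topK(vectors, query, 1));            // single best match
  }
}
```

The radius variant can return zero results or the whole collection depending on the threshold, whereas top-k always returns up to `k` entries regardless of how similar they are.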

Re: [I] [DISCUSS] Should there be a threshold-based vector search API? [lucene]

2023-10-14 Thread via GitHub
kaivalnp commented on issue #12579: URL: https://github.com/apache/lucene/issues/12579#issuecomment-1762822602 Thanks @msokolov, this nicely summarizes what I'm trying to say! > https://typesense.org/docs/0.25.0/api/vector-search.html#distance-threshold I took a look here: and [

Re: [I] Multiple ClassNotFoundExceptions in IntelliJ Fat Jar on ARM64 Java 20 [lucene]

2023-10-14 Thread via GitHub
uschindler commented on issue #12307: URL: https://github.com/apache/lucene/issues/12307#issuecomment-1762822051 If you want to create a classical classpath application that can be started with `java -jar application.jar` the correct way is to *NOT* package everything into a fat `applicatio

Re: [I] Multiple ClassNotFoundExceptions in IntelliJ Fat Jar on ARM64 Java 20 [lucene]

2023-10-14 Thread via GitHub
uschindler commented on issue #12307: URL: https://github.com/apache/lucene/issues/12307#issuecomment-1762818363 > @uschindler If fat JARs are not supported or recommended with Lucene, what _is_ the recommended way to deploy a project incorporating Lucene? I cannot find any resources on thi

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-14 Thread via GitHub
uschindler commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359206745 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +122,22 @@ static VectorizationProvider lookup(boolean te

Re: [PR] Better detect vector module in non-default setups (e.g., custom module layers) [lucene]

2023-10-14 Thread via GitHub
dweiss commented on code in PR #12677: URL: https://github.com/apache/lucene/pull/12677#discussion_r1359206557 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -120,24 +122,22 @@ static VectorizationProvider lookup(boolean testMo