[GitHub] [lucene] gf2121 opened a new pull request, #12324: Speed up IndexedDISI Sparse #AdvanceExactWithinBlock for tiny step advance

2023-05-22 Thread via GitHub
gf2121 opened a new pull request, #12324: URL: https://github.com/apache/lucene/pull/12324 Today `Sparse#AdvanceExactWithinBlock` always needs to read the next doc and seek back if a doc does not exist. This can harm performance in dense hit queries. For example, a field exists in do

[GitHub] [lucene] bruno-roustant commented on issue #12309: Move aKNN limits enforcement into the default Codec's KnnVectorsFormat implementation

2023-05-22 Thread via GitHub
bruno-roustant commented on issue #12309: URL: https://github.com/apache/lucene/issues/12309#issuecomment-1556776510 In the same work, or in a separate work, we could create the extension of the HNSW implementation in the codecs package to provide it to users, so they don't have to have the

[GitHub] [lucene] alessandrobenedetti commented on pull request #12314: Multi-value support for KnnVectorField

2023-05-22 Thread via GitHub
alessandrobenedetti commented on PR #12314: URL: https://github.com/apache/lucene/pull/12314#issuecomment-1556882171 Hi > @alessandrobenedetti thank you for kick starting this! > > You are absolutely correct, this is a large, but pivotal and necessary change for vector search i

[GitHub] [lucene] hydrogen666 commented on pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

2023-05-22 Thread via GitHub
hydrogen666 commented on PR #11998: URL: https://github.com/apache/lucene/pull/11998#issuecomment-1556962751 In the previous version, `StoredFieldsReader` was cached in a `ThreadLocal`, but now we need to `clone` `StoredFieldsReader` every time we visit stored fields. Will this PR cause a

[GitHub] [lucene] rafalh commented on issue #10309: Blended queries with boolean rewrite can result in inconsistent scores [LUCENE-9269]

2023-05-22 Thread via GitHub
rafalh commented on issue #10309: URL: https://github.com/apache/lucene/issues/10309#issuecomment-1556991229 I recently encountered this issue when migrating from Solr 7.x to 9.x. For me it caused wrong scores for a query that consists of exact match with a boost ORed with a fuzzy match, e.

[GitHub] [lucene] clayburn closed pull request #12266: Capture build scans on ge.apache.org to benefit from deep build insights

2023-05-22 Thread via GitHub
clayburn closed pull request #12266: Capture build scans on ge.apache.org to benefit from deep build insights URL: https://github.com/apache/lucene/pull/12266 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [lucene] hydrogen666 commented on pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

2023-05-22 Thread via GitHub
hydrogen666 commented on PR #11998: URL: https://github.com/apache/lucene/pull/11998#issuecomment-1557118792 > In the previous version, `StoredFieldsReader` was cached in a `ThreadLocal`, but now we need to `clone` `StoredFieldsReader` every time we visit stored fields. Will this PR caus

[GitHub] [lucene] uschindler commented on a diff in pull request #12293: Capture build scans on ge.apache.org to benefit from deep build insights

2023-05-22 Thread via GitHub
uschindler commented on code in PR #12293: URL: https://github.com/apache/lucene/pull/12293#discussion_r1200427420 ## gradle/ge.gradle: ## @@ -0,0 +1,26 @@ +def isGithubActions = System.getenv('GITHUB_ACTIONS') != null +def isJenkins = System.getenv('JENKINS_URL') != null +def i

[GitHub] [lucene] msokolov commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
msokolov commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557119416 When I had tried this before I did something like: ``` +ShortVector acc = ShortVector.zero(SHORT_SPECIES); +int l = 0; +for (; l < BYTE_SPECIES.loopBound(len);

[GitHub] [lucene] clayburn commented on a diff in pull request #12293: Capture build scans on ge.apache.org to benefit from deep build insights

2023-05-22 Thread via GitHub
clayburn commented on code in PR #12293: URL: https://github.com/apache/lucene/pull/12293#discussion_r1200451756 ## gradle/ge.gradle: ## @@ -0,0 +1,26 @@ +def isGithubActions = System.getenv('GITHUB_ACTIONS') != null +def isJenkins = System.getenv('JENKINS_URL') != null +def isC

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557144955 Hi, > I didn't get anywhere with Luceneutil yet! :-( (I haven't been able to run it successfully, getting OOM errors) Did you get the OOMs only with our vector code?

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557146310 @msokolov Can you assist us with how to run Mike's luceneutil bench to get best insights to vector code? The default query benchmark has no support for vectors. -- This is an autom

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557149306 > I don't have perf numbers any more - no idea whether this is better than what you have already - probably not, but it might be worth trying castShape? I'm using convertShape which

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557153092 Hi, > With 256 bit vectors it is fast using ByteVector.SPECIES_64, ShortVector.SPECIES_128, and IntVector.SPECIES_256 But for ARM which only has 128-bit vectors, the generic co

[GitHub] [lucene] clayburn commented on a diff in pull request #12293: Capture build scans on ge.apache.org to benefit from deep build insights

2023-05-22 Thread via GitHub
clayburn commented on code in PR #12293: URL: https://github.com/apache/lucene/pull/12293#discussion_r1200469331 ## gradle/ge.gradle: ## @@ -0,0 +1,26 @@ +def isGithubActions = System.getenv('GITHUB_ACTIONS') != null +def isJenkins = System.getenv('JENKINS_URL') != null +def isC

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557161657 yeah dunno, i have to fix my hsdis (probably wrestle the openjdk makefile and recompile it) to really see what is happening. such an annoyance! for now since it gives 4x speedup on i

[GitHub] [lucene] uschindler commented on a diff in pull request #12293: Capture build scans on ge.apache.org to benefit from deep build insights

2023-05-22 Thread via GitHub
uschindler commented on code in PR #12293: URL: https://github.com/apache/lucene/pull/12293#discussion_r1200504015 ## gradle/ge.gradle: ## @@ -0,0 +1,26 @@ +def isGithubActions = System.getenv('GITHUB_ACTIONS') != null +def isJenkins = System.getenv('JENKINS_URL') != null +def i

[GitHub] [lucene] uschindler commented on a diff in pull request #12293: Capture build scans on ge.apache.org to benefit from deep build insights

2023-05-22 Thread via GitHub
uschindler commented on code in PR #12293: URL: https://github.com/apache/lucene/pull/12293#discussion_r1200514470 ## gradle/ge.gradle: ## @@ -0,0 +1,26 @@ +def isGithubActions = System.getenv('GITHUB_ACTIONS') != null +def isJenkins = System.getenv('JENKINS_URL') != null +def i

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557241117 i sped up the binary dotproduct some for the 128-bit case by doing similar thing, using ByteVector.SPECIES_64. -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [lucene] msokolov commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
msokolov commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557343456 In luceneutil there is a python script called `vector-test.py` that you can use to run performance tests for vector search. It was a little messed up; I just pushed a change to make it

[GitHub] [lucene] alessandrobenedetti commented on pull request #12314: Multi-value support for KnnVectorField

2023-05-22 Thread via GitHub
alessandrobenedetti commented on PR #12314: URL: https://github.com/apache/lucene/pull/12314#issuecomment-1557354463 I pushed a commit with the query time simplification (only MAX strategy is supported). The diff is simpler but I am not convinced it's better. I also remembered a bit mo

[GitHub] [lucene] alessandrobenedetti commented on pull request #12314: Multi-value support for KnnVectorField

2023-05-22 Thread via GitHub
alessandrobenedetti commented on PR #12314: URL: https://github.com/apache/lucene/pull/12314#issuecomment-1557458202 Recording in the segment that a graph was built as multi-valued allows a check at query time to differentiate the single-valued and multi-valued approaches. Not a

[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557577587 @rmuir for the byte[] case, it seems to me that we want to size things so as to optimise for the ShortVector preferred species, right? which is what you seem to have done for a numb

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557616126 interesting, let me try it on my 256. if it doesn't hurt the performance (much), then let's go with it. i would prefer to have a "generalized" version like this. -- This is an automated

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557631563 Seems to take quite a hit on my 256. And I suspect if you tried to make a "512 version" of the existing code it might be much better too? ByteVector.SPECIES_128 -> ShortVector.SPECIES_256

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557635741 if you want to fix the correctness issue for the no-vectors-supported case, just add a guard that supported vector size is at least 128 bits. It must be at least 128 so that you can divide
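
The guard suggested above can be sketched in plain Java. This is only an illustrative model, not code from the PR: the class `VectorWidthGuard` and the method `canUseHalfWidth` are hypothetical names, and the sketch assumes the goal is to pair the preferred vector width with a vector of half that width. It relies on the fact that `jdk.incubator.vector.VectorShape` only offers fixed shapes of 64, 128, 256 and 512 bits, so halving a 64-bit preferred width yields 32 bits, which is not a valid shape.

```java
import java.util.Set;

// Plain-Java model of the width guard; it deliberately does not use jdk.incubator.vector,
// so it runs without --add-modules. All names here are hypothetical.
final class VectorWidthGuard {
  // Fixed shape widths (in bits) offered by jdk.incubator.vector.VectorShape.
  private static final Set<Integer> FIXED_SHAPE_BITS = Set.of(64, 128, 256, 512);

  // True if a vector of half the preferred width is still a valid fixed shape.
  // The explicit ">= 128" check mirrors the guard described in the comment above.
  static boolean canUseHalfWidth(int preferredBits) {
    return preferredBits >= 128 && FIXED_SHAPE_BITS.contains(preferredBits / 2);
  }

  public static void main(String[] args) {
    // Print the guard result for each fixed shape width.
    for (int bits : new int[] {64, 128, 256, 512}) {
      System.out.println(bits + " -> " + canUseHalfWidth(bits));
    }
  }
}
```

When the guard returns false, a caller would presumably fall back to the scalar implementation rather than construct an invalid species.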

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557639326 btw, i think there's a real bug in the SPECIES_PREFERRED stuff that makes testing such degenerate cases *really difficult*. You should be able to just pass `-XX:MaxVectorSize=8` or `-XX:Us

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557642297 @ChrisHegarty See what I mean around correctness? This is what will happen for machine with only 64-bit vectors. ``` jshell> jdk.incubator.vector.VectorShape.forBitSize(32) | Exce

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557660515 i think another approach would be to generalize the 256 algorithm (no splitting into parts) to also work with 512? No need to have a separate `if` for 512 when it's the same algo i think?

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557691360 > btw, i think there's a real bug in the SPECIES_PREFERRED stuff that makes testing such degenerate cases _really difficult_. You should be able to just pass `-XX:MaxVectorSize=8` or

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557693789 @ChrisHegarty i pushed a commit, stealing some of your ideas there, to generalize the 256-bit algo to also (in theory) work with avx512. It causes no regression on my avx-256, I am

[GitHub] [lucene] jainankitk commented on issue #12317: Option for disabling term dictionary compression

2023-05-22 Thread via GitHub
jainankitk commented on issue #12317: URL: https://github.com/apache/lucene/issues/12317#issuecomment-1557736895 @gsmiller - Thank you for reviewing and providing your comments > it looks like you're primarily looking at an indexing-related performance issue and concerned with the mem

[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557840103 @rmuir Testing with the latest vector bench, commit 8f25834, I see: Linux - AVX 512 ``` Benchmark (size) Mode Cnt Score Error

[GitHub] [lucene] ChrisHegarty commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
ChrisHegarty commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557853835 > > btw, i think there's a real bug in the SPECIES_PREFERRED stuff that makes testing such degenerate cases _really difficult_. You should be able to just pass `-XX:MaxVectorSize=8`

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557870684 thanks for benchmarking! I will merge it into the branch -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

[GitHub] [lucene] uschindler commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
uschindler commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557873260 > > > btw, i think there's a real bug in the SPECIES_PREFERRED stuff that makes testing such degenerate cases _really difficult_. You should be able to just pass `-XX:MaxVectorSize=8`

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201067163 ## lucene/core/src/java20/org/apache/lucene/util/VectorUtilPanamaProvider.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [lucene] uschindler commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
uschindler commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201067808 ## lucene/core/src/java20/org/apache/lucene/util/VectorUtilPanamaProvider.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557981841 Working my way thru all the vector similarity functions, I pushed initial stab at the binary euclidean distance to https://github.com/rmuir/vectorbench run it with `java -jar target/vect

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557985868 and here's my m1 for this one: ``` Benchmark (size) Mode Cnt Score Error Units BinarySquareBenchmark.squareDistanceNew 1024 thrpt 5

[GitHub] [lucene] benwtrent commented on pull request #12314: Multi-value support for KnnVectorField

2023-05-22 Thread via GitHub
benwtrent commented on PR #12314: URL: https://github.com/apache/lucene/pull/12314#issuecomment-1558005668 Thank you for the simplification! I will take another look in about 2 weeks. I am on a cross country camping trip :). There is a ton of good work in this PR. Excited to get this

[GitHub] [lucene] rmuir commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201114747 ## lucene/core/src/java20/org/apache/lucene/util/VectorUtilPanamaProvider.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558073763 I pushed a BinaryCosine benchmark as well, also similar stuff, just a more complex formula: mac m1: ``` Benchmark (size) Mode Cnt Score Err

[GitHub] [lucene] rmuir commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201265341 ## gradle/testing/defaults-tests.gradle: ## @@ -119,11 +119,16 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvmArgs '-

[GitHub] [lucene] rmuir commented on a diff in pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on code in PR #12311: URL: https://github.com/apache/lucene/pull/12311#discussion_r1201270494 ## gradle/testing/defaults-tests.gradle: ## @@ -119,11 +119,16 @@ allprojects { if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) { jvmArgs '-

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558205450 I pushed a float euclidean benchmark (FloatSquareBenchmark). same shape as the dotproduct float, no surprises: skylake: ``` Benchmark (size) Mode Cnt

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558282897 ok last function done (FloatCosineBenchmark). again no surprises here: skylake: ``` Benchmark (size) Mode Cnt Score Error Units FloatCosineBe

[GitHub] [lucene] rmuir commented on pull request #12311: Integrate the Incubating Panama Vector API

2023-05-22 Thread via GitHub
rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558372829 Here's a summary of where the perf sits for these various functions on my machines. It takes only 5 minutes to run a pass for a vector size of 1024 dimensions to get an idea

[GitHub] [lucene-jira-archive] dependabot[bot] opened a new pull request, #150: Bump requests from 2.28.0 to 2.31.0 in /migration

2023-05-22 Thread via GitHub
dependabot[bot] opened a new pull request, #150: URL: https://github.com/apache/lucene-jira-archive/pull/150 Bumps [requests](https://github.com/psf/requests) from 2.28.0 to 2.31.0. Release notes Sourced from requests's releases (https://github.com/psf/requests/releases). v2