dependabot[bot] opened a new pull request, #150:
URL: https://github.com/apache/lucene-jira-archive/pull/150
Bumps [requests](https://github.com/psf/requests) from 2.28.0 to 2.31.0.
Release notes
Sourced from requests's releases
(https://github.com/psf/requests/releases).
v2
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558372829
Here's a summary of where the perf sits for these various functions on my
machines.
It takes only 5 minutes to run a pass for just the 1024-dimension vector
size to get an idea
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558282897
ok last function done (FloatCosineBenchmark). again no surprises here:
skylake:
```
Benchmark (size) Mode Cnt Score Error Units
FloatCosineBe
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558205450
I pushed a float euclidean benchmark (FloatSquareBenchmark). same shape as
the dotproduct float, no surprises:
skylake:
```
Benchmark (size) Mode Cnt
rmuir commented on code in PR #12311:
URL: https://github.com/apache/lucene/pull/12311#discussion_r1201270494
##
gradle/testing/defaults-tests.gradle:
##
@@ -119,11 +119,16 @@ allprojects {
if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) {
jvmArgs '-
rmuir commented on code in PR #12311:
URL: https://github.com/apache/lucene/pull/12311#discussion_r1201265341
##
gradle/testing/defaults-tests.gradle:
##
@@ -119,11 +119,16 @@ allprojects {
if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) {
jvmArgs '-
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558073763
I pushed a BinaryCosine benchmark as well, also similar stuff, just a more
complex formula:
mac m1:
```
Benchmark(size) Mode Cnt Score Err
rmuir commented on code in PR #12311:
URL: https://github.com/apache/lucene/pull/12311#discussion_r1201114747
##
lucene/core/src/java20/org/apache/lucene/util/VectorUtilPanamaProvider.java:
##
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one o
benwtrent commented on PR #12314:
URL: https://github.com/apache/lucene/pull/12314#issuecomment-1558005668
Thank you for the simplification! I will take another look in about 2 weeks.
I am on a cross country camping trip :).
There is a ton of good work in this PR. Excited to get this
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557985868
and here's my m1 for this one:
```
Benchmark (size) Mode Cnt Score Error Units
BinarySquareBenchmark.squareDistanceNew 1024 thrpt 5
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557981841
Working my way thru all the vector similarity functions, I pushed an initial
stab at the binary euclidean distance to https://github.com/rmuir/vectorbench
run it with `java -jar target/vect
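
For readers unfamiliar with the function being benchmarked, here is a minimal scalar sketch of a squared euclidean distance over `byte[]` vectors. It is not the code from the vectorbench repository or from the PR; the class and method names are hypothetical and chosen only for illustration.

```java
// Hypothetical illustration: a plain scalar squared euclidean distance for byte vectors.
// This is NOT the benchmarked implementation; names are assumptions made for this sketch.
public final class BinarySquareDistanceSketch {

  private BinarySquareDistanceSketch() {}

  /** Returns the sum over i of (a[i] - b[i])^2, accumulated in an int. */
  public static int squareDistance(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      // bytes are promoted to int, so the difference lies in [-255, 255] and cannot overflow
      int diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  public static void main(String[] args) {
    byte[] a = {1, 2, 3};
    byte[] b = {4, 6, 3};
    System.out.println(squareDistance(a, b));
  }
}
```

A vectorized variant has to widen bytes before multiplying, which is presumably why the byte, short and int species sizes come up elsewhere in this thread.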
uschindler commented on code in PR #12311:
URL: https://github.com/apache/lucene/pull/12311#discussion_r1201067808
##
lucene/core/src/java20/org/apache/lucene/util/VectorUtilPanamaProvider.java:
##
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
uschindler commented on code in PR #12311:
URL: https://github.com/apache/lucene/pull/12311#discussion_r1201067163
##
lucene/core/src/java20/org/apache/lucene/util/VectorUtilPanamaProvider.java:
##
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
uschindler commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557873260
> > > btw, i think there's a real bug in the SPECIES_PREFERRED stuff that
makes testing such degenerate cases _really difficult_. You should be able to
just pass `-XX:MaxVectorSize=8`
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557870684
thanks for benchmarking! I will merge it into the branch
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL ab
ChrisHegarty commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557853835
> > btw, i think there's a real bug in the SPECIES_PREFERRED stuff that
makes testing such degenerate cases _really difficult_. You should be able to
just pass `-XX:MaxVectorSize=8`
ChrisHegarty commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557840103
@rmuir Testing with the latest vector bench, commit 8f25834, I see:
Linux - AVX 512
```
Benchmark (size) Mode Cnt Score Error
jainankitk commented on issue #12317:
URL: https://github.com/apache/lucene/issues/12317#issuecomment-1557736895
@gsmiller - Thank you for reviewing and providing your comments
> it looks like you're primarily looking at an indexing-related performance
issue and concerned with the mem
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557693789
@ChrisHegarty i pushed a commit, stealing some of your ideas there, to
generalize the 256-bit algo to also (in theory) work with avx512.
It causes no regression on my avx-256, I am
uschindler commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557691360
> btw, i think there's a real bug in the SPECIES_PREFERRED stuff that makes
testing such degenerate cases _really difficult_. You should be able to just
pass `-XX:MaxVectorSize=8` or
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557660515
i think another approach would be to generalize the 256 algorithm (no
splitting into parts) to also work with 512? No need to have a separate `if`
for 512 when it's the same algo i think?
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557642297
@ChrisHegarty See what I mean around correctness? This is what will happen
for a machine with only 64-bit vectors.
```
jshell> jdk.incubator.vector.VectorShape.forBitSize(32)
| Exce
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557639326
btw, i think there's a real bug in the SPECIES_PREFERRED stuff that makes
testing such degenerate cases *really difficult*. You should be able to just
pass `-XX:MaxVectorSize=8` or `-XX:Us
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557635741
if you want to fix the correctness issue for the no-vectors-supported case,
just add a guard that supported vector size is at least 128 bits. It must be at
least 128 so that you can divide
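
The guard described above could look roughly like the following sketch. It takes the preferred vector width as a plain `int` so that it runs on any JDK; in real code that value would presumably come from something like `IntVector.SPECIES_PREFERRED.vectorBitSize()` in the incubating `jdk.incubator.vector` module. The class and method names are hypothetical and are not taken from the PR.

```java
// Hypothetical sketch of a "supported vector size is at least 128 bits" guard.
// The bit width is passed in as an int so the example needs no incubator module;
// class and method names are assumptions made for this illustration.
public final class VectorSizeGuardSketch {

  /** Minimum preferred vector width, in bits, required to take the vectorized path. */
  public static final int MIN_VECTOR_BITS = 128;

  private VectorSizeGuardSketch() {}

  /** Returns true if the vectorized code path may be used, false to fall back to scalar code. */
  public static boolean useVectorizedPath(int preferredVectorBits) {
    return preferredVectorBits >= MIN_VECTOR_BITS;
  }

  public static void main(String[] args) {
    for (int bits : new int[] {64, 128, 256, 512}) {
      String path = useVectorizedPath(bits) ? "vectorized" : "scalar fallback";
      System.out.println(bits + " bits -> " + path);
    }
  }
}
```

Checking the width once, when the provider is selected, keeps the hot loops free of per-call branching; whether the PR does it this way is not shown in the excerpt.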
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557631563
Seems to take quite a hit on my 256. And I suspect if you tried to make a
"512 version" of the existing code it might be much better too?
ByteVector.SPECIES_128 -> ShortVector.SPECIES_256
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557616126
interesting, let me try it on my 256. if it doesn't hurt the performance
(much), then let's go with it. i would prefer to have a "generalized" version
like this.
ChrisHegarty commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557577587
@rmuir for the byte[] case, it seems to me that we want to size things so as
to optimise for the ShortVector preferred species, right? which is what you
seem to have done for a numb
alessandrobenedetti commented on PR #12314:
URL: https://github.com/apache/lucene/pull/12314#issuecomment-1557458202
Adding the information that a graph was built multi-valued to the segment
allows a check at query time to differentiate the single-valued from the
multi-valued approach.
Not a
alessandrobenedetti commented on PR #12314:
URL: https://github.com/apache/lucene/pull/12314#issuecomment-1557354463
I pushed a commit with the query time simplification (only MAX strategy is
supported).
The diff is simpler but I am not convinced it's better.
I also remembered a bit mo
msokolov commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557343456
In luceneutil there is a python script called `vector-test.py` that you can
use to run performance tests for vector search. It was a little messed up; I
just pushed a change to make it
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557241117
i sped up the binary dotproduct some for the 128-bit case by doing a similar
thing, using ByteVector.SPECIES_64.
uschindler commented on code in PR #12293:
URL: https://github.com/apache/lucene/pull/12293#discussion_r1200514470
##
gradle/ge.gradle:
##
@@ -0,0 +1,26 @@
+def isGithubActions = System.getenv('GITHUB_ACTIONS') != null
+def isJenkins = System.getenv('JENKINS_URL') != null
+def i
uschindler commented on code in PR #12293:
URL: https://github.com/apache/lucene/pull/12293#discussion_r1200504015
##
gradle/ge.gradle:
##
@@ -0,0 +1,26 @@
+def isGithubActions = System.getenv('GITHUB_ACTIONS') != null
+def isJenkins = System.getenv('JENKINS_URL') != null
+def i
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557161657
yeah dunno, i have to fix my hsdis (probably wrestle the openjdk makefile
and recompile it) to really see what is happening. such an annoyance!
for now since it gives 4x speedup on i
clayburn commented on code in PR #12293:
URL: https://github.com/apache/lucene/pull/12293#discussion_r1200469331
##
gradle/ge.gradle:
##
@@ -0,0 +1,26 @@
+def isGithubActions = System.getenv('GITHUB_ACTIONS') != null
+def isJenkins = System.getenv('JENKINS_URL') != null
+def isC
uschindler commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557153092
Hi,
> With 256 bit vectors it is fast using ByteVector.SPECIES_64,
ShortVector.SPECIES_128, and IntVector.SPECIES_256. But for ARM which only has
128-bit vectors, the generic co
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557149306
> I don't have perf numbers any more - no idea whether this is better than
what you have already - probably not, but it might be worth trying castShape?
I'm using convertShape which
uschindler commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557146310
@msokolov Can you assist us with how to run Mike's luceneutil bench to get
best insights to vector code? The default query benchmark has no support for
vectors.
uschindler commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557144955
Hi,
> I didn't get anywhere with Luceneutil yet! :-( (I haven't been able to
run it successfully, getting OOM errors)
Did you get the OOMs only with our vector code?
clayburn commented on code in PR #12293:
URL: https://github.com/apache/lucene/pull/12293#discussion_r1200451756
##
gradle/ge.gradle:
##
@@ -0,0 +1,26 @@
+def isGithubActions = System.getenv('GITHUB_ACTIONS') != null
+def isJenkins = System.getenv('JENKINS_URL') != null
+def isC
msokolov commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557119416
When I had tried this before I did something like:
```
+ShortVector acc = ShortVector.zero(SHORT_SPECIES);
+int l = 0;
+for (; l < BYTE_SPECIES.loopBound(len);
uschindler commented on code in PR #12293:
URL: https://github.com/apache/lucene/pull/12293#discussion_r1200427420
##
gradle/ge.gradle:
##
@@ -0,0 +1,26 @@
+def isGithubActions = System.getenv('GITHUB_ACTIONS') != null
+def isJenkins = System.getenv('JENKINS_URL') != null
+def i
hydrogen666 commented on PR #11998:
URL: https://github.com/apache/lucene/pull/11998#issuecomment-1557118792
> In the previous version, `StoredFieldsReader` is cached in `ThreadLocal`, but
now we need to `clone` `StoredFieldsReader` every time we need to visit
stored fields. Will this PR caus
clayburn closed pull request #12266: Capture build scans on ge.apache.org to
benefit from deep build insights
URL: https://github.com/apache/lucene/pull/12266
rafalh commented on issue #10309:
URL: https://github.com/apache/lucene/issues/10309#issuecomment-1556991229
I recently encountered this issue when migrating from Solr 7.x to 9.x. For
me it caused wrong scores for a query that consists of an exact match with a
boost ORed with a fuzzy match, e.
hydrogen666 commented on PR #11998:
URL: https://github.com/apache/lucene/pull/11998#issuecomment-1556962751
In the previous version, `StoredFieldsReader` is cached in `ThreadLocal`, but
now we need to `clone` `StoredFieldsReader` every time we need to visit stored
fields. Will this PR cause a
alessandrobenedetti commented on PR #12314:
URL: https://github.com/apache/lucene/pull/12314#issuecomment-1556882171
Hi
> @alessandrobenedetti thank you for kick starting this!
>
> You are absolutely correct, this is a large, but pivotal and necessary
change for vector search i
bruno-roustant commented on issue #12309:
URL: https://github.com/apache/lucene/issues/12309#issuecomment-1556776510
In the same work, or in a separate work, we could create the extension of
the HNSW implementation in the codecs package to provide it to users, so they
don't have to have the
gf2121 opened a new pull request, #12324:
URL: https://github.com/apache/lucene/pull/12324
Today `Sparse#AdvanceExactWithinBlock` always needs to read the next doc and
seek back if a doc does not exist. This can hurt performance in dense hit
queries.
For example, a field exists in do