gf2121 opened a new pull request, #12324:
URL: https://github.com/apache/lucene/pull/12324
Today `Sparse#AdvanceExactWithinBlock` always needs to read the next doc and seek
back if a doc does not exist. This can hurt performance in dense-hit
queries.
For example, a field exists in do
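The optimization described above can be sketched in plain Java. All names below are hypothetical, not Lucene's actual internals: when a block of doc IDs is known to be fully dense (contiguous), existence of a target doc becomes a range check, with no read-ahead and no seek-back.

```java
// Hypothetical sketch, not Lucene's actual code: a block that records
// its first and last docID. When the block is dense, advanceExact
// becomes a range check instead of "read next doc, then seek back if
// the target is missing".
public final class DenseBlockSketch {
  private final int firstDoc; // first docID in the block
  private final int lastDoc;  // last docID in the block, inclusive

  public DenseBlockSketch(int firstDoc, int lastDoc) {
    this.firstDoc = firstDoc;
    this.lastDoc = lastDoc;
  }

  /** O(1) membership test: no read-ahead, no seek-back. */
  public boolean advanceExactWithinBlock(int target) {
    return target >= firstDoc && target <= lastDoc;
  }
}
```

A sparse block would still need to scan its doc IDs; the win is skipping that scan whenever the block is known to be dense.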
bruno-roustant commented on issue #12309:
URL: https://github.com/apache/lucene/issues/12309#issuecomment-1556776510
In the same work, or in a separate one, we could create an extension of
the HNSW implementation in the codecs package to provide it to users, so they
don't have to have the
alessandrobenedetti commented on PR #12314:
URL: https://github.com/apache/lucene/pull/12314#issuecomment-1556882171
Hi
> @alessandrobenedetti thank you for kick starting this!
>
> You are absolutely correct, this is a large, but pivotal and necessary
change for vector search i
hydrogen666 commented on PR #11998:
URL: https://github.com/apache/lucene/pull/11998#issuecomment-1556962751
In the previous version, the `StoredFieldsReader` was cached in a `ThreadLocal`, but now
we need to `clone` the `StoredFieldsReader` every time we need to visit stored
fields. Will this PR cause a
rafalh commented on issue #10309:
URL: https://github.com/apache/lucene/issues/10309#issuecomment-1556991229
I recently encountered this issue when migrating from Solr 7.x to 9.x. For
me it caused wrong scores for a query that consists of exact match with a boost
ORed with a fuzzy match, e.
clayburn closed pull request #12266: Capture build scans on ge.apache.org to
benefit from deep build insights
URL: https://github.com/apache/lucene/pull/12266
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above
hydrogen666 commented on PR #11998:
URL: https://github.com/apache/lucene/pull/11998#issuecomment-1557118792
> In previous version, `StoredFieldsReader` is cached in `ThreadLocal`, but
now we need to `clone` `StoredFieldsReader` every time if we need to visit
store fields. Will this PR caus
uschindler commented on code in PR #12293:
URL: https://github.com/apache/lucene/pull/12293#discussion_r1200427420
##
gradle/ge.gradle:
##
@@ -0,0 +1,26 @@
+def isGithubActions = System.getenv('GITHUB_ACTIONS') != null
+def isJenkins = System.getenv('JENKINS_URL') != null
+def i
msokolov commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557119416
When I had tried this before I did something like:
```
+ShortVector acc = ShortVector.zero(SHORT_SPECIES);
+int l = 0;
+for (; l < BYTE_SPECIES.loopBound(len);
clayburn commented on code in PR #12293:
URL: https://github.com/apache/lucene/pull/12293#discussion_r1200451756
##
gradle/ge.gradle:
##
@@ -0,0 +1,26 @@
+def isGithubActions = System.getenv('GITHUB_ACTIONS') != null
+def isJenkins = System.getenv('JENKINS_URL') != null
+def isC
uschindler commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557144955
Hi,
> I didn't get anywhere with Luceneutil yet! :-( (I haven't been able to
run it successfully, getting OOM errors)
Did you get the OOMs only with our vector code?
uschindler commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557146310
@msokolov Can you assist us with how to run Mike's luceneutil bench to get
best insights to vector code? The default query benchmark has no support for
vectors.
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557149306
> I don't have perf numbers any more - no idea whether this is better than
what you have already - probably not, but it might be worth trying castShape?
I'm using convertShape which
uschindler commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557153092
Hi,
> With 256 bit vectors it is fast using ByteVector.SPECIES_64,
ShortVector.SPECIES_128, and IntVector.SPECIES_256 But for ARM which only has
128-bit vectors, the generic co
clayburn commented on code in PR #12293:
URL: https://github.com/apache/lucene/pull/12293#discussion_r1200469331
##
gradle/ge.gradle:
##
@@ -0,0 +1,26 @@
+def isGithubActions = System.getenv('GITHUB_ACTIONS') != null
+def isJenkins = System.getenv('JENKINS_URL') != null
+def isC
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557161657
yeah dunno, i have to fix my hsdis (probably wrestle the openjdk makefile
and recompile it) to really see what is happening. such an annoyance!
for now since it gives 4x speedup on i
uschindler commented on code in PR #12293:
URL: https://github.com/apache/lucene/pull/12293#discussion_r1200504015
##
gradle/ge.gradle:
##
@@ -0,0 +1,26 @@
+def isGithubActions = System.getenv('GITHUB_ACTIONS') != null
+def isJenkins = System.getenv('JENKINS_URL') != null
+def i
uschindler commented on code in PR #12293:
URL: https://github.com/apache/lucene/pull/12293#discussion_r1200514470
##
gradle/ge.gradle:
##
@@ -0,0 +1,26 @@
+def isGithubActions = System.getenv('GITHUB_ACTIONS') != null
+def isJenkins = System.getenv('JENKINS_URL') != null
+def i
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557241117
i sped up the binary dotproduct some for the 128-bit case by doing a similar
thing, using ByteVector.SPECIES_64.
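All the SIMD variants discussed in this thread compute the same arithmetic as a plain widening dot product over signed bytes. A scalar reference like the following (my own sketch, not code from the PR) is handy as a correctness baseline for the vectorized versions:

```java
public final class ScalarDotProduct {
  /**
   * Dot product of two signed byte vectors. Each byte is widened to int
   * before multiplying, mirroring what the ShortVector/IntVector
   * accumulation does in the SIMD code.
   */
  public static int dotProduct(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i];
    }
    return total;
  }
}
```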
msokolov commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557343456
In luceneutil there is a python script called `vector-test.py` that you can
use to run performance tests for vector search. It was a little messed up; I
just pushed a change to make it
alessandrobenedetti commented on PR #12314:
URL: https://github.com/apache/lucene/pull/12314#issuecomment-1557354463
I pushed a commit with the query time simplification (only MAX strategy is
supported).
The diff is simpler but I am not convinced it's better.
I also remembered a bit mo
alessandrobenedetti commented on PR #12314:
URL: https://github.com/apache/lucene/pull/12314#issuecomment-1557458202
Adding the information that a graph was built multi-valued to the segment
allows a check at query time to differentiate the single-valued vs multi-valued
approach.
Not a
ChrisHegarty commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557577587
@rmuir for the byte[] case, it seems to me that we want to size things so as
to optimise for the ShortVector preferred species, right? which is what you
seem to have done for a numb
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557616126
interesting, let me try it on my 256. if it doesn't hurt the performance
(much), then let's go with it. i would prefer to have a "generalized" version
like this.
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557631563
Seems to take quite a hit on my 256. And I suspect if you tried to make a
"512 version" of the existing code it might be much better too?
ByteVector.SPECIES_128 -> ShortVector.SPECIES_256
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557635741
if you want to fix the correctness issue for the no-vectors-supported case,
just add a guard that the supported vector size is at least 128 bits. It must be at
least 128 so that you can divide
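A minimal sketch of such a guard (the constant and method names here are illustrative, not from the PR): probe the platform's preferred vector width once and take the scalar fallback below 128 bits, since the splitting trick halves the species and anything smaller cannot be divided.

```java
public final class VectorGuardSketch {
  /** Smallest vector width (in bits) the SIMD path can handle. */
  static final int MIN_VECTOR_BITS = 128;

  /**
   * Whether the SIMD path may be used for the given preferred vector
   * width. Below 128 bits the species-splitting steps cannot all be
   * expressed, so the scalar code must run instead.
   */
  static boolean useVectorPath(int preferredBitSize) {
    return preferredBitSize >= MIN_VECTOR_BITS;
  }
}
```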
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557639326
btw, i think there's a real bug in the SPECIES_PREFERRED stuff that makes
testing such degenerate cases *really difficult*. You should be able to just
pass `-XX:MaxVectorSize=8` or `-XX:Us
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557642297
@ChrisHegarty See what I mean around correctness? This is what will happen
for machine with only 64-bit vectors.
```
jshell> jdk.incubator.vector.VectorShape.forBitSize(32)
| Exce
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557660515
i think another approach would be to generalize the 256 algorithm (no
splitting into parts) to also work with 512? No need to have a separate `if`
for 512 when it's the same algo, I think?
uschindler commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557691360
> btw, i think there's a real bug in the SPECIES_PREFERRED stuff that makes
testing such degenerate cases _really difficult_. You should be able to just
pass `-XX:MaxVectorSize=8` or
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557693789
@ChrisHegarty i pushed a commit, stealing some of your ideas there, to
generalize the 256-bit algo to also (in theory) work with avx512.
It causes no regression on my avx-256, I am
jainankitk commented on issue #12317:
URL: https://github.com/apache/lucene/issues/12317#issuecomment-1557736895
@gsmiller - Thank you for reviewing and providing your comments
> it looks like you're primarily looking at an indexing-related performance
issue and concerned with the mem
ChrisHegarty commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557840103
@rmuir Testing with the latest vector bench, commit 8f25834, I see:
Linux - AVX 512
```
Benchmark                    (size)  Mode  Cnt  Score  Error
ChrisHegarty commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557853835
> > btw, i think there's a real bug in the SPECIES_PREFERRED stuff that
makes testing such degenerate cases _really difficult_. You should be able to
just pass `-XX:MaxVectorSize=8`
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557870684
thanks for benchmarking! I will merge it into the branch
uschindler commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557873260
> > > btw, i think there's a real bug in the SPECIES_PREFERRED stuff that
makes testing such degenerate cases _really difficult_. You should be able to
just pass `-XX:MaxVectorSize=8`
uschindler commented on code in PR #12311:
URL: https://github.com/apache/lucene/pull/12311#discussion_r1201067163
##
lucene/core/src/java20/org/apache/lucene/util/VectorUtilPanamaProvider.java:
##
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
uschindler commented on code in PR #12311:
URL: https://github.com/apache/lucene/pull/12311#discussion_r1201067808
##
lucene/core/src/java20/org/apache/lucene/util/VectorUtilPanamaProvider.java:
##
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557981841
Working my way thru all the vector similarity functions, I pushed an initial
stab at the binary euclidean distance to https://github.com/rmuir/vectorbench
run it with `java -jar target/vect
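For reference, the scalar arithmetic behind the binary (signed-byte) euclidean benchmark is the classic squared distance. A sketch of the baseline (my own, not the vectorbench code):

```java
public final class ScalarSquareDistance {
  /** Squared Euclidean distance between two signed byte vectors. */
  public static int squareDistance(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      int diff = a[i] - b[i]; // bytes widened to int, no overflow here
      total += diff * diff;
    }
    return total;
  }
}
```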
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557985868
and here's my m1 for this one:
```
Benchmark                                (size)  Mode  Cnt  Score  Error  Units
BinarySquareBenchmark.squareDistanceNew    1024  thrpt    5
benwtrent commented on PR #12314:
URL: https://github.com/apache/lucene/pull/12314#issuecomment-1558005668
Thank you for the simplification! I will take another look in about 2 weeks.
I am on a cross country camping trip :).
There is a ton of good work in this PR. Excited to get this
rmuir commented on code in PR #12311:
URL: https://github.com/apache/lucene/pull/12311#discussion_r1201114747
##
lucene/core/src/java20/org/apache/lucene/util/VectorUtilPanamaProvider.java:
##
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one o
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558073763
I pushed a BinaryCosine benchmark as well, also similar stuff, just a more
complex formula:
mac m1:
```
Benchmark  (size)  Mode  Cnt  Score  Err
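```

The cosine benchmark vectorizes the same three accumulations a scalar implementation makes in one pass: the dot product and the two squared norms. A hedged scalar sketch (mine, not the vectorbench code):

```java
public final class ScalarCosine {
  /** Cosine similarity of two signed byte vectors. */
  public static float cosine(byte[] a, byte[] b) {
    int dot = 0;
    int normA = 0;
    int normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return (float) (dot / Math.sqrt((double) normA * normB));
  }
}
```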
rmuir commented on code in PR #12311:
URL: https://github.com/apache/lucene/pull/12311#discussion_r1201265341
##
gradle/testing/defaults-tests.gradle:
##
@@ -119,11 +119,16 @@ allprojects {
if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) {
jvmArgs '-
rmuir commented on code in PR #12311:
URL: https://github.com/apache/lucene/pull/12311#discussion_r1201270494
##
gradle/testing/defaults-tests.gradle:
##
@@ -119,11 +119,16 @@ allprojects {
if (rootProject.runtimeJavaVersion < JavaVersion.VERSION_16) {
jvmArgs '-
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558205450
I pushed a float euclidean benchmark (FloatSquareBenchmark). same shape as
the dotproduct float, no surprises:
skylake:
```
Benchmark (size) Mode Cnt
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558282897
ok last function done (FloatCosineBenchmark). again no surprises here:
skylake:
```
Benchmark      (size)  Mode  Cnt  Score  Error  Units
FloatCosineBe
rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1558372829
Here's a summary of where the perf sits for these various functions on my
machines.
It only takes 5 minutes to run a pass just for vector size of 1024
dimensions only to get an idea
dependabot[bot] opened a new pull request, #150:
URL: https://github.com/apache/lucene-jira-archive/pull/150
Bumps [requests](https://github.com/psf/requests) from 2.28.0 to 2.31.0.
Release notes
Sourced from requests's releases: https://github.com/psf/requests/releases
v2