nitirajrathore commented on issue #12627:
URL: https://github.com/apache/lucene/issues/12627#issuecomment-1772314503
Thanks @msokolov : These are really good suggestions. I will try to
incorporate these ideas in solutions. I think in the end there can be multiple
ways to allow more connecti
mikemccand commented on issue #12702:
URL: https://github.com/apache/lucene/issues/12702#issuecomment-1772457667
> The floor data is guaranteed to be stored within single arc (never be
prefix shared) in FST because fp is encoded before it.
But won't the leading bytes of `fp` be shared
ChrisHegarty opened a new pull request, #12703:
URL: https://github.com/apache/lucene/pull/12703
[ This PR is a draft - not ready to be merged. It is intended to help
facilitate a discussion ]
This PR enhances the vector similarity functions so that they can access the
underlying memor
ChrisHegarty commented on PR #12703:
URL: https://github.com/apache/lucene/pull/12703#issuecomment-1772530717
Some benchmark results.
Mac M2, 128 bit
```
INFO: Java vector incubator API enabled; uses preferredBitSize=128
...
Benchmark (si
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772535368
Thanks @rmuir @gf2121 I need to spend a bit more time evaluating this. But it
looks like no action is needed here?
--
This is an automated message from the Apache Git Service.
To res
mikemccand merged PR #12698:
URL: https://github.com/apache/lucene/pull/12698
mikemccand closed issue #12697: ArrayIndexOutOfBoundsException when writing the
FSTStore-backed FST with different DataOutput for meta
URL: https://github.com/apache/lucene/issues/12697
bruno-roustant commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1366758770
##
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##
@@ -20,76 +20,161 @@
import org.apache.lucene.util.packed.PackedInts;
import org.apache.l
mikemccand commented on PR #12698:
URL: https://github.com/apache/lucene/pull/12698#issuecomment-1772566523
Thanks @dungba88 -- I merged to `main` and `branch_9x`!
bruno-roustant commented on PR #12688:
URL: https://github.com/apache/lucene/pull/12688#issuecomment-1772589968
I'll also try to review!
On the bit packing subject, I have some handy generic code (not in Lucene
yet) to write and read variable size bits. Tell me if you are interested.
benwtrent commented on PR #12660:
URL: https://github.com/apache/lucene/pull/12660#issuecomment-1772603178
This is awesome. I am so happy it's a clean change without tons of
complexity and we still get 4x speed up with additional threads.
I will give it a review this weekend or early
mikemccand commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1366900931
##
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##
@@ -135,32 +123,28 @@ public class FSTCompiler {
* Instantiates an FST/FSA builder with
mikemccand commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1366903022
##
lucene/core/src/java/org/apache/lucene/util/packed/AbstractPagedMutable.java:
##
@@ -110,8 +110,10 @@ protected long baseRamBytesUsed() {
public long ramBytes
mikemccand commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1366910269
##
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##
@@ -99,20 +184,18 @@ private long hash(FSTCompiler.UnCompiledNode node) {
h += 17;
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772656211
@ChrisHegarty there are plenty of actions we could take... but I implemented
this specific same optimization in question safely in #12681
See https://en.wikipedia.org/wiki/Advanced_
mikemccand opened a new issue, #12704:
URL: https://github.com/apache/lucene/issues/12704
### Description
Spinoff from [this cool
comment](https://github.com/apache/lucene/pull/12633#discussion_r1366847986),
thanks to hashing guru @bruno-roustant:
```
Instead, we should mul
mikemccand commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1366913164
##
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##
@@ -99,20 +184,18 @@ private long hash(FSTCompiler.UnCompiledNode node) {
h += 17;
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772661255
to have any decent performance, we really need information on the CPU in
question and its vector capabilities. And the idea you can write "one loop"
that "runs anywhere" is an obvious pipe
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772669239
also i think JMH is bad news when it comes to downclocking. It does not show
the true performance impact of this. It slows down other things on the machine
as well: the user might have oth
mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1772672994
I'll also confirm `Test2BFST` still passes ... soon this test will no longer
require a 35 GB heap to run!
uschindler commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772673981
> to have any decent performance, we really need information on the CPU in
question and its vector capabilities. And the idea you can write "one loop"
that "runs anywhere" is an obvio
uschindler commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772679957
> Unfortunately this approach is slightly suboptimal for your Rocket Lake
which doesn't suffer from downclocking, but it is a disaster elsewhere, so we
have to play it safe.
We
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772695285
Vector API should also fix its bugs. It is totally senseless to have
`IntVector.SPECIES_PREFERRED` and `FloatVector.SPECIES_PREFERRED` and then
always set them to '512' on every avx-512 ma
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772702122
I would really just fix the api: instead of `IntVector.SPECIES_PREFERRED`
constant which is meaningless, it should be a method taking
`VectorOperation...` about how you plan to use it. it
jbellis commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1772704049
Responding top to bottom,
> I wonder how much the speed difference is due to (1) Vectors being out of
memory (and if they used PQ for diskann, if they did, we should test PQ w
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772706786
such a method would solve 95% of my problems, if it would throw
UnsupportedOperationException or return `null` if the hardware/hotspot doesn't
support all the requested VectorOperators.
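To make the shape of that request concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: `SpeciesSelector`, `Op`, and `preferredBitSize` are illustrative stand-ins and do not exist in `jdk.incubator.vector`, and the set of "accelerated" operations is supplied by hand rather than detected from the CPU or HotSpot. It only shows the contract described above: the caller declares the operations it plans to use and either gets a width back or an `UnsupportedOperationException`.

```java
import java.util.Set;

// Hypothetical sketch only: none of these names are part of the JDK Vector API.
final class SpeciesSelector {
  /** Stand-in for the vector operations a caller plans to use. */
  enum Op { ADD, MUL, FMA }

  // Assumption: in a real API this would come from the hardware/HotSpot, not a constructor.
  private final Set<Op> accelerated;
  private final int preferredBits;

  SpeciesSelector(Set<Op> accelerated, int preferredBits) {
    this.accelerated = Set.copyOf(accelerated);
    this.preferredBits = preferredBits;
  }

  /**
   * Returns the preferred vector width in bits if every planned operation is
   * accelerated, and throws UnsupportedOperationException otherwise.
   */
  int preferredBitSize(Op... planned) {
    for (Op op : planned) {
      if (!accelerated.contains(op)) {
        throw new UnsupportedOperationException("not accelerated: " + op);
      }
    }
    return preferredBits;
  }
}
```

A caller would wrap `preferredBitSize(...)` in a try/catch and fall back to scalar code when the exception is thrown, instead of trusting a single constant.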
jbellis commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1772711571
> DiskANN is known to be slower at indexing than HNSW
I don't remember the numbers here, maybe 10% slower? It wasn't material
enough to make me worry about it. (This is wit
jbellis commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1772722737
> It is possible that the candidate postings (gathered via HNSW) don't
contain ANY filtered docs. This would require gathering more candidate postings.
This was a big problem
jbellis commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1772724758
> Or perhaps we "just" make a Lucene Codec component (KnnVectorsFormat) that
wraps jvector? (https://github.com/jbellis/jvector)
I'm happy to support anyone who wants to try t
easyice commented on PR #12658:
URL: https://github.com/apache/lucene/pull/12658#issuecomment-1772756782
I think we can only use this optimization without deleted docs for merges,
because we can't use the cardinality of `liveDocs` as docCount, the `liveDocs`
is set to 1 when initialized.
easyice commented on code in PR #12658:
URL: https://github.com/apache/lucene/pull/12658#discussion_r1366998517
##
lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java:
##
@@ -519,9 +526,8 @@ private Runnable writeFieldNDims(
// compute the min/max for this slice
rmuir commented on PR #12703:
URL: https://github.com/apache/lucene/pull/12703#issuecomment-1772787526
Thanks for investigating this! Can we just fix vector code to take
MemorySegment and wrap array code?
I don't think we should add yet another factor to multiply the number of
vector
rmuir commented on PR #12703:
URL: https://github.com/apache/lucene/pull/12703#issuecomment-1772792213
as far as performance in practice, what kind of alignment is necessary such
that it is reasonable for mmap'd files? Please, let it not be 64 bytes
alignment for avx-512, that's too wastefu
bruno-roustant commented on issue #12704:
URL: https://github.com/apache/lucene/issues/12704#issuecomment-1772810122
@dweiss will probably say more than me about the awesome BitMixer#PHI_C64
constant!
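For readers who have not met that constant: a minimal sketch of multiplicative (Fibonacci) hashing, assuming the value below matches HPPC's `BitMixer.PHI_C64` (2^64 divided by the golden ratio). The class `PhiMix` and the method `slot` are made up for illustration; this is not the `NodeHash` code and not the change proposed in this issue.

```java
// Illustrative sketch only; PhiMix and slot are hypothetical names.
final class PhiMix {
  // Assumed to equal HPPC's BitMixer.PHI_C64: 2^64 / golden ratio, as an odd 64-bit constant.
  static final long PHI_C64 = 0x9E3779B97F4A7C15L;

  /**
   * Maps a 64-bit hash to a table slot in [0, 2^bits).
   * The multiply pushes differences in the low bits up into the high bits,
   * so the slot is taken from the top `bits` bits. Requires 1 <= bits <= 31.
   */
  static int slot(long hash, int bits) {
    return (int) ((hash * PHI_C64) >>> (64 - bits));
  }
}
```

The design point is that the slot comes from the high bits of the product rather than from `hash & mask`, so inputs that differ only in a few low bits still tend to land in different slots.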
mikemccand commented on PR #12653:
URL: https://github.com/apache/lucene/pull/12653#issuecomment-1772824555
Thanks @shubhamvishu -- looks great! I plan to merge later today.
mikemccand commented on code in PR #12653:
URL: https://github.com/apache/lucene/pull/12653#discussion_r1367036009
##
lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java:
##
@@ -63,24 +63,23 @@ public abstract class MultiLevelSkipListWriter {
/** for e
shubhamvishu commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1367106152
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
shubhamvishu commented on PR #12682:
URL: https://github.com/apache/lucene/pull/12682#issuecomment-1772913419
Thanks for the approval @jpountz !
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772915097
> > Unfortunately this approach is slightly suboptimal for your Rocket Lake
which doesn't suffer from downclocking, but it is a disaster elsewhere, so we
have to play it safe.
>
> W
ljak commented on PR #2179:
URL: https://github.com/apache/lucene-solr/pull/2179#issuecomment-1772924915
Hi,
I know it's an old thread but I have a question.
As far as I can tell (after searching), the `maxShardsPerNode` function
wasn't re-implemented right (in the new autoscal
uschindler commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772952661
Hi,
> The jvm already has these. For example a user can set max vector width and
avx instruction level already. I assume that avx 512 users who are running on
downclock-suscept
ChrisHegarty commented on PR #12703:
URL: https://github.com/apache/lucene/pull/12703#issuecomment-1772963134
@rmuir If I understand your comment correctly. I unaligned the vector data
in the mmap file, in the benchmark. The results are similar enough to the
aligned, maybe a little less wh
ChrisHegarty commented on PR #12703:
URL: https://github.com/apache/lucene/pull/12703#issuecomment-1772974346
> Thanks for investigating this! Can we just fix vector code to take
MemorySegment and wrap array code?
Yes, that is a good idea. I'll do it and see how poorly it performs. I
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772982177
Just dropping an ACK here, for now. I do get the issues, and I agree that
there could be better ways to frame things at the vector API level.
mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1772988443
`Test2BFST` passed!
```
The slowest tests (exceeding 500 ms) during this run:
mikemccand merged PR #12633:
URL: https://github.com/apache/lucene/pull/12633
mikemccand commented on issue #12542:
URL: https://github.com/apache/lucene/issues/12542#issuecomment-1772993863
I've merged the change into `main`! I'll let it bake for some time (week or
two?) and if all looks good, backport to 9.x.
uschindler commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1773038045
> Just dropping an ACK here, for now. I do get the issues, and I agree that
there could be better ways to frame things at the vector API level.
Let's write a proposal together i
Tony-X commented on PR #12688:
URL: https://github.com/apache/lucene/pull/12688#issuecomment-1773113866
Thanks @bruno-roustant ! If you're okay to share it feel free to share it
here.
I'm in the process of baking my own specific implementation (which
internally uses a single long as
ChrisHegarty commented on PR #12703:
URL: https://github.com/apache/lucene/pull/12703#issuecomment-1773365569
Well... a simple wrapping of float[] into MemorySegment is not going to
work out, the Vector API does not like it due to alignment constraints (which
seems overly pedantic since it
dsmiley commented on PR #12293:
URL: https://github.com/apache/lucene/pull/12293#issuecomment-1773459815
I'm eager to see the kind of build insights Gradle Enterprise offers us. If
there are no further concerns, I'll merge Tuesday.
rmuir commented on PR #12703:
URL: https://github.com/apache/lucene/pull/12703#issuecomment-1773612369
> Well... a simple wrapping of float[] into MemorySegment is not going to
work out, the Vector API does not like it due to alignment constraints (which
seems overly pedantic since it can
dungba88 commented on PR #12690:
URL: https://github.com/apache/lucene/pull/12690#issuecomment-1773630051
As the other PR has been merged, I have rebased and resolved the conflict