kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2771809487
All dependent Faiss PRs are merged:
1. https://github.com/facebookresearch/faiss/pull/4158: Support
pre-filtering on a Java `long[]` (underlying of `FixedBitSet`) using
`IDSelectorBi
github-actions[bot] commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2752825072
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2715605453
Thanks for the review @navneet1v!
> lucene util branch
You can find some (very hacky) changes
[here](https://github.com/kaivalnp/luceneutil/tree/faiss). Broad steps to run
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1982961727
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java:
##
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation
navneet1v commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2714965745
> @navneet1v I wonder if either of you were able to replicate benchmarks?
@kaivalnp can you share your lucene util branch so that I can replicate your
results.
--
This is an
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1989679548
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,488 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2714811428
I've added a GH workflow (see [sample
output](https://github.com/apache/lucene/actions/runs/13791742930/job/38573182600?pr=14178))
that builds and adds the C_API of Faiss before running
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2710218340
Thanks @benwtrent!
> While I think the performance numbers are cool, they indicate that this
doesn't actually buy us that much
The speedup we see above is just a pure HNSW
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1980484722
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java:
##
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation
benwtrent commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2703746035
> @benwtrent @navneet1v I wonder if either of you were able to replicate
benchmarks?
I didn't want to leave you hanging @kaivalnp, especially after you have
obviously put a ton
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1982960581
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,488 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1982959099
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,457 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde
navneet1v commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2699321614
> @benwtrent @navneet1v I wonder if either of you were able to replicate
benchmarks? (FYI I also opened
[facebookresearch/faiss#4186](https://github.com/facebookresearch/faiss/pull/418
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2697323024
@benwtrent @navneet1v I wonder if either of you were able to replicate
benchmarks?
(FYI I also opened https://github.com/facebookresearch/faiss/pull/4186 to
start publishing the C_AP
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2697320627
Summary of latest changes:
1. Added tests! These will only run if `libfaiss_c.so` (along with all
dependencies) is present during runtime (in `$LD_LIBRARY_PATH` or
`-Djava.library.pa
github-actions[bot] commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2691766624
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1955603323
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsReader.java:
##
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundatio
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1955600494
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,457 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1955595482
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,457 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1955523940
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,457 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1955514351
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsFormat.java:
##
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundatio
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1952063404
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,457 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1952061362
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,457 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1952041592
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,457 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1952038874
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,457 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1948092653
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java:
##
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundatio
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2635120100
Build failure seems unrelated, created #14196
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to g
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2635103176
### Some more points / thoughts
- Built for Faiss `v1.10.0` (version is validated at runtime)
- Can be compiled with lower versions of Java, and run with 22+ (using an
MR-JAR)
-
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1941904975
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde
benwtrent commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1941120908
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1940738982
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1935407529
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2628899940
I found one way to reduce index-time RAM usage -- turns out the
[`FlatVectorsWriter`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/hnsw/FlatVe
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1935570579
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1935304089
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1935303469
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java:
##
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundatio
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1935099798
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java:
##
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundatio
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1935086492
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsFormat.java:
##
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1934885034
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
mikemccand commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2623225740
> > should report total CPU cycles consumed during indexing and searching
(summed across all threads)...
>
> @mikemccand that would help these higher level multithreaded perform
mikemccand commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2623221564
> If I have learned one thing over the years, it's that benchmarking
accurately is very difficult!
Amen to that!!
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1934818584
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
navneet1v commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1934791780
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java:
##
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundati
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622946390
> FAISS with this vector dimension does seem about 20% faster at search
I should add here that Lucene was using vectorized instructions via Panama,
but the C_API of Faiss was not.
benwtrent commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622885089
> number of vector operations that FAISS does during search.
By this, I mean the number of vectors it must visit when searching the graph.
benwtrent commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622880879
@kaivalnp 😌
I was worried that we had some serious outstanding performance bug that has
been missed in Lucene!
Conceptually, it makes sense that the performance of buildi
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622861110
@benwtrent Thanks for the input! I tried what you mentioned above:
> I would reduce the number of indexing threads to 1, faiss threads to 1,
and merge workers to 1
Lucene:
`
benwtrent commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622798641
> should report total CPU cycles consumed during indexing and searching
(summed across all threads)...
@mikemccand that would help these higher level multithreaded performance
mikemccand commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622774476
Really, `luceneutil` should report total CPU cycles consumed during indexing
and searching (summed across all threads)... I'll open an issue for this.
benwtrent commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622749459
@kaivalnp the force-merge time indicates that during merge to a single
segment, the index is being rebuilt from various segments. I would think that
the `force-merge` time itself is mo
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622613947
Ah I see :)
> The force merge is the time it took to merge the created segments into 1
Does it mean that the Faiss benchmark created a larger number of segments
initially,
jimczi commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622578687
> Not as high as 10x anymore, but it is still ~3x faster
Not so easy ;) See the force merge time for Faiss (41.44 s). The force merge
is the time it took to merge the created segmen
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622538569
> Since Faiss uses multithreading by default, we cannot compare with Lucene
Ah, nice catch, the number of threads used by both may be different.
I'm not sure how many thread
jimczi commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622455427
> Almost 10x indexing throughput improvement tells me we are doing something
silly in Lucene.
I did not test this specific integration but Faiss is multithreaded on bulk
training,
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2621943260
> Maybe it can be just as fast by not reading the floating point vectors on
to heap and doing memory segment stuff
Interesting, do we have a Lucene PR that explores it?
> D
benwtrent commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2621638248
Some very interesting numbers @kaivalnp
Almost 10x indexing throughput improvement tells me we are doing something
silly in Lucene. Especially since the search time is only about
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2621529365
### Usage
The new format can be used by:
- "Describing" the index you want, see
https://github.com/facebookresearch/faiss/wiki/The-index-factory
- Setting index parameters,
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2621481909
### Description
1. Separate Faiss indexes are maintained per-segment per-field, in line with
Lucene's architecture (and the current vector format)
2. Vectors are buffered in memory
kaivalnp opened a new pull request, #14178:
URL: https://github.com/apache/lucene/pull/14178
### Description
Faiss (https://github.com/facebookresearch/faiss) is _"a library for
efficient similarity search and clustering of dense vectors"_
It supports various features like vect