benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2464679186
@ShashwatShivam I don't think a "memory column" is provided anywhere.
I simply looked at the individual file sizes (veb, vex) and summed them.
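A minimal sketch of the measurement described here: walk an index directory and add up the sizes of files with the `veb` and `vex` extensions. The class and method names are hypothetical, and it assumes the segments are not packed into compound (`.cfs`) files, since in that case the per-extension files would not be visible on disk.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;

/** Hypothetical helper: sums the on-disk size of index files with selected extensions. */
class VectorFileSizes {

  /** Returns the total size in bytes of regular files in dir whose extension is in extensions. */
  static long sumByExtension(Path dir, Set<String> extensions) throws IOException {
    long total = 0;
    try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
      for (Path file : files) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        if (dot < 0 || !Files.isRegularFile(file)) {
          continue; // skip files without an extension and subdirectories
        }
        if (extensions.contains(name.substring(dot + 1))) {
          total += Files.size(file);
        }
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    // Assumed usage: pass the index directory as the first argument.
    Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
    long bytes = sumByExtension(indexDir, Set.of("veb", "vex"));
    System.out.printf("veb + vex = %d bytes (%.2f MB)%n", bytes, bytes / (1024.0 * 1024.0));
  }
}
```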
--
This is an automated message from the Apache Git Service.
ShashwatShivam commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2463415182
@benwtrent Makes sense; I wasn't accounting for the fact that the floating-point
vectors are stored too. I guess I should have asked instead how to
reproduce the 'memory required'
benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2462723469
@ShashwatShivam why do you think the index size (total size of all the
files) should be smaller?
We store both the binary quantized vectors and the floating-point vectors. So, I
woul
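The point about storing both representations can be made concrete with back-of-the-envelope arithmetic. This sketch assumes 4 bytes per float dimension and 1 bit per quantized dimension packed 8 per byte; the per-vector correction-term count is a placeholder parameter, not the format's real layout, and graph, metadata, and file headers are ignored. Because both representations are written, the total can never be smaller than the float vectors alone.

```java
/** Hypothetical back-of-the-envelope vector storage estimate (ignores graph and metadata). */
class VectorStorageEstimate {

  /** Bytes for raw float32 vectors: 4 bytes per dimension. */
  static long floatBytes(long numVectors, int dims) {
    return numVectors * dims * (long) Float.BYTES;
  }

  /**
   * Bytes for 1-bit-per-dimension vectors packed 8 dimensions per byte, plus an assumed
   * number of float correction terms per vector (placeholder, not the real on-disk layout).
   */
  static long binaryBytes(long numVectors, int dims, int correctionFloatsPerVector) {
    long packed = (dims + 7) / 8;
    return numVectors * (packed + (long) correctionFloatsPerVector * Float.BYTES);
  }

  public static void main(String[] args) {
    long n = 1_000_000L;
    int dims = 768;
    long floats = floatBytes(n, dims); // 3,072,000,000 bytes
    long binary = binaryBytes(n, dims, 2); // assumed 2 correction floats: 104,000,000 bytes
    System.out.println("float vectors:  " + floats);
    System.out.println("binary vectors: " + binary);
    System.out.println("both on disk:   " + (floats + binary));
  }
}
```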
ShashwatShivam commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2462601593
@benwtrent thanks for the link to the testing script; it works! One
question: the index size it reports is larger than the HNSW index size. For
example, I was working with a C
benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2457031733
Hey @ShashwatShivam
https://github.com/mikemccand/luceneutil/compare/main...benwtrent:luceneutil:bbq
That is the testing script I use.
But if Lucene has since been update
ShashwatShivam commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2456984008
Hi Ben, I'm trying to run a benchmark for RaBitQ using luceneutil
(https://github.com/mikemccand/luceneutil), but I'm running into a missing-files
issue: java.lang.NoClassDefFou
mayya-sharipova commented on code in PR #13651:
URL: https://github.com/apache/lucene/pull/13651#discussion_r1823251497
##
lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101BinaryQuantizedVectorsFormat.java:
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Soft
benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2427907718
Here are some Lucene Util benchmarks. Some of these numbers actually
contradict my previous int4 benchmarking, which is frustrating; I
wonder what I did wrong, then or now.
benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2423186168
I will open a PR against Lucene Util to update it to use these formats
and show y'all some runs with it soon. But the PR is ready for general review.
benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2417284114
I am currently moving this to the Lucene101 format, along with the bug fixes
we discovered in additional testing.
benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2359272883
Here are some more flat index test results. These exercise how
the number of coarse-grained centroids changes recall and speed.
| Lucene912BinaryQuantizedVectorsForma
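To illustrate what "coarse-grained centroids" means for these runs, here is a minimal sketch that assigns each vector to its nearest centroid by squared Euclidean distance, producing the kind of vectorOrd -> centroidOrd mapping discussed later in this thread. It assumes nearest-centroid assignment; how the PR actually computes its centroids is not shown here, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: assigns vectors to their nearest coarse centroid. */
class CentroidAssignment {

  static float squareDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Returns, for each vector ordinal, the ordinal of its closest centroid. */
  static int[] assign(float[][] vectors, float[][] centroids) {
    int[] centroidOrds = new int[vectors.length];
    for (int v = 0; v < vectors.length; v++) {
      int best = 0;
      float bestDist = Float.POSITIVE_INFINITY;
      for (int c = 0; c < centroids.length; c++) {
        float dist = squareDistance(vectors[v], centroids[c]);
        if (dist < bestDist) {
          bestDist = dist;
          best = c;
        }
      }
      centroidOrds[v] = best;
    }
    return centroidOrds;
  }

  public static void main(String[] args) {
    float[][] centroids = {{0f, 0f}, {10f, 10f}};
    float[][] vectors = {{1f, 1f}, {9f, 8f}, {-2f, 0.5f}};
    System.out.println(Arrays.toString(assign(vectors, centroids))); // [0, 1, 0]
  }
}
```

More centroids means each vector is, on average, closer to the centroid it is encoded against, at the cost of more centroid comparisons and a wider ordinal per vector.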
benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2356359326
@ShashwatShivam so, the flat codec version is sneaky: depending on when you
cloned the repo, it might not be doing anything.
Lucene by default will return nothing for approx
ShashwatShivam commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2356278997
Following up on the above comment by tanyaroosta, the dataset I was using
for benchmarking RaBitQ through Luceneutil (main branch) was Amazon's ASIN and
query embeddings (which ar
benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2356243291
@tanyaroosta we are still doing larger scale testing, but if you want to
test with LuceneUtil, here is the branch I am using:
https://github.com/mikemccand/luceneutil/compare/main...be
tanyaroosta commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2356189954
@benwtrent we are trying to run tests with the RaBitQ Lucene implementation,
and are not able to replicate the numbers reported in the paper. Have you run
tests as part of the imple
john-wagster commented on code in PR #13651:
URL: https://github.com/apache/lucene/pull/13651#discussion_r1742666476
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/BinarizedByteVectorValues.java:
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation
benwtrent commented on code in PR #13651:
URL: https://github.com/apache/lucene/pull/13651#discussion_r1742505038
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/BinarizedByteVectorValues.java:
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (AS
benwtrent commented on code in PR #13651:
URL: https://github.com/apache/lucene/pull/13651#discussion_r1731850236
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912BinaryFlatVectorsScorer.java:
##
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software Founda
benwtrent commented on code in PR #13651:
URL: https://github.com/apache/lucene/pull/13651#discussion_r1731849841
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912BinaryFlatVectorsScorer.java:
##
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software Founda
mayya-sharipova commented on code in PR #13651:
URL: https://github.com/apache/lucene/pull/13651#discussion_r1731836389
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912BinaryFlatVectorsScorer.java:
##
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software
mayya-sharipova commented on code in PR #13651:
URL: https://github.com/apache/lucene/pull/13651#discussion_r1731835612
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912BinaryFlatVectorsScorer.java:
##
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software
rmuir commented on code in PR #13651:
URL: https://github.com/apache/lucene/pull/13651#discussion_r1731609986
##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##
@@ -761,4 +763,81 @@ private static int squareDistanceBody128(MemoryS
ChrisHegarty commented on code in PR #13651:
URL: https://github.com/apache/lucene/pull/13651#discussion_r1731499127
##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##
@@ -761,4 +763,81 @@ private static int squareDistanceBody128(
ChrisHegarty commented on code in PR #13651:
URL: https://github.com/apache/lucene/pull/13651#discussion_r1731488489
##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##
@@ -761,4 +763,81 @@ private static int squareDistanceBody128(
rmuir commented on code in PR #13651:
URL: https://github.com/apache/lucene/pull/13651#discussion_r1731175737
##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##
@@ -761,4 +763,81 @@ private static int squareDistanceBody128(MemoryS
rmuir commented on code in PR #13651:
URL: https://github.com/apache/lucene/pull/13651#discussion_r1731174206
##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##
@@ -761,4 +763,81 @@ private static int squareDistanceBody128(MemoryS
benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2302857403
100MB assumes that, even when compressed, it's a single byte per centroid.
100M vectors might only have 2 centroids and thus only need two bits to store.
Also, I would expect the
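The arithmetic behind this comment: a vectorOrd -> centroidOrd mapping only needs ceil(log2(numCentroids)) bits per vector when packed at a fixed width, so the 1-byte-per-vector figure (100MB for 100M vectors) is an upper bound for up to 256 centroids. This is a sketch with hypothetical names, assuming fixed-width packing with no per-block headers.

```java
/** Hypothetical estimate of a vectorOrd -> centroidOrd mapping packed at minimum bit width. */
class CentroidOrdStorage {

  /** Bits needed to represent every ordinal in [0, numCentroids - 1]; at least 1. */
  static int bitsPerOrd(int numCentroids) {
    if (numCentroids <= 1) {
      return 1; // degenerate case: still reserve one bit per entry
    }
    return 64 - Long.numberOfLeadingZeros(numCentroids - 1L);
  }

  /** Total bytes for numVectors ordinals packed back to back, rounded up to whole bytes. */
  static long packedBytes(long numVectors, int numCentroids) {
    long totalBits = numVectors * bitsPerOrd(numCentroids);
    return (totalBits + 7) / 8;
  }

  public static void main(String[] args) {
    long n = 100_000_000L;
    System.out.println("1 byte per ord: " + n + " bytes");
    System.out.println("2 centroids:    " + packedBytes(n, 2) + " bytes"); // 1 bit each
    System.out.println("256 centroids:  " + packedBytes(n, 256) + " bytes"); // 8 bits each
  }
}
```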
mayya-sharipova commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2302685585
> possibly switch to LongValues for storing vectorOrd -> centroidOrd mapping
I was thinking about adding the centroid mappings as LongValues at the end of the
meta file, but this
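As an illustration of the idea behind storing the vectorOrd -> centroidOrd mapping as packed long values, here is a minimal fixed-width packed array with get/set. It is a hypothetical stand-in written for clarity, not Lucene's `LongValues` or packed-ints implementation; values are stored back to back in a `long[]`, and an entry may straddle two words.

```java
/** Hypothetical fixed-width packed array (not Lucene's implementation). */
class PackedCentroidOrds {
  private final long[] blocks;
  private final int bitsPerValue;
  private final long mask;

  PackedCentroidOrds(int size, int bitsPerValue) {
    if (bitsPerValue < 1 || bitsPerValue > 63) {
      throw new IllegalArgumentException("bitsPerValue must be in [1, 63]");
    }
    this.bitsPerValue = bitsPerValue;
    this.mask = (1L << bitsPerValue) - 1;
    long totalBits = (long) size * bitsPerValue;
    this.blocks = new long[(int) ((totalBits + 63) / 64)];
  }

  void set(int index, long value) {
    if ((value & ~mask) != 0) {
      throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
    }
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
    if (shift + bitsPerValue > 64) {
      int lowBits = 64 - shift; // bits of value that fit in the current word
      blocks[block + 1] = (blocks[block + 1] & ~(mask >>> lowBits)) | (value >>> lowBits);
    }
  }

  long get(int index) {
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    long value = blocks[block] >>> shift;
    if (shift + bitsPerValue > 64) {
      value |= blocks[block + 1] << (64 - shift); // high bits come from the next word
    }
    return value & mask;
  }

  public static void main(String[] args) {
    PackedCentroidOrds ords = new PackedCentroidOrds(10, 3); // up to 8 centroids
    for (int i = 0; i < 10; i++) {
      ords.set(i, i % 5);
    }
    System.out.println(ords.get(7)); // 2
  }
}
```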
benwtrent opened a new pull request, #13651:
URL: https://github.com/apache/lucene/pull/13651
# Not only a draft, but a very rough one indeed
Not opening this for the sake of review, but for openness and for those curious
about the work.
# High-level design
RaBitQ is basicall
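For readers new to the idea, here is a heavily simplified sketch of 1-bit binarization around a centroid: bit i is set when a vector's component exceeds the centroid's, and two codes are compared with XOR plus popcount. This is not RaBitQ itself and not this PR's encoding; it omits RaBitQ's random rotation and the correction factors its distance estimator relies on, and the class name is hypothetical.

```java
/** Hypothetical, simplified sign binarization around a centroid (not full RaBitQ). */
class SignBinarizer {

  /** Sets bit i when x[i] is above centroid[i]; bits are packed 64 per long. */
  static long[] binarize(float[] x, float[] centroid) {
    long[] bits = new long[(x.length + 63) / 64];
    for (int i = 0; i < x.length; i++) {
      if (x[i] - centroid[i] > 0f) {
        bits[i >>> 6] |= 1L << (i & 63);
      }
    }
    return bits;
  }

  /** Number of dimensions whose sign bits differ. */
  static int hammingDistance(long[] a, long[] b) {
    int distance = 0;
    for (int i = 0; i < a.length; i++) {
      distance += Long.bitCount(a[i] ^ b[i]);
    }
    return distance;
  }

  public static void main(String[] args) {
    float[] centroid = {0f, 0f, 0f, 0f};
    long[] doc = binarize(new float[] {0.5f, -1f, 2f, -0.1f}, centroid);
    long[] query = binarize(new float[] {0.4f, 1f, 1.5f, -0.3f}, centroid);
    System.out.println(hammingDistance(doc, query)); // 1
  }
}
```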