msokolov commented on issue #13565:
URL: https://github.com/apache/lucene/issues/13565#issuecomment-2301804631

   In the meantime, just to let you know, I do have a dirt-path implementation
of this (multithreading is not yet working, it fully recomputes centroids on
every iteration, etc.), but it isn't yet yielding the hoped-for improvements in
HNSW graph (vex file) size. I augmented KnnGraphTester to print out the average
delta between node ids, and this is being cut roughly in half, as we would
expect from the BP, but that doesn't yield much reduction in index size. It
might just be that the indexes are too small for VInt encoding to be impacted
much: node/docid deltas were previously averaging around 55000 and are now
around 25000, which still takes 3 bytes per delta on average. In this case I
saw vex file size go from 21048922 to 20624822 bytes, only about a 2%
reduction. I'm continuing to test with larger indexes and different vector
data sets.
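
To make the byte-count argument concrete, here is a minimal sketch of how many bytes a variable-length integer needs, assuming the usual scheme of 7 payload bits per byte with a continuation bit. The helper name `vint_length` is hypothetical and is not part of Lucene or KnnGraphTester. It treats the average delta as if it were a single value, which is only a rough approximation, since the real cost depends on the whole distribution of deltas.

```python
def vint_length(value: int) -> int:
    """Bytes needed to encode a non-negative int with 7 payload bits per byte.

    Hypothetical helper for illustration only; it mirrors the common
    variable-length int layout, not any specific Lucene code path.
    """
    if value < 0:
        raise ValueError("this sketch only covers non-negative values")
    length = 1
    while value >= 0x80:  # more than 7 bits remain, so another byte is needed
        value >>= 7
        length += 1
    return length


# Byte-length boundaries sit at 2**7 and 2**14.
for delta in (127, 128, 16_383, 16_384, 25_000, 55_000):
    print(f"delta={delta:>6} -> {vint_length(delta)} byte(s)")

# Relative change in the vex file sizes quoted above.
before, after = 21_048_922, 20_624_822
print(f"vex size reduction: {(before - after) / before:.2%}")
```

Under these assumptions a delta only drops to 2 bytes once it falls below 16384, so halving an average of about 55000 to about 25000 leaves a typical delta at 3 bytes.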


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
