msokolov commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2301804631
In the meantime, just to let you know: I do have a dirt-path implementation of this (multithreading not yet working, centroids fully recomputed on every iteration, etc.), but it isn't yet yielding the hoped-for reduction in HNSW graph (.vex file) size. I augmented KnnGraphTester to print the average delta between node ids. BP cuts that average roughly in half, as we would expect, but this doesn't translate into much smaller indexes.

It may be that these indexes are too small for the VInt encoding to benefit. The node/docid deltas previously averaged around 55000 and now average around 25000, and both values still take 3 bytes as a VInt (any value from 16384 up to 2097151 does). In this case the .vex file size went from 21048922 to 20624822, a reduction of only about 2%. I'm continuing to test with larger indexes and different vector data sets.
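The following sketch checks the VInt arithmetic above. It assumes, as the comment implies, that neighbor ids are sorted and written as VInt deltas using Lucene's VInt layout: 7 payload bits per byte, with the high bit as a continuation flag. The class `VIntDeltaDemo` and its methods `vIntLength`, `averageDelta`, and `encodedSize` are hypothetical illustrations. They are not Lucene APIs and not the KnnGraphTester change described in the comment.

```java
import java.util.Arrays;

/**
 * Hypothetical helpers (not Lucene APIs) for reasoning about how large
 * delta-encoded neighbor lists are when written as Lucene-style VInts.
 */
public class VIntDeltaDemo {

    /** Bytes a VInt encoding of value occupies; negative ints take 5. */
    public static int vIntLength(int value) {
        int bytes = 1;
        // Each extra byte carries 7 more bits; unsigned shift handles negatives.
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** Average gap between consecutive ids in one neighbor list, after sorting. */
    public static double averageDelta(int[] neighbors) {
        if (neighbors.length < 2) {
            return 0.0;
        }
        int[] sorted = neighbors.clone();
        Arrays.sort(sorted);
        long sum = 0;
        for (int i = 1; i < sorted.length; i++) {
            sum += sorted[i] - sorted[i - 1];
        }
        return (double) sum / (sorted.length - 1);
    }

    /** Total VInt bytes for a sorted, delta-encoded list (first id relative to 0). */
    public static int encodedSize(int[] neighbors) {
        int[] sorted = neighbors.clone();
        Arrays.sort(sorted);
        int total = 0;
        int prev = 0;
        for (int id : sorted) {
            total += vIntLength(id - prev);
            prev = id;
        }
        return total;
    }

    public static void main(String[] args) {
        // Both average deltas from the comment land in the 3-byte range [16384, 2097151].
        System.out.println("55000 -> " + vIntLength(55000) + " bytes");
        System.out.println("25000 -> " + vIntLength(25000) + " bytes");
        // A delta only saves a byte once it drops below 16384 (2^14).
        System.out.println("16383 -> " + vIntLength(16383) + " bytes");

        // Observed .vex size change reported in the comment.
        long before = 21048922L;
        long after = 20624822L;
        System.out.printf("vex reduction: %.2f%%%n", 100.0 * (before - after) / before);
    }
}
```

Running `main` reports 3 bytes for both 55000 and 25000, 2 bytes for 16383, and a reduction of about 2.01%. This is why halving deltas that stay above 16384 leaves the per-delta cost unchanged. Savings would only appear once many deltas fall below that threshold, for example in larger graphs where reordering clusters neighbors much more tightly.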