jbellis commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1808489834
> recall actually improves when introducing pq, and only starts to decrease at a factor of 16

I would guess that either there is a bug or you happen to be testing with a really unusual dataset. PQ is fundamentally a lossy compression and can't magically create similarity that didn't exist in the original.

> Regardless, is there any oversampling that is occurring when PQ is enabled in JVector?

Today it's up to the caller (so on the Cassandra side, in Astra), but it's possible that it should move into JVector.

> Additionally the graph building probably cannot be done with the PQ'd vectors.

I suppose it's not impossible that you could compress first and then build, but I have not seen anyone do it yet. JVector follows DiskANN's lead and builds the graph using uncompressed vectors.
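
To make the oversampling point concrete, here is a minimal sketch of what caller-side oversampling with exact reranking could look like. It is not JVector's or Cassandra's actual code: `quantize` is a hypothetical stand-in for PQ (it just snaps components to a coarse grid), the search is brute force rather than a graph traversal, and all names and the toy data are assumptions made for illustration.

```python
import heapq
import math


def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def quantize(vec, step=0.5):
    """Hypothetical stand-in for PQ: lossy rounding of each component to a grid."""
    return [round(x / step) * step for x in vec]


def search_with_rerank(query, vectors, compressed, top_k, oversample=1.0):
    """Pick candidates by approximate score, then rerank them with exact vectors.

    oversample > 1 fetches extra candidates so that the exact rerank can
    recover neighbors that the lossy scores ranked too low.
    """
    n_candidates = math.ceil(top_k * oversample)
    # Approximate pass: scores use only the compressed vectors.
    candidates = heapq.nlargest(
        n_candidates, range(len(vectors)), key=lambda i: dot(query, compressed[i])
    )
    # Exact pass: rerank the candidate set with the uncompressed vectors.
    reranked = sorted(candidates, key=lambda i: dot(query, vectors[i]), reverse=True)
    return reranked[:top_k]


# Toy data (assumed): vector 0 is the true nearest neighbor of the query,
# but rounding makes vector 1 look better in the compressed space.
query = [1.0, 1.0]
vectors = [[0.74, 0.74], [0.76, 0.26], [0.1, 0.1]]
compressed = [quantize(v) for v in vectors]

print(search_with_rerank(query, vectors, compressed, top_k=1, oversample=1.0))  # [1]
print(search_with_rerank(query, vectors, compressed, top_k=1, oversample=2.0))  # [0]
```

Without oversampling, the lossy scores return the wrong neighbor. With a factor of 2, the true neighbor makes it into the candidate set and the exact rerank puts it first. Compression alone can only match or lose accuracy relative to the exact ranking, which is why a recall gain from PQ by itself would be surprising.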