Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

via GitHub Mon, 13 Nov 2023 08:20:17 -0800


jbellis commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1808489834


   > recall actually improves when introducing pq, and only starts to decrease at a factor of 16
   
   I would guess that either there is a bug or you happen to be testing with a really unusual dataset. PQ is fundamentally lossy compression and can't magically create similarity that didn't exist in the original vectors.
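To illustrate why PQ cannot add similarity, here is a minimal product-quantization sketch. The class name `PqSketch` and the hand-picked toy codebooks are assumptions for illustration (real codebooks are trained, typically with k-means), and this is not JVector's API. Each subvector is replaced by the index of its nearest centroid, so distinct vectors can collapse to the same code, and any distance computed from codes can only approximate the original one.

```java
import java.util.Arrays;

// Hypothetical PQ sketch for illustration only; not JVector's API.
public class PqSketch {
    // codebooks[s][c] is centroid c of subspace s; every centroid has subDim components
    private final float[][][] codebooks;
    private final int subDim;

    public PqSketch(float[][][] codebooks) {
        this.codebooks = codebooks;
        this.subDim = codebooks[0][0].length;
    }

    /** Encodes a vector as one nearest-centroid index per subspace (lossy). */
    public int[] encode(float[] v) {
        int[] code = new int[codebooks.length];
        for (int s = 0; s < codebooks.length; s++) {
            int best = 0;
            float bestDist = Float.MAX_VALUE;
            for (int c = 0; c < codebooks[s].length; c++) {
                float d = 0;
                for (int j = 0; j < subDim; j++) {
                    float diff = v[s * subDim + j] - codebooks[s][c][j];
                    d += diff * diff;
                }
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            code[s] = best;
        }
        return code;
    }

    /** Reconstructs an approximation by concatenating the chosen centroids. */
    public float[] decode(int[] code) {
        float[] out = new float[code.length * subDim];
        for (int s = 0; s < code.length; s++) {
            System.arraycopy(codebooks[s][code[s]], 0, out, s * subDim, subDim);
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy codebooks: 2 subspaces of 2 dims, 2 centroids each (hand-picked, not trained)
        PqSketch pq = new PqSketch(new float[][][] {
            {{0f, 0f}, {1f, 1f}},
            {{0f, 1f}, {1f, 0f}}
        });
        float[] a = {0.9f, 1.1f, 0.1f, 0.8f};
        float[] b = {0.8f, 0.9f, 0.2f, 0.9f};
        // Two distinct vectors map to the same code, so their difference is gone
        System.out.println(Arrays.toString(pq.encode(a)) + " " + Arrays.toString(pq.encode(b))); // [1, 0] [1, 0]
        System.out.println(Arrays.toString(pq.decode(pq.encode(a)))); // [1.0, 1.0, 0.0, 1.0]
    }
}
```

Once `a` and `b` share a code, no computation on the codes can tell them apart; that is the information the compression discarded.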
   
   > Regardless, is there any oversampling that is occurring when PQ is enabled in JVector?
   
   Today it's up to the caller (so on the Cassandra side, in Astra), but it's possible that it should move into JVector.
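A minimal sketch of what caller-side oversampling can look like: shortlist `k * overquery` candidates by approximate (for example PQ-derived) distance, then rerank only that shortlist with exact distances. `RerankSketch`, `search`, and `overquery` are hypothetical names, not the Cassandra/Astra or JVector API, and the brute-force shortlist stands in for a graph search.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// Hypothetical oversample-then-rerank sketch; not the Cassandra/Astra or JVector API.
public class RerankSketch {
    static float exactSquaredL2(float[] a, float[] b) {
        float sum = 0;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * approxDist: approximate distance per candidate (lower is better), e.g. from PQ codes.
     * vectors: full-precision vectors, indexed like approxDist.
     * overquery: oversampling factor, assumed >= 1.
     */
    static int[] search(float[] query, float[] approxDist, float[][] vectors, int k, int overquery) {
        int candidates = Math.min(vectors.length, k * overquery);
        // Phase 1: shortlist by the cheap approximate distance
        int[] shortlist = IntStream.range(0, vectors.length)
            .boxed()
            .sorted(Comparator.comparingDouble(i -> approxDist[i]))
            .limit(candidates)
            .mapToInt(Integer::intValue)
            .toArray();
        // Phase 2: rerank only the shortlist with exact distances and keep the best k
        return Arrays.stream(shortlist)
            .boxed()
            .sorted(Comparator.comparingDouble(i -> exactSquaredL2(query, vectors[i])))
            .limit(k)
            .mapToInt(Integer::intValue)
            .toArray();
    }

    public static void main(String[] args) {
        float[] query = {0f, 0f};
        float[][] vectors = {{3f, 0f}, {1f, 0f}, {2f, 0f}, {0.5f, 0f}};
        // Made-up approximate distances that misorder the candidates, as a lossy code might
        float[] approx = {1f, 2f, 3f, 4f};
        System.out.println(Arrays.toString(search(query, approx, vectors, 1, 1))); // [0]
        System.out.println(Arrays.toString(search(query, approx, vectors, 1, 4))); // [3]
    }
}
```

With `overquery = 1` the misordered approximate scores return the wrong neighbor; with `overquery = 4` the exact rerank recovers the true nearest vector. Larger factors trade extra exact-distance computations for recall.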
   
   > Additionally the graph building probably cannot be done with the PQ'd vectors.
   
   I suppose it's not impossible that you could compress first and then build, but I have not seen anyone do it yet. JVector follows DiskANN's lead and builds the graph using uncompressed vectors.
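A minimal sketch of the "build with full precision" ordering: every neighbor list is chosen from exact distances between uncompressed vectors, and compression would only happen after the graph exists. `ExactGraphBuildSketch` is hypothetical, and a brute-force k-NN graph stands in for DiskANN's Vamana construction, which additionally prunes neighbors for diversity and is not shown here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// Hypothetical sketch; brute-force k-NN graph as a simplified stand-in for Vamana.
public class ExactGraphBuildSketch {
    static float squaredL2(float[] a, float[] b) {
        float sum = 0;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Picks each node's `degree` nearest neighbors using exact, uncompressed distances. */
    static int[][] buildGraph(float[][] vectors, int degree) {
        int n = vectors.length;
        int[][] neighbors = new int[n][];
        for (int node = 0; node < n; node++) {
            final int self = node; // effectively-final copy for the lambdas
            neighbors[node] = IntStream.range(0, n)
                .filter(other -> other != self)
                .boxed()
                .sorted(Comparator.comparingDouble(other -> squaredL2(vectors[self], vectors[other])))
                .limit(degree)
                .mapToInt(Integer::intValue)
                .toArray();
        }
        // Only after this point would the vectors be PQ-encoded for query-time scoring
        return neighbors;
    }

    public static void main(String[] args) {
        float[][] vectors = {{0f, 0f}, {1f, 0f}, {5f, 0f}, {6f, 0f}};
        int[][] graph = buildGraph(vectors, 1);
        for (int i = 0; i < graph.length; i++) {
            System.out.println(i + " -> " + Arrays.toString(graph[i]));
        }
    }
}
```

Building on exact distances means edge quality is not limited by quantization error; the compressed codes only affect which candidates a search visits, which reranking can then correct.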


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

