benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2356243291

   @tanyaroosta we are still doing larger-scale testing, but if you want to 
test with luceneutil, here is the branch I am using: 
https://github.com/mikemccand/luceneutil/compare/main...benwtrent:luceneutil:bbq?expand=1
   
   2-5x oversampling is required with HNSW to get good recall. 
   
   The original paper does periodic rescoring with the raw vectors, which 
effectively means you need random access to the raw vectors, even when NOT 
using HNSW. This just doesn't scale when trying to reduce memory usage.
   
   Instead, we are opting for a more traditional approach of oversampling and 
reranking.
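
   As a rough illustration of the oversample-and-rerank idea (not the code in 
this PR), here is a minimal Python sketch. It assumes a hypothetical sign-bit 
quantizer (`sign_bits`) and a brute-force scan over the quantized vectors in 
place of HNSW. The function names and the `oversample` parameter are made up 
for this example.

```python
import heapq


def dot(a, b):
    """Plain dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def sign_bits(vec):
    """Hypothetical 1-bit quantizer: keep only the sign of each dimension."""
    return [1.0 if x >= 0 else -1.0 for x in vec]


def search_with_rerank(query, raw_vectors, quantized, k, oversample):
    """Gather k * oversample candidates cheaply, then rerank them exactly."""
    # Step 1: candidate gathering touches only the quantized vectors.
    # (Assumption: brute force stands in for the HNSW graph search.)
    n_candidates = min(len(raw_vectors), int(k * oversample))
    candidates = heapq.nlargest(
        n_candidates,
        range(len(quantized)),
        key=lambda i: dot(query, quantized[i]),
    )
    # Step 2: raw vectors are read once, and only for the candidates.
    reranked = sorted(
        candidates,
        key=lambda i: dot(query, raw_vectors[i]),
        reverse=True,
    )
    return reranked[:k]


def exact_top_k(query, raw_vectors, k):
    """Exact baseline over the raw vectors, useful for measuring recall."""
    return heapq.nlargest(
        k, range(len(raw_vectors)), key=lambda i: dot(query, raw_vectors[i])
    )
```

   The point of the sketch is the access pattern: raw vectors are only read 
for the final `k * oversample` candidates, rather than at random throughout 
the search.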


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


