benwtrent commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2356243291
@tanyaroosta, we are still doing larger-scale testing, but if you want to test with LuceneUtil, here is the branch I am using: https://github.com/mikemccand/luceneutil/compare/main...benwtrent:luceneutil:bbq?expand=1

With HNSW, 2-5x oversampling is required to get good recall. The original paper does periodic rescoring with the raw vectors, which effectively means you need random access to the raw vectors, even when NOT using HNSW. That just doesn't scale when the goal is to reduce memory usage. Instead, we are opting for a more traditional approach: oversampling and reranking.
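A minimal sketch of the oversample-then-rerank idea, for readers unfamiliar with it. Several assumptions apply. The quantizer is a plain sign-bit encoding, not BBQ's actual quantization. The candidate scan is brute force, standing in for an HNSW traversal. Similarity is a dot product. The names `binary_quantize`, `search_with_rerank`, and `oversample` are hypothetical, not Lucene APIs. The quantized pass collects `k * oversample` candidates using only the compact codes. Only those candidates are rescored with the raw float vectors, so the raw vectors are never needed during the graph/candidate phase.

```python
import heapq
import random


def binary_quantize(vec):
    """Hypothetical 1-bit quantizer (assumption, not BBQ): bit i is set iff vec[i] > 0."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def dot(a, b):
    """Exact similarity on raw float vectors (dot product assumed)."""
    return sum(x * y for x, y in zip(a, b))


def search_with_rerank(query, raw_vectors, codes, k, oversample=3.0):
    """Return the ids of the top-k vectors by exact dot product, reranking
    only k * oversample candidates picked with the cheap quantized distance."""
    num_candidates = min(len(codes), max(k, int(k * oversample)))
    q_code = binary_quantize(query)
    # Phase 1: approximate scoring on compact codes only (brute force here,
    # standing in for an HNSW search). Fewer differing sign bits = more similar.
    candidates = heapq.nsmallest(
        num_candidates,
        range(len(codes)),
        key=lambda i: bin(q_code ^ codes[i]).count("1"),
    )
    # Phase 2: rerank just the oversampled candidates with the raw vectors.
    return heapq.nlargest(k, candidates, key=lambda i: dot(query, raw_vectors[i]))


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n, k = 64, 2000, 10
    data = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    codes = [binary_quantize(v) for v in data]
    query = [rng.gauss(0, 1) for _ in range(dim)]
    exact = set(heapq.nlargest(k, range(n), key=lambda i: dot(query, data[i])))
    for factor in (1.0, 2.0, 5.0):
        found = search_with_rerank(query, data, codes, k, oversample=factor)
        print(f"oversample={factor}: recall@{k} = {len(exact & set(found)) / k:.2f}")
```

The `oversample` factor is the knob the "2-5x" above refers to. Raising it trades more raw-vector reads at rerank time for higher recall. The memory-heavy raw vectors are only touched for that small candidate set, not throughout the search.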