kaivalnp commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-3006409251
Hi @HaoSunUber! Faiss supports multiple algorithms, vector transforms, quantizations, etc., but for this PR I've primarily compared the full-precision, pure HNSW implementations of the Faiss codec and the default Lucene codec.

I mainly ran benchmarks on 300d vectors of the `enwiki` dataset (some recent numbers [here](https://github.com/apache/lucene/pull/14178/#issuecomment-2954723052)), where the single-segment search time was \~20% faster.

The codec makes it possible to create different indexes (say, scalar quantized, HNSW+PQ, etc.) using different factory strings, see https://github.com/facebookresearch/faiss/wiki/The-index-factory, but I haven't had a chance to test many others!

I've tried to add steps to [install Faiss and make it available to the codec](https://github.com/apache/lucene/blob/4b47fb1a3113d22bca6cd8c1664529ef2d7f4877/lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/package-info.java#L36-L47), after which you can run KNN benchmarks using the https://github.com/mikemccand/luceneutil package (it will need minor changes to use the format properly [here](https://github.com/mikemccand/luceneutil/blob/779d85551f37d72ef2d328165dd9a91b4bbf1f35/src/main/knn/KnnGraphTester.java#L1290)).

Please do post results if you're able to run any benchmarks, or if you have questions or feedback on the codec!
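For reference, here is a rough sketch of what Faiss index-factory strings for the index types mentioned above could look like. The concrete values (32 HNSW neighbors, 8-bit scalar quantization, 12 PQ sub-quantizers) and the helper names are illustrative assumptions, not settings taken from the PR or the benchmarks. The sketch uses the Faiss Python bindings only to show the strings in isolation; it does not show how the Lucene codec consumes them.

```python
# Illustrative Faiss index-factory descriptions (assumed values, not from the PR).
FACTORY_STRINGS = {
    "hnsw_flat": "HNSW32",     # full-precision HNSW, 32 neighbors per node
    "hnsw_sq": "HNSW32_SQ8",   # HNSW over 8-bit scalar-quantized vectors
    "hnsw_pq": "HNSW32_PQ12",  # HNSW over product-quantized vectors, 12 sub-quantizers
}


def pq_compatible(dim: int, num_subquantizers: int) -> bool:
    """Product quantization splits a vector into equal-sized sub-vectors,
    so the dimension must be divisible by the number of sub-quantizers."""
    return dim % num_subquantizers == 0


def build_index(dim: int, description: str):
    """Hypothetical helper: build a Faiss index from a factory string.
    Requires the faiss Python package, imported lazily so the rest of
    this sketch works without it."""
    import faiss

    return faiss.index_factory(dim, description)
```

With 300d vectors, `pq_compatible(300, 12)` holds, so `"HNSW32_PQ12"` is a usable description for that dimension, whereas a choice like 16 sub-quantizers would not divide 300 evenly. If Faiss is installed, `build_index(300, FACTORY_STRINGS["hnsw_flat"])` should return an HNSW index over full-precision vectors.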