benwtrent commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2621638248
Some very interesting numbers, @kaivalnp! An almost 10x indexing throughput improvement tells me we are doing something silly in Lucene, especially since the search time is only about 25% better.

The search time numbers make me wonder if the differential is mainly that we read the floats onto heap. Maybe Lucene could be just as fast by not reading the floating point vectors onto heap and working on the memory segments directly instead (which gets tricky, but is not impossible).

Does the FAISS index need the "flat" vector storage at all? I thought FAISS gave direct access to the vector values based on ordinals. Or do you have to index it in a special way?

I will try to replicate the performance numbers when I have time. One thing that stands out to me is that during merge, all vectors are buffered onto heap, which is pretty dang expensive :/
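To make the "read onto heap" versus "memory segment" question concrete, here is a minimal Java sketch (Java 22+, `java.lang.foreign`) of two ways to score a query against a stored vector. It is not Lucene's or FAISS's actual code: it assumes a hypothetical flat layout where ordinal `ord` occupies `dim` consecutive floats, and the class `HeapVsSegmentDot` and its methods are made up for illustration. The first method copies the vector into a `float[]` before scoring; the second reads the floats directly from the segment and skips the per-vector heap copy.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

/** Hypothetical illustration only: not Lucene or FAISS code. */
public final class HeapVsSegmentDot {

  // Unaligned float layout, so reads at any byte offset are permitted.
  private static final ValueLayout.OfFloat FLOAT = ValueLayout.JAVA_FLOAT_UNALIGNED;

  // Approach 1: copy the stored vector onto heap as a float[], then score it.
  static float dotOnHeap(float[] query, MemorySegment vectors, int ord, int dim) {
    long offset = (long) ord * dim * Float.BYTES;
    float[] copy = new float[dim];
    MemorySegment.copy(vectors, FLOAT, offset, copy, 0, dim);
    float sum = 0f;
    for (int i = 0; i < dim; i++) {
      sum += query[i] * copy[i];
    }
    return sum;
  }

  // Approach 2: read each float straight out of the segment, no heap copy.
  static float dotOffHeap(float[] query, MemorySegment vectors, int ord, int dim) {
    long offset = (long) ord * dim * Float.BYTES;
    float sum = 0f;
    for (int i = 0; i < dim; i++) {
      sum += query[i] * vectors.get(FLOAT, offset + (long) i * Float.BYTES);
    }
    return sum;
  }

  public static void main(String[] args) {
    int dim = 3;
    try (Arena arena = Arena.ofConfined()) {
      // Two vectors stored back to back: ord 0 = [1, 2, 3], ord 1 = [4, 5, 6].
      MemorySegment vectors = arena.allocate((long) 2 * dim * Float.BYTES);
      float[] data = {1f, 2f, 3f, 4f, 5f, 6f};
      MemorySegment.copy(data, 0, vectors, FLOAT, 0, data.length);
      float[] query = {1f, 1f, 1f};
      System.out.println(dotOnHeap(query, vectors, 1, dim));  // 15.0
      System.out.println(dotOffHeap(query, vectors, 1, dim)); // 15.0
    }
  }
}
```

Both methods return the same score; the only difference is where the floats live while scoring. Whether avoiding the copy actually explains the gap is exactly the open question above and would need benchmarking.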