benwtrent commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2621638248

   Some very interesting numbers @kaivalnp
   
   An almost 10x indexing throughput improvement tells me we are doing something 
silly in Lucene, especially since the search time is only about 25% better. 
   
   The search time numbers make me wonder if the differential is mainly that 
Lucene reads the floats onto heap. Maybe it can be just as fast by not reading 
the floating point vectors onto heap and doing memory segment stuff (which gets 
tricky, but not impossible).
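
To make the heap-copy question concrete, here is a minimal sketch of the two access patterns. It is not Lucene or FAISS code: the class and method names are hypothetical, and a direct `ByteBuffer` stands in for a `MemorySegment` so that the example runs on older JDKs. One method copies a stored vector into a heap `float[]` before scoring; the other reads each float in place from off-heap memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical illustration only; not part of Lucene or FAISS.
public final class OffHeapDotProduct {

  // Packs equal-length vectors back to back into a direct (off-heap) buffer.
  static FloatBuffer store(float[][] vectors) {
    int dim = vectors[0].length;
    FloatBuffer buf = ByteBuffer.allocateDirect(vectors.length * dim * Float.BYTES)
        .order(ByteOrder.nativeOrder())
        .asFloatBuffer();
    for (float[] v : vectors) {
      buf.put(v);
    }
    return buf;
  }

  // Pattern 1: copy the stored vector onto the heap, then score it.
  static float dotViaHeapCopy(FloatBuffer stored, int ord, float[] query) {
    float[] scratch = new float[query.length];
    FloatBuffer view = stored.duplicate(); // independent position, same memory
    view.position(ord * query.length);
    view.get(scratch);                     // bulk copy onto the heap
    float sum = 0f;
    for (int i = 0; i < query.length; i++) {
      sum += scratch[i] * query[i];
    }
    return sum;
  }

  // Pattern 2: score directly against off-heap memory, with no heap copy.
  static float dotInPlace(FloatBuffer stored, int ord, float[] query) {
    int base = ord * query.length;
    float sum = 0f;
    for (int i = 0; i < query.length; i++) {
      sum += stored.get(base + i) * query[i]; // absolute read, no copy
    }
    return sum;
  }

  public static void main(String[] args) {
    FloatBuffer stored = store(new float[][] {{1f, 2f}, {3f, 4f}});
    float[] query = {1f, 1f};
    System.out.println(dotViaHeapCopy(stored, 1, query));
    System.out.println(dotInPlace(stored, 1, query));
  }
}
```

Both methods return the same score; the difference is only whether a per-vector heap array is filled first. Whether skipping the copy actually closes the search-time gap would need to be measured.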
   
   Does the FAISS index need the "flat" vector storage at all? I thought FAISS 
gave direct access to the vector values based on ordinals? Or do you have to 
index it in a special way? 
   
   I can try to replicate the performance numbers when I can. 
   
   One thing that stands out to me is that during merge, all vectors are 
buffered onto heap, which is pretty dang expensive :/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

