jmazanec15 commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1747329967

   A hybrid disk-memory algorithm would have very strong benefits. I ran a 
few tests recently that confirmed HNSW does not perform well once memory 
gets constrained (which I think everyone already knew). 
   
   I wonder, though: instead of DiskANN, what about a partitioning-based 
approach such as [SPANN](https://arxiv.org/pdf/2111.08566.pdf)? I think a 
partitioning-based approach for Lucene would make merging, updating, filtering, 
and indexing a lot easier. It also seems it would have better disk-access 
patterns. In the paper, they show that in a memory-constrained environment 
they were able to outperform DiskANN.
   
   I guess the tradeoff is that partitioning-based approaches would 
struggle to achieve really low-latency search when fully in memory, compared to 
graph-based approaches. Additionally, partitioning approaches require a 
potentially expensive "training" or "preprocessing" step such as k-Means, and 
performance can degrade if the data distribution drifts and the model is not 
updated. But if PQ is implemented, the same considerations would apply there 
as well.  
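
   For illustration, the "training" step amounts to something like a few Lloyd's iterations of k-Means over the indexed vectors to pick partition centroids. The sketch below is purely hypothetical (not Lucene code, all names invented); it just shows that building, or rebuilding after drift, the centroids is an extra indexing pass whose cost scales with the number of vectors, clusters, and dimensions.
   
   ```java
   import java.util.Random;
   
   /** Hypothetical centroid "training" pass: plain Lloyd's k-Means iterations. */
   class CentroidTrainerSketch {
   
     static float[][] train(float[][] vectors, int k, int iterations, long seed) {
       Random random = new Random(seed);
       int dim = vectors[0].length;
   
       // Initialize centroids from randomly chosen input vectors.
       float[][] centroids = new float[k][];
       for (int c = 0; c < k; c++) {
         centroids[c] = vectors[random.nextInt(vectors.length)].clone();
       }
   
       for (int iter = 0; iter < iterations; iter++) {
         float[][] sums = new float[k][dim];
         int[] counts = new int[k];
   
         // Assignment step: each vector goes to its nearest centroid.
         for (float[] v : vectors) {
           int best = 0;
           double bestDist = Double.MAX_VALUE;
           for (int c = 0; c < k; c++) {
             double dist = 0;
             for (int d = 0; d < dim; d++) {
               double diff = v[d] - centroids[c][d];
               dist += diff * diff;
             }
             if (dist < bestDist) {
               bestDist = dist;
               best = c;
             }
           }
           counts[best]++;
           for (int d = 0; d < dim; d++) {
             sums[best][d] += v[d];
           }
         }
   
         // Update step: move each centroid to the mean of its assigned vectors.
         for (int c = 0; c < k; c++) {
           if (counts[c] == 0) continue; // leave empty clusters where they are
           for (int d = 0; d < dim; d++) {
             centroids[c][d] = sums[c][d] / counts[c];
           }
         }
       }
       return centroids;
     }
   }
   ```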

