jmazanec15 commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1747329967
A hybrid disk-memory algorithm would have very strong benefits. I ran a few tests recently that confirmed HNSW does not perform well when memory is constrained (which I think everyone already knew). I wonder though: instead of DiskANN, what about a partitioning-based approach such as [SPANN](https://arxiv.org/pdf/2111.08566.pdf)? I think a partitioning-based approach for Lucene would make merging, updating, filtering and indexing a lot easier. It also seems it would have better disk-access patterns. In the paper, they show that in a memory-constrained environment they were able to outperform DiskANN. I guess the tradeoff is that partitioning-based approaches would struggle to match graph-based approaches on search latency when everything fits in memory. Additionally, partitioning approaches require a potentially expensive "training" or "preprocessing" step such as k-means, and performance could degrade if the data distribution drifts and the models are not updated. But if PQ is implemented, the same considerations would apply there as well.
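
To make the partitioning idea concrete, here is a rough sketch of a plain k-means partitioned index, written in Python only for brevity. It is not SPANN as described in the paper and not anything that exists in Lucene; every name here (`build_index`, `search`, `nprobe`, and so on) is made up for illustration. It only shows the basic shape: a "training" step that produces centroids, one posting list of doc ids per centroid, and a search that scans only the `nprobe` closest partitions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means; this is the 'training' step."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a partition is empty
                dim = len(members[0])
                centroids[c] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centroids


def build_index(vectors, k):
    """Assign each doc id to the posting list of its nearest centroid."""
    centroids = kmeans(vectors, k)
    postings = [[] for _ in range(k)]
    for doc_id, v in enumerate(vectors):
        postings[nearest_centroid(v, centroids)].append(doc_id)
    return centroids, postings


def search(query, vectors, centroids, postings, top_k, nprobe):
    """Scan only the nprobe partitions whose centroids are closest to the query."""
    probe = sorted(
        range(len(centroids)), key=lambda c: sq_dist(query, centroids[c])
    )[:nprobe]
    candidates = [doc_id for c in probe for doc_id in postings[c]]
    return sorted(candidates, key=lambda d: sq_dist(query, vectors[d]))[:top_k]
```

In a layout like this, only the centroids would need to stay in memory while each posting list could be read from disk as one sequential block, which is roughly where the better disk-access pattern would come from. Raising `nprobe` trades more reads for better recall; with `nprobe` equal to the number of partitions the search degenerates to an exact scan. The drift concern also shows up directly: the centroids are fixed at build time, so vectors added later from a shifted distribution would pile into a few partitions unless the centroids are retrained.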