mikemccand commented on issue #13565:
URL: https://github.com/apache/lucene/issues/13565#issuecomment-2244785292
> This is what got me thinking about BP [bipartite graph partitioning] for HNSW search: intuitively, it could help a lot when the dataset size exceeds the size of the page cache? I think the gains might be astounding? Similar vectors would be stored near each other in the `.vec` / `.veq` file, so paging in larger blocks / OS readahead could be very effective (though we may have to turn off `MADV_RANDOM` and see if it helps). It should also mean less broad exploration of the graph: once you find your "neighborhood" of similar-ish vectors, you spend some effort there and get to the top K more quickly.

--
This is an automated message from the Apache Git Service.
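The locality argument in the quoted comment can be made concrete with a back-of-the-envelope model. The Python sketch below is a minimal version that rests on several assumptions. It treats vectors as fixed-size float32 records laid out contiguously in ordinal order, which is roughly how a flat `.vec` file stores them, and it ignores headers and quantized `.veq` layouts. It also assumes a 4 KiB page size. `pages_touched` and `remap` are hypothetical helpers, and the visited ordinals and the reordering are made-up inputs, not BP output or measured results.

```python
# Hypothetical model: count the distinct OS pages touched when reading a set of
# vectors, assuming fixed-size float32 vectors stored contiguously in ordinal
# order (ignoring file headers) and an assumed 4 KiB page size.

PAGE_SIZE = 4096      # assumed OS page size in bytes
BYTES_PER_FLOAT = 4   # float32


def pages_touched(ordinals, dims, page_size=PAGE_SIZE):
    """Return how many distinct pages must be read to load the given vectors."""
    vec_bytes = dims * BYTES_PER_FLOAT
    pages = set()
    for ordinal in ordinals:
        start = ordinal * vec_bytes          # first byte of this vector
        end = start + vec_bytes - 1          # last byte of this vector
        # A vector may straddle a page boundary, so add every page it spans.
        pages.update(range(start // page_size, end // page_size + 1))
    return len(pages)


def remap(ordinals, new_order):
    """Apply a reordering: new_order[old_ordinal] is the vector's new position."""
    return [new_order[o] for o in ordinals]


if __name__ == "__main__":
    # Made-up search that visits 8 similar vectors scattered across the file.
    visited = [0, 100, 200, 300, 400, 500, 600, 700]
    print(pages_touched(visited, dims=256))  # 8 (one page per vector)

    # Hypothetical reordering that packs those similar vectors next to each other.
    new_order = {old: new for new, old in enumerate(visited)}
    print(pages_touched(remap(visited, new_order), dims=256))  # 2
```

Page counts are only a proxy. This model ignores readahead window sizes, the effect of `MADV_RANDOM`, and any change in how many graph nodes the search visits. It does show why clustering similar vectors could reduce the number of distinct pages faulted in when the index does not fit in the page cache.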