benwtrent opened a new pull request, #11860: URL: https://github.com/apache/lucene/pull/11860
Vector search is much faster when the graph fits in memory. Consequently, improvements in vector storage can translate to faster searches on larger graphs.

One area for size reduction is node connections. Currently, they are stored as regular `int` values, but each connection value usually needs fewer bits than a full `int` provides.

This commit proposes storing node connections within the graph with `PackedInts`. It adds a new codec reader/writer for the HNSW graph, which stores node connections in a `PackedInts` stream, using the maximum possible connection value as the upper limit. Additionally, the `PackedInts` version is written so that the reader uses the appropriate `PackedInts` version when reading the data.

In testing, this change yielded a 30% space savings on average, with minimal change in queries per second (QPS).

There are probably even better storage optimization options. If anybody knows of any (I am new to the Lucene world), please let me know!

In-depth investigation of QPS is available here: https://github.com/apache/lucene/issues/11830#issuecomment-1279207529

closes: https://github.com/apache/lucene/issues/11830
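
To make the packing idea concrete, here is a minimal plain-Java sketch of storing neighbor ids with a fixed bit width derived from an upper limit. It is not the code in this PR and does not use Lucene's `PackedInts` API. The class `PackedNeighbors`, its method names, and the choice of the largest node ordinal as the upper limit are all assumptions made for illustration.

```java
// Hypothetical illustration only: not Lucene's PackedInts and not this PR's codec.
final class PackedNeighbors {

    /** Number of bits needed to represent any value in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be non-negative");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Packs non-negative ints into 64-bit blocks, bitsPerValue (1..32) bits each. */
    static long[] pack(int[] values, int bitsPerValue) {
        long totalBits = (long) values.length * bitsPerValue;
        long[] blocks = new long[(int) ((totalBits + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            long v = values[i] & 0xFFFFFFFFL;
            if ((v >>> bitsPerValue) != 0) {
                throw new IllegalArgumentException("value does not fit: " + values[i]);
            }
            long bitPos = (long) i * bitsPerValue;
            int block = (int) (bitPos >>> 6);   // which 64-bit block
            int offset = (int) (bitPos & 63);   // bit offset inside that block
            blocks[block] |= v << offset;
            int spill = offset + bitsPerValue - 64; // bits that overflow into the next block
            if (spill > 0) {
                blocks[block + 1] |= v >>> (bitsPerValue - spill);
            }
        }
        return blocks;
    }

    /** Reads the value at the given index back out of the packed blocks. */
    static int get(long[] blocks, int bitsPerValue, int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long v = blocks[block] >>> offset;
        int spill = offset + bitsPerValue - 64;
        if (spill > 0) {
            v |= blocks[block + 1] << (bitsPerValue - spill);
        }
        long mask = (1L << bitsPerValue) - 1;
        return (int) (v & mask);
    }

    public static void main(String[] args) {
        int maxNodeId = 999;                  // assumed upper limit: largest node ordinal
        int[] neighbors = {3, 17, 512, 999};  // made-up neighbor list for one node
        int bits = bitsRequired(maxNodeId);
        long[] packed = pack(neighbors, bits);
        System.out.println(bits + " bits per value, "
            + packed.length * Long.BYTES + " bytes packed vs "
            + neighbors.length * Integer.BYTES + " bytes as int[]");
    }
}
```

The savings in a sketch like this depend entirely on how far the upper limit sits below `Integer.MAX_VALUE`, and per-block padding can offset some of the gain for very short neighbor lists.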