[GitHub] [lucene] benwtrent opened a new pull request, #11860: GITHUB-11830 Better optimize storage for vector connections

GitBox Tue, 18 Oct 2022 07:57:30 -0700


benwtrent opened a new pull request, #11860:
URL: https://github.com/apache/lucene/pull/11860


   Vector search is much faster when the graph can fit in memory. Consequently, 
improvements in vector storage can translate to faster searches on larger 
graphs.
   
   One area of size reduction is node connections. Currently, they are stored 
as regular `int` values, but each connection usually needs fewer bits than the 
32 bits an `int` occupies. 
   
   This commit proposes storing node connections within the graph with 
`PackedInts`. This adds a new codec reader/writer for the HNSW graph. 
   
   Node connections are stored in a `PackedInts` stream, using the maximal 
possible value of connections as the upper limit. Additionally, the packed ints 
version is written so the reader uses the appropriate `PackedInts` version when 
reading the data.
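   
   To illustrate the idea, here is a minimal, self-contained sketch of 
fixed-width bit packing. It is not the code in this PR and it does not use 
Lucene's `PackedInts` classes. The class name `PackedConnectionsSketch` and its 
methods are hypothetical, and it assumes the packed values are non-negative 
neighbor ordinals bounded by a known maximum.

```java
/**
 * Illustrative only: fixed-width bit packing of non-negative ints.
 * Hypothetical helper, not Lucene's PackedInts implementation.
 */
final class PackedConnectionsSketch {

    /** Bits needed to represent any value in [0, maxValue]; never less than 1. */
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be non-negative");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /**
     * Packs each value into bitsPerValue bits (1..32), laid out back to back
     * in 64-bit blocks. Assumes every value fits in bitsPerValue bits.
     */
    static long[] pack(int[] values, int bitsPerValue) {
        long totalBits = (long) values.length * bitsPerValue;
        long[] blocks = new long[(int) ((totalBits + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int block = (int) (bitPos >>> 6);   // which 64-bit block
            int shift = (int) (bitPos & 63);    // offset inside that block
            long v = values[i] & 0xFFFFFFFFL;
            blocks[block] |= v << shift;
            int spill = shift + bitsPerValue - 64; // bits that overflow the block
            if (spill > 0) {
                blocks[block + 1] |= v >>> (bitsPerValue - spill);
            }
        }
        return blocks;
    }

    /** Reads back the value stored at the given index. */
    static int get(long[] blocks, int bitsPerValue, int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long mask = (1L << bitsPerValue) - 1;
        long v = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // The high bits of the value live at the start of the next block.
            v |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return (int) (v & mask);
    }
}
```

   In this sketch, a graph whose largest ordinal is 999,999 needs 
`bitsRequired(999_999)` = 20 bits per connection instead of 32. That figure is 
arithmetic for this toy layout only and is not a measurement of the PR.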
   
   
   This change yielded, on average, 30% space savings with minimal change in 
queries per second (QPS).
   
   There are probably even better storage optimization options. If anybody 
knows of any (I am new to the Lucene world), please let me know!
   
   An in-depth investigation of QPS is available here: 
https://github.com/apache/lucene/issues/11830#issuecomment-1279207529
   
   closes: https://github.com/apache/lucene/issues/11830


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


