Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

via GitHub Mon, 16 Oct 2023 15:12:27 -0700


uschindler commented on PR #12582:
URL: https://github.com/apache/lucene/pull/12582#issuecomment-1765355363


   > > Why do we need a new top-level Codec? The main Lucene file format does 
not change; only the HNSW format was exchanged. As with postings formats 
and doc values formats, the codec can detect the format of the HNSW index by 
reading the file and then use SPI to look up the correct format.
   > 
   > That's a good point. I think we'd need to increment the VERSION_CURRENT of 
the Lucene95HnswVectorsFormat to do the right thing when reading the data and 
we could avoid the new format entirely since it's exactly the same as before 
(assuming that quantisation is disabled by default).
   
   Actually, if the HNSW format has its own SPI name, then when reading indexes it 
should be chosen automatically by KnnVectorsFormat.forName(): 
https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/codecs/KnnVectorsFormat.html?is-external=true#forName(java.lang.String)
   
   In short: when the top-level codec reads the index and opens the vector format, 
it reads the SPI name stored in the file and then loads the matching code 
(either the current one or an older one from the backward-codecs module). It has 
worked like that for years for postings and doc values.
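
   To illustrate the name-based lookup described above, here is a minimal, 
self-contained sketch in plain Java. It is a toy model and not Lucene's 
implementation: the class FormatLookupSketch, its registry map, and the format 
name "HypotheticalQuantizedHnswFormat" are assumptions made up for illustration. 
Only the name "Lucene95HnswVectorsFormat" and the idea of a forName(String) 
lookup come from the discussion.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of name-based format lookup. NOT Lucene's actual implementation.
final class FormatLookupSketch {

    /** Minimal stand-in for a vectors format: it only knows its own SPI name. */
    static final class VectorsFormat {
        private final String name;

        VectorsFormat(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Stand-in for the SPI registry. Here the entries are registered by hand;
    // a real SPI mechanism would discover implementations on the classpath.
    private static final Map<String, VectorsFormat> REGISTRY = new LinkedHashMap<>();

    static {
        register(new VectorsFormat("Lucene95HnswVectorsFormat"));
        // Hypothetical name for a quantized variant, invented for this sketch.
        register(new VectorsFormat("HypotheticalQuantizedHnswFormat"));
    }

    static void register(VectorsFormat format) {
        REGISTRY.put(format.getName(), format);
    }

    /** Resolves a format purely from the name that was stored at write time. */
    static VectorsFormat forName(String name) {
        VectorsFormat format = REGISTRY.get(name);
        if (format == null) {
            throw new IllegalArgumentException("No format registered under name '" + name + "'");
        }
        return format;
    }

    public static void main(String[] args) {
        // Write side: the index records which format produced the data.
        String storedName = forName("HypotheticalQuantizedHnswFormat").getName();

        // Read side: the reader needs only the stored name to pick the right code,
        // so the top-level codec does not have to change.
        VectorsFormat resolved = forName(storedName);
        System.out.println("Resolved format: " + resolved.getName());
    }
}
```

   The point of the sketch is that the reader depends only on the stored name, 
so a format registered under a new name can coexist with the old one without 
a new top-level codec.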


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

