Re: [PR] Fix NPE when sampling for quantization in Lucene99HnswScalarQuantizedVectorsFormat [lucene]

via GitHub Mon, 22 Jan 2024 06:54:55 -0800


jpountz commented on PR #13027:
URL: https://github.com/apache/lucene/pull/13027#issuecomment-1904174622


   Good catch. I wonder what the best place is to compute `size()` correctly. I 
see you fixed the merge instances, but this is not how it's done elsewhere: see 
e.g. `FieldsConsumer#write`, which clearly states that index statistics must 
not be pulled from the merged `Terms` instance but recomputed. Should we follow 
a similar approach and recompute the size in `ScalarQuantizer#fromVectors`? I 
see that it needs to linearly scan all vectors anyway, so this shouldn't incur 
a performance penalty.
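
   A minimal sketch of the recompute-during-scan idea, under stated 
assumptions: `ScanSketch`, `Bounds`, and `scan` are hypothetical names, and the 
plain min/max computation is only a stand-in for whatever statistics 
`ScalarQuantizer#fromVectors` actually gathers. The point is just that the 
vector count can be derived from the scan itself rather than read from a 
`size()` reported by a merged instance.

```java
import java.util.Iterator;

/** Hypothetical illustration; not the Lucene ScalarQuantizer implementation. */
final class ScanSketch {

  /** Statistics gathered in one pass, including the number of vectors actually seen. */
  static final class Bounds {
    final float min;
    final float max;
    final int count;

    Bounds(float min, float max, int count) {
      this.min = min;
      this.max = max;
      this.count = count;
    }
  }

  /**
   * Scans every vector once. The count is incremented per vector visited,
   * so it never depends on a size() value supplied by the caller.
   */
  static Bounds scan(Iterator<float[]> vectors) {
    float min = Float.POSITIVE_INFINITY;
    float max = Float.NEGATIVE_INFINITY;
    int count = 0;
    while (vectors.hasNext()) {
      for (float v : vectors.next()) {
        min = Math.min(min, v);
        max = Math.max(max, v);
      }
      count++;
    }
    return new Bounds(min, max, count);
  }
}
```

Because the count is a by-product of a pass that has to happen anyway, this 
adds one integer increment per vector and no extra iteration.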


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


