jpountz commented on PR #13027: URL: https://github.com/apache/lucene/pull/13027#issuecomment-1904174622
Good catch. I wonder where the best place to compute `size()` correctly is. I see you fixed the merge instances, but this is not how it's done elsewhere: see e.g. `FieldsConsumer#write`, which clearly states that index statistics must not be pulled from the merged `Terms` instance but recomputed. Should we follow a similar approach and recompute the size in `ScalarQuantizer#fromVectors`? I see that it needs to linearly scan all vectors anyway, so this shouldn't come with a performance penalty.
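To make the suggestion concrete, here is a minimal sketch of counting vectors during the scan instead of trusting a reported size. It is not Lucene code: `RecountingScanSketch`, `Stats`, and `scan` are hypothetical names, a plain `Iterator<float[]>` stands in for the merged vector values, and min/max stand in for whatever statistics `ScalarQuantizer#fromVectors` actually gathers.

```java
import java.util.Iterator;

// Hypothetical illustration only; none of these names exist in Lucene.
final class RecountingScanSketch {

  /** Statistics gathered in a single pass, including the recomputed size. */
  static final class Stats {
    final float min;
    final float max;
    final int count;

    Stats(float min, float max, int count) {
      this.min = min;
      this.max = max;
      this.count = count;
    }
  }

  /**
   * Scans every vector once. The count is derived from the vectors actually
   * visited, so it stays correct even if the source would report a stale size
   * (for example, one that still includes deleted documents).
   */
  static Stats scan(Iterator<float[]> vectors) {
    float min = Float.POSITIVE_INFINITY;
    float max = Float.NEGATIVE_INFINITY;
    int count = 0;
    while (vectors.hasNext()) {
      float[] vector = vectors.next();
      for (float value : vector) {
        min = Math.min(min, value);
        max = Math.max(max, value);
      }
      count++; // recomputed here, at no extra cost beyond the existing scan
    }
    return new Stats(min, max, count);
  }
}
```

Since the loop already visits every vector, the only added work is one increment per vector.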