benwtrent commented on PR #13027:
URL: https://github.com/apache/lucene/pull/13027#issuecomment-1904192755

   @jpountz 
   
   > I see that it needs to linearly scan all vectors anyway, so this shouldn't 
come at a performance penalty?
   
   We would then need to linearly scan all the vectors twice. Because we do not 
have random access, we can only sample while iterating, and we don't want to 
read and copy vectors that ultimately don't end up in the sample (which is what 
would happen if we randomly sampled while iterating without knowing the count 
up front).
   
   So, we would need to iterate them twice:
   
    - Once to get the size
    - Then again to sample the vectors
   
   Rather than passing in two iterators, passing in the live vector count seemed 
more sensible, since that count is already available.
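
   To illustrate why the count matters: once the live vector count is known up front, a single forward pass can draw a uniform sample of exactly `k` vectors with selection sampling (Knuth's Algorithm S), reading and copying only the vectors that are actually selected. This is a minimal sketch, not the code in this PR; `ForwardOnlyVectors`, `ArrayVectors`, and `sample` are hypothetical names, and it assumes that skipping a vector is cheap while reading one copies it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SelectionSampling {

  /** Hypothetical forward-only cursor: advancing is cheap, read() copies the current vector. */
  interface ForwardOnlyVectors {
    boolean advance();

    float[] read();
  }

  /**
   * Selection sampling: picks min(k, liveCount) vectors uniformly at random in one pass.
   * Assumes liveCount equals the number of vectors the cursor yields.
   */
  static List<float[]> sample(ForwardOnlyVectors vectors, int liveCount, int k, Random random) {
    List<float[]> sample = new ArrayList<>(Math.min(k, liveCount));
    int seen = 0;
    while (sample.size() < k && vectors.advance()) {
      int remaining = liveCount - seen; // vectors not yet visited, including the current one
      if (remaining <= 0) {
        throw new IllegalStateException("liveCount is smaller than the number of vectors");
      }
      int needed = k - sample.size();
      // Select the current vector with probability needed / remaining.
      if (random.nextInt(remaining) < needed) {
        sample.add(vectors.read()); // only selected vectors are read and copied
      }
      seen++;
    }
    return sample;
  }

  /** Array-backed cursor that counts how many vectors were actually read. */
  static final class ArrayVectors implements ForwardOnlyVectors {
    private final float[][] data;
    private int pos = -1;
    int reads = 0;

    ArrayVectors(float[][] data) {
      this.data = data;
    }

    @Override
    public boolean advance() {
      return ++pos < data.length;
    }

    @Override
    public float[] read() {
      reads++;
      return data[pos].clone();
    }
  }

  public static void main(String[] args) {
    float[][] data = new float[1000][4];
    for (int i = 0; i < data.length; i++) {
      data[i][0] = i;
    }
    ArrayVectors vectors = new ArrayVectors(data);
    List<float[]> s = sample(vectors, data.length, 25, new Random(42));
    System.out.println(s.size() + " sampled, " + vectors.reads + " read"); // prints "25 sampled, 25 read"
  }
}
```

   By contrast, reservoir sampling does not need the count, but it copies vectors into the reservoir that may later be evicted, which is exactly the wasted read-and-copy work described above.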
   
   
   
   Adding an `@Override` to `MergedVectorValues` that asserts `size()` is never 
called is a good safeguard.
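
   A minimal sketch of that guard, assuming a simplified base class: `VectorValuesSketch` and `MergedVectorValuesSketch` are hypothetical stand-ins, not Lucene's actual classes. The merged view can still be iterated, but calling `size()` trips an assertion, so any caller that still relies on it fails in tests (which run with assertions enabled).

```java
public class MergedSizeGuard {

  /** Hypothetical, simplified stand-in for a vector-values base class. */
  abstract static class VectorValuesSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Number of vectors; may be expensive or unknowable for a merged view. */
    abstract int size();

    /** Advances to the next document with a vector, or returns NO_MORE_DOCS. */
    abstract int nextDoc();
  }

  /** Hypothetical merged view that can only be iterated forward. */
  static final class MergedVectorValuesSketch extends VectorValuesSketch {
    private final int[] docs; // doc IDs of live vectors, already remapped for the merge
    private int pos = -1;

    MergedVectorValuesSketch(int[] docs) {
      this.docs = docs;
    }

    @Override
    int size() {
      // Guard: callers should use the live vector count passed in separately.
      assert false : "size() must not be called on merged vector values";
      return -1; // only reached when assertions are disabled
    }

    @Override
    int nextDoc() {
      pos++;
      return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
  }
}
```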


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org