benwtrent commented on code in PR #13181: URL: https://github.com/apache/lucene/pull/13181#discussion_r1587957013
##########
lucene/core/src/java/org/apache/lucene/util/quantization/QuantizedByteVectorValues.java:
##########
@@ -18,13 +18,40 @@
 import java.io.IOException;
 import org.apache.lucene.index.ByteVectorValues;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.VectorScorer;
 
 /**
  * A version of {@link ByteVectorValues}, but additionally retrieving score correction offset for
  * Scalar quantization scores.
  *
  * @lucene.experimental
  */
-public abstract class QuantizedByteVectorValues extends ByteVectorValues {
+public abstract class QuantizedByteVectorValues extends DocIdSetIterator {

Review Comment:
> it's part of FieldInfo right?

It's part of the Codec.

> Or ... are we trying to allow two-pass scoring where we coarsely use quantized score and then later refine with full-precision scoring?

We already allow this. Approximate search is done over the quantized vectors, and then folks can rescore later by iterating the raw floats and scoring however they want (un-quantized).

> One use case I'm not sure we support (and we should) is a vector field with no associated HNSW graph where we want to rank documents by their vector score "brute force".

This is the sort of use case I have in mind here as well. Somebody creating their own "Flat" codec is pretty simple, given we have separated out the flat vector storage & quantization from the HNSW graph. But we don't provide a nice way for them to handle exact search. Instead, they must override the approximate search API and do a "top-k" over the flat vectors. This requires iterating everything upfront and finding the top-k. Instead, they should be able to return a scorer/iterator that can be used in a query.

The other use case is when we do exact search instead of approximate search due to filtering criteria. Right now we drop down to scoring the raw vectors instead of the quantized ones, which doesn't make sense to me.
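To make the contrast concrete, here is a minimal sketch of the two shapes of exact search described above: scoring everything upfront to find a top-k, versus handing back a scorer/iterator that a caller advances and scores on demand. It is plain Java and does not use Lucene; `FlatScoringSketch`, `LazyVectorScorer`, `eagerTopK`, and `lazyScorer` are hypothetical names for illustration only, and the dot product stands in for whatever similarity the field uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative sketch only; none of these types are Lucene APIs. */
public class FlatScoringSketch {
  public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical lazy scorer: advance over docs, score only the current one. */
  public interface LazyVectorScorer {
    int nextDoc();

    float score();
  }

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Eager shape: score every vector upfront, keep the k best doc ids (best first). */
  public static List<Integer> eagerTopK(float[][] vectors, float[] query, int k) {
    float[] scores = new float[vectors.length];
    for (int doc = 0; doc < vectors.length; doc++) {
      scores[doc] = dot(vectors[doc], query);
    }
    // Min-heap on score, so the weakest retained hit is evicted first.
    PriorityQueue<Integer> heap =
        new PriorityQueue<>((a, b) -> Float.compare(scores[a], scores[b]));
    for (int doc = 0; doc < vectors.length; doc++) {
      heap.add(doc);
      if (heap.size() > k) {
        heap.poll();
      }
    }
    List<Integer> result = new ArrayList<>(heap);
    result.sort((a, b) -> Float.compare(scores[b], scores[a]));
    return result;
  }

  /** Lazy shape: nothing is scored until the caller asks for the current doc's score. */
  public static LazyVectorScorer lazyScorer(float[][] vectors, float[] query) {
    return new LazyVectorScorer() {
      private int doc = -1;

      @Override
      public int nextDoc() {
        if (doc != NO_MORE_DOCS) {
          doc = (doc + 1 < vectors.length) ? doc + 1 : NO_MORE_DOCS;
        }
        return doc;
      }

      @Override
      public float score() {
        return dot(vectors[doc], query);
      }
    };
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0f, 1f}, {1f, 1f}};
    float[] query = {1f, 1f};
    System.out.println("eager top-1: " + eagerTopK(vectors, query, 1));
    LazyVectorScorer scorer = lazyScorer(vectors, query);
    for (int doc = scorer.nextDoc(); doc != NO_MORE_DOCS; doc = scorer.nextDoc()) {
      System.out.println("doc " + doc + " score " + scorer.score());
    }
  }
}
```

In the lazy shape, a filter can skip documents before any scoring happens, which is the property that matters for the filtered exact-search case.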
If somebody wants to rescore or search the raw vectors, they can already do that by iterating the `Float|ByteVectorValues` that they provided.