benwtrent commented on code in PR #13181:
URL: https://github.com/apache/lucene/pull/13181#discussion_r1587957013


##########
lucene/core/src/java/org/apache/lucene/util/quantization/QuantizedByteVectorValues.java:
##########
@@ -18,13 +18,40 @@
 
 import java.io.IOException;
 import org.apache.lucene.index.ByteVectorValues;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.VectorScorer;
 
 /**
  * A version of {@link ByteVectorValues}, but additionally retrieving score correction offset for
  * Scalar quantization scores.
  *
  * @lucene.experimental
  */
-public abstract class QuantizedByteVectorValues extends ByteVectorValues {
+public abstract class QuantizedByteVectorValues extends DocIdSetIterator {

Review Comment:
   > it's part of FieldInfo right?
   
   It's part of the Codec.
   
   > Or ... are we trying to allow two-pass scoring where we coarsely use 
quantized score and then later refine with full-precision scoring?
   
   We already allow this. Approximate search is done over the quantized vectors, 
and then folks can rescore later by iterating the raw floats and scoring however 
they want (un-quantized). 
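
A rough sketch of that second pass, using plain arrays rather than Lucene's actual API. The class and method names (`RescoreSketch`, `rescore`) are hypothetical, and the candidate doc ids are assumed to come from an earlier approximate search over the quantized vectors:

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch, not Lucene code: re-rank first-pass candidates with raw floats. */
final class RescoreSketch {

  /** Exact dot product over the un-quantized vectors. */
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /**
   * Returns the candidate doc ids ordered by exact score, best first.
   * `candidates` stands in for the hits of the approximate (quantized) pass.
   */
  static int[] rescore(int[] candidates, float[][] rawVectors, float[] query) {
    return Arrays.stream(candidates)
        .boxed()
        // Negate the score so that the natural ascending sort puts the best hit first.
        .sorted(Comparator.comparingDouble((Integer d) -> -dot(rawVectors[d], query)))
        .mapToInt(Integer::intValue)
        .toArray();
  }

  public static void main(String[] args) {
    float[][] raw = {{1f, 0f}, {0f, 1f}, {1f, 1f}};
    int[] firstPassHits = {0, 1, 2};
    System.out.println(Arrays.toString(rescore(firstPassHits, raw, new float[] {1f, 2f})));
  }
}
```

Any similarity function could replace `dot` here; the point is only that the second pass reads the raw vectors the user already indexed.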
   
   >  One use case I'm not sure we support (and we should) is a vector field 
with no associated HNSW graph where we want to rank documents by their vector 
score "brute force".
   
   This is the sort of use-case I have in mind here as well. Somebody creating 
their own "Flat" codec is pretty simple given we have separated out the flat 
vector storage & quantization from the HNSW graph. But, we don't provide a nice 
way for them to handle exact search. Instead they must override the approximate 
search API and do a "top-k" over flat vectors. This requires iterating 
everything upfront and finding top-k. Instead, they should be able to return a 
scorer/iterator that can be used in a query.
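
To make the shape of that idea concrete, here is a minimal sketch of a lazy scorer over flat quantized vectors. It is not the API proposed in this PR: `ScoringIterator` and `dotProductScorer` are hypothetical names, the vectors are plain `byte[]` arrays, and the score correction offset is left out for brevity.

```java
/** Hypothetical sketch, not Lucene code: score flat quantized vectors one doc at a time. */
final class FlatScorerSketch {

  /** Hypothetical stand-in for a doc-id iterator that also exposes a score. */
  interface ScoringIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Advances to the next document, or returns NO_MORE_DOCS once exhausted. */
    int nextDoc();

    /** Score of the current document; only valid while positioned on a real doc. */
    float score();
  }

  /** Builds an iterator that computes each dot product only when score() is called. */
  static ScoringIterator dotProductScorer(byte[][] quantized, byte[] query) {
    return new ScoringIterator() {
      private int doc = -1;

      @Override
      public int nextDoc() {
        if (doc != NO_MORE_DOCS) {
          doc = (doc + 1 < quantized.length) ? doc + 1 : NO_MORE_DOCS;
        }
        return doc;
      }

      @Override
      public float score() {
        int sum = 0;
        for (int i = 0; i < query.length; i++) {
          sum += quantized[doc][i] * query[i];
        }
        return sum;
      }
    };
  }

  public static void main(String[] args) {
    byte[][] vectors = {{1, 0}, {0, 1}, {1, 1}};
    ScoringIterator it = dotProductScorer(vectors, new byte[] {2, 3});
    for (int doc = it.nextDoc(); doc != ScoringIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
      System.out.println(doc + " -> " + it.score());
    }
  }
}
```

Because nothing is scored until the caller asks, a query can combine this with other iterators (a filter, for example) and skip documents it never needs, instead of collecting a top-k over every vector upfront.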
   
   The other use case is when we do exact search instead of approximate search 
due to filtering criteria. Right now we drop down to score the raw vectors 
instead of the quantized ones, which doesn't make sense to me. 
   
   If somebody wants to rescore or search the raw vectors, they can already do 
that by iterating the `Float|ByteVectorValues` that they provided.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

