msokolov commented on code in PR #13181: URL: https://github.com/apache/lucene/pull/13181#discussion_r1587804525
########## lucene/core/src/java/org/apache/lucene/util/quantization/QuantizedByteVectorValues.java: ########## @@ -18,13 +18,40 @@ import java.io.IOException; import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.search.VectorScorer; /** * A version of {@link ByteVectorValues}, but additionally retrieving score correction offset for * Scalar quantization scores. * * @lucene.experimental */ -public abstract class QuantizedByteVectorValues extends ByteVectorValues { +public abstract class QuantizedByteVectorValues extends DocIdSetIterator { Review Comment: Sorry I haven't kept up with all the exciting developments -- how does one specify quantization -- it's part of `FieldInfo` right? In that case does this class/interface need to be public? wouldn't VectorValues just "do the right thing" as regards quantization if it's specified for a field? Or ... are we trying to allow two-pass scoring where we coarsely use quantized score and then later refine with full-precision scoring? ########## lucene/core/src/java/org/apache/lucene/util/quantization/QuantizedByteVectorValues.java: ########## @@ -18,13 +18,40 @@ import java.io.IOException; import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.search.VectorScorer; /** * A version of {@link ByteVectorValues}, but additionally retrieving score correction offset for * Scalar quantization scores. * * @lucene.experimental */ -public abstract class QuantizedByteVectorValues extends ByteVectorValues { +public abstract class QuantizedByteVectorValues extends DocIdSetIterator { Review Comment: Overall the idea of returning a `VectorScorer` from a `VectorValues` makes sense to me. One use case I'm not sure we support (and we should) is a vector field with no associated HNSW graph where we want to rank documents by their vector score "brute force". We would still want access to all the other goodies like quantization, predefined scorers, etc. Can we even create a vector field with no graph? If we do that would there be a `VectorSimilarity` associated with the field? ########## lucene/core/src/java/org/apache/lucene/search/VectorScorer.java: ########## @@ -18,64 +18,39 @@ import java.io.IOException; import org.apache.lucene.index.ByteVectorValues; -import org.apache.lucene.index.FieldInfo; import org.apache.lucene.index.FloatVectorValues; -import org.apache.lucene.index.LeafReaderContext; import org.apache.lucene.index.VectorSimilarityFunction; /** * Computes the similarity score between a given query vector and different document vectors. This - * is primarily used by {@link KnnFloatVectorQuery} to run an exact, exhaustive search over the - * vectors. + * is used for exact searching and scoring + * + * @lucene.experimental */ -abstract class VectorScorer { - protected final VectorSimilarityFunction similarity; +public interface VectorScorer { /** - * Create a new vector scorer instance. + * Compute the score for the current document ID. * - * @param context the reader context - * @param fi the FieldInfo for the field containing document vectors - * @param query the query vector to compute the similarity for + * @return the score for the current document ID + * @throws IOException if an exception occurs during score computation */ - static FloatVectorScorer create(LeafReaderContext context, FieldInfo fi, float[] query) - throws IOException { - FloatVectorValues values = context.reader().getFloatVectorValues(fi.name); - if (values == null) { - FloatVectorValues.checkField(context.reader(), fi.name); - return null; - } - final VectorSimilarityFunction similarity = fi.getVectorSimilarityFunction(); - return new FloatVectorScorer(values, query, similarity); - } - - static ByteVectorScorer create(LeafReaderContext context, FieldInfo fi, byte[] query) - throws IOException { - ByteVectorValues values = context.reader().getByteVectorValues(fi.name); - if (values == null) { - ByteVectorValues.checkField(context.reader(), fi.name); - return null; - } - VectorSimilarityFunction similarity = fi.getVectorSimilarityFunction(); - return new ByteVectorScorer(values, query, similarity); - } - - VectorScorer(VectorSimilarityFunction similarity) { - this.similarity = similarity; - } + float score() throws IOException; - /** Compute the similarity score for the current document. */ - abstract float score() throws IOException; - - abstract boolean advanceExact(int doc) throws IOException; + /** + * @return a {@link DocIdSetIterator} over the documents. + */ + DocIdSetIterator iterator(); Review Comment: Your message, @benwtrent, said in this scheme iterators create scorers, but here it looks like the "normal" way? ########## lucene/core/src/java/org/apache/lucene/search/VectorScorer.java: ########## @@ -18,64 +18,39 @@ import java.io.IOException; import org.apache.lucene.index.ByteVectorValues; -import org.apache.lucene.index.FieldInfo; import org.apache.lucene.index.FloatVectorValues; -import org.apache.lucene.index.LeafReaderContext; import org.apache.lucene.index.VectorSimilarityFunction; /** * Computes the similarity score between a given query vector and different document vectors. This - * is primarily used by {@link KnnFloatVectorQuery} to run an exact, exhaustive search over the - * vectors. + * is used for exact searching and scoring + * + * @lucene.experimental */ -abstract class VectorScorer { - protected final VectorSimilarityFunction similarity; +public interface VectorScorer { /** - * Create a new vector scorer instance. + * Compute the score for the current document ID. * - * @param context the reader context - * @param fi the FieldInfo for the field containing document vectors - * @param query the query vector to compute the similarity for + * @return the score for the current document ID + * @throws IOException if an exception occurs during score computation */ - static FloatVectorScorer create(LeafReaderContext context, FieldInfo fi, float[] query) - throws IOException { - FloatVectorValues values = context.reader().getFloatVectorValues(fi.name); - if (values == null) { - FloatVectorValues.checkField(context.reader(), fi.name); - return null; - } - final VectorSimilarityFunction similarity = fi.getVectorSimilarityFunction(); - return new FloatVectorScorer(values, query, similarity); - } - - static ByteVectorScorer create(LeafReaderContext context, FieldInfo fi, byte[] query) - throws IOException { - ByteVectorValues values = context.reader().getByteVectorValues(fi.name); - if (values == null) { - ByteVectorValues.checkField(context.reader(), fi.name); - return null; - } - VectorSimilarityFunction similarity = fi.getVectorSimilarityFunction(); - return new ByteVectorScorer(values, query, similarity); - } - - VectorScorer(VectorSimilarityFunction similarity) { - this.similarity = similarity; - } + float score() throws IOException; - /** Compute the similarity score for the current document. */ - abstract float score() throws IOException; - - abstract boolean advanceExact(int doc) throws IOException; + /** + * @return a {@link DocIdSetIterator} over the documents. + */ + DocIdSetIterator iterator(); Review Comment: But I guess the `VectorValues` will return ` VectorScorer`? In that case do we need this `iterator()` method since presumably the caller already has an iterator over docs with vectors in this field? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org