vigyasharma commented on code in PR #14708: URL: https://github.com/apache/lucene/pull/14708#discussion_r2118820224
##########
lucene/core/src/java/org/apache/lucene/search/ByteVectorSimilarityValuesSource.java:
##########
@@ -42,7 +78,35 @@ public VectorScorer getScorer(LeafReaderContext ctx) throws IOException {
       ByteVectorValues.checkField(ctx.reader(), fieldName);
       return null;
     }
-    return vectorValues.scorer(queryVector);
+    final FieldInfo fi = ctx.reader().getFieldInfos().fieldInfo(fieldName);
+    if (fi.getVectorDimension() != queryVector.length) {
+      throw new IllegalArgumentException(
+          "Query vector dimension does not match field dimension: "
+              + queryVector.length
+              + " != "
+              + fi.getVectorDimension());
+    }
+
+    // default vector scorer
+    if (useFullPrecision == false) {
+      return vectorValues.scorer(queryVector);
+    }
+
+    final VectorSimilarityFunction vectorSimilarityFunction = fi.getVectorSimilarityFunction();
+    return new VectorScorer() {
+      final KnnVectorValues.DocIndexIterator iterator = vectorValues.iterator();
+
+      @Override
+      public float score() throws IOException {
+        return vectorSimilarityFunction.compare(
+            queryVector, vectorValues.vectorValue(iterator.index()));

Review Comment:
I remember reading some results on the DiskANN issue where benchmarks indicated that keeping the vectors needed for ANN graph search in memory (the quantized vectors in this case) does lead to better performance. So an option to use DIRECT_IO only for the full-precision reads here might make sense.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org