jpountz commented on a change in pull request #262: URL: https://github.com/apache/lucene/pull/262#discussion_r696363685
##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextKnnVectorsReader.java
##########
@@ -140,7 +147,38 @@ public VectorValues getVectorValues(String field) throws IOException {
   @Override
   public TopDocs search(String field, float[] target, int k, Bits acceptDocs) throws IOException {
-    throw new UnsupportedOperationException();
+    VectorValues values = getVectorValues(field);
+    if (values == null) {
+      return null;
+    }
+    if (target.length != values.dimension()) {
+      throw new IllegalArgumentException(
+          "incorrect dimension for field "
+              + field
+              + "; expected "
+              + values.dimension()
+              + " but target has "
+              + target.length);
+    }
+    FieldInfo info = readState.fieldInfos.fieldInfo(field);
+    VectorSimilarityFunction vectorSimilarity = info.getVectorSimilarityFunction();
+    HitQueue topK = new HitQueue(k, false);
+    int doc;
+    while ((doc = values.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
+      float[] vector = values.vectorValue();
+      float score = vectorSimilarity.compare(vector, target);
+      if (vectorSimilarity.reversed) {
+        score = 1 / (score + 1);
+      }
+      topK.insertWithOverflow(new ScoreDoc(doc, score));
+    }
+    ScoreDoc[] topScoreDocs = new ScoreDoc[topK.size()];
+    int i = 0;
+    for (ScoreDoc scoreDoc : topK) {
+      topScoreDocs[i++] = scoreDoc;
+    }
+    Arrays.sort(topScoreDocs, Comparator.comparingInt(x -> x.doc));

Review comment:
   I would have expected this method to return vectors by descending score. This makes me curious about what the exact contract of this method is regarding the order of the hits. If it is unspecified, maybe we should make it clearer in javadocs?
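
To make the ordering question concrete, here is a minimal, self-contained sketch of a top-k collection that returns hits by descending score, with ties broken by ascending doc ID. It is not the code from this PR and not a statement of the actual contract: `Hit`, `TopKByScore` and `BEST_FIRST` are hypothetical names, and `java.util.PriorityQueue` stands in for Lucene's `HitQueue` so the snippet runs without Lucene on the classpath.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a (doc, score) pair; not Lucene's ScoreDoc. */
final class Hit {
  final int doc;
  final float score;

  Hit(int doc, float score) {
    this.doc = doc;
    this.score = score;
  }
}

/** Hypothetical helper: keeps the k best hits and returns them best-first. */
final class TopKByScore {

  /** Higher score first; on equal scores, lower doc ID first. */
  static final Comparator<Hit> BEST_FIRST =
      (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
      };

  /** scores[doc] is the score of document doc; returns at most k hits. */
  static Hit[] topK(float[] scores, int k) {
    // Reversed comparator: the head of the heap is the weakest hit kept so far.
    PriorityQueue<Hit> heap = new PriorityQueue<>(BEST_FIRST.reversed());
    for (int doc = 0; doc < scores.length; doc++) {
      heap.add(new Hit(doc, scores[doc]));
      if (heap.size() > k) {
        heap.poll(); // evict the weakest hit
      }
    }
    Hit[] result = heap.toArray(new Hit[0]);
    // Final sort by descending score, rather than by doc ID as in the diff above.
    Arrays.sort(result, BEST_FIRST);
    return result;
  }

  public static void main(String[] args) {
    for (Hit hit : topK(new float[] {0.2f, 0.9f, 0.5f, 0.9f, 0.1f}, 3)) {
      System.out.println("doc=" + hit.doc + " score=" + hit.score);
    }
  }
}
```

The only difference from a doc-ID ordering is the comparator used in the final sort, so whichever order the javadocs end up specifying, the change on the implementation side would be small.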
##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextKnnVectorsReader.java
##########
@@ -140,7 +147,38 @@ public VectorValues getVectorValues(String field) throws IOException {
   @Override
   public TopDocs search(String field, float[] target, int k, Bits acceptDocs) throws IOException {
-    throw new UnsupportedOperationException();
+    VectorValues values = getVectorValues(field);

Review comment:
   Our general approach for these problems is that the `XXXReader`/`XXXProducer` classes can assume that their methods are only called on fields that have the feature enabled according to `FieldInfos`, and it's the responsibility of `CodecReader` to check `FieldInfos` before forwarding calls to `XXXReader`/`XXXProducer` classes. This seems to already be done correctly in `CodecReader#searchNearestVectors`. So I think we are good, and we should even remove the `if (values == null)` check below, which is not necessary and might hide bugs.
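
The division of responsibility described in the comment above can be sketched as follows. This is an illustration under stated assumptions, not Lucene code: `SketchVectorsReader` and `SketchCodecReader` are hypothetical classes, a plain `Map` of field name to vector dimension stands in for `FieldInfos`, and `countVectors` is a made-up method chosen only to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical low-level reader: assumes the caller has already validated the field. */
final class SketchVectorsReader {
  private final Map<String, float[][]> vectorsByField;

  SketchVectorsReader(Map<String, float[][]> vectorsByField) {
    this.vectorsByField = vectorsByField;
  }

  int countVectors(String field) {
    float[][] vectors = vectorsByField.get(field);
    if (vectors == null) {
      // No silent "return null" here: reaching this point means the caller
      // broke the contract, and failing loudly surfaces the bug.
      throw new IllegalStateException("no vectors for field \"" + field + "\"");
    }
    return vectors.length;
  }
}

/** Hypothetical caller playing the role the comment assigns to CodecReader. */
final class SketchCodecReader {
  private final Map<String, Integer> vectorDimensions; // stand-in for FieldInfos
  private final SketchVectorsReader reader;

  SketchCodecReader(Map<String, Integer> vectorDimensions, SketchVectorsReader reader) {
    this.vectorDimensions = vectorDimensions;
    this.reader = reader;
  }

  /** Returns null when the field is unknown or has no vectors; otherwise delegates. */
  Integer countVectors(String field) {
    Integer dimension = vectorDimensions.get(field);
    if (dimension == null || dimension == 0) {
      return null; // the only place where "field has no vectors" is handled
    }
    return reader.countVectors(field);
  }

  public static void main(String[] args) {
    Map<String, float[][]> vectors = new HashMap<>();
    vectors.put("embedding", new float[][] {{1f, 0f}, {0f, 1f}});
    Map<String, Integer> dimensions = new HashMap<>();
    dimensions.put("embedding", 2);
    dimensions.put("title", 0); // a field indexed without vectors

    SketchCodecReader codecReader =
        new SketchCodecReader(dimensions, new SketchVectorsReader(vectors));
    System.out.println(codecReader.countVectors("embedding"));
    System.out.println(codecReader.countVectors("title"));
  }
}
```

With the check living in one place, a null check inside the low-level reader could only ever trigger when the field metadata and the stored data disagree, which is the kind of bug it would otherwise mask.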