jpountz commented on a change in pull request #262:
URL: https://github.com/apache/lucene/pull/262#discussion_r696363685



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextKnnVectorsReader.java
##########
@@ -140,7 +147,38 @@ public VectorValues getVectorValues(String field) throws IOException {
 
   @Override
   public TopDocs search(String field, float[] target, int k, Bits acceptDocs) throws IOException {
-    throw new UnsupportedOperationException();
+    VectorValues values = getVectorValues(field);
+    if (values == null) {
+      return null;
+    }
+    if (target.length != values.dimension()) {
+      throw new IllegalArgumentException(
+          "incorrect dimension for field "
+              + field
+              + "; expected "
+              + values.dimension()
+              + " but target has "
+              + target.length);
+    }
+    FieldInfo info = readState.fieldInfos.fieldInfo(field);
+    VectorSimilarityFunction vectorSimilarity = info.getVectorSimilarityFunction();
+    HitQueue topK = new HitQueue(k, false);
+    int doc;
+    while ((doc = values.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
+      float[] vector = values.vectorValue();
+      float score = vectorSimilarity.compare(vector, target);
+      if (vectorSimilarity.reversed) {
+        score = 1 / (score + 1);
+      }
+      topK.insertWithOverflow(new ScoreDoc(doc, score));
+    }
+    ScoreDoc[] topScoreDocs = new ScoreDoc[topK.size()];
+    int i = 0;
+    for (ScoreDoc scoreDoc : topK) {
+      topScoreDocs[i++] = scoreDoc;
+    }
+    Arrays.sort(topScoreDocs, Comparator.comparingInt(x -> x.doc));

Review comment:
       I would have expected this method to return hits by descending score. 
This makes me curious about what the exact contract of this method is regarding 
the order of the hits. If it is unspecified, maybe we should make that clear in 
the javadocs?
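
For illustration only, here is a minimal, self-contained sketch of what "hits by descending score" could look like when the top `k` are collected with a heap. It uses plain `java.util` classes rather than Lucene's `HitQueue`/`ScoreDoc`; the `TopKByScore` and `Hit` names are hypothetical, and breaking ties by lower doc id is an assumption, not a statement of the method's actual contract.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class TopKByScore {
  /** Hypothetical stand-in for a scored hit; this is not Lucene's ScoreDoc. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Returns the k best hits, highest score first; ties go to the lower doc id (assumed). */
  static List<Hit> topK(int[] docs, float[] scores, int k) {
    // Min-heap: the head is the weakest retained hit
    // (lowest score, and among equal scores the highest doc id).
    PriorityQueue<Hit> heap =
        new PriorityQueue<>(
            (a, b) -> {
              int c = Float.compare(a.score, b.score);
              return c != 0 ? c : Integer.compare(b.doc, a.doc);
            });
    for (int i = 0; i < docs.length; i++) {
      heap.add(new Hit(docs[i], scores[i]));
      if (heap.size() > k) {
        heap.poll(); // evict the weakest hit once more than k are held
      }
    }
    List<Hit> out = new ArrayList<>(heap.size());
    while (!heap.isEmpty()) {
      out.add(heap.poll()); // drains weakest first
    }
    Collections.reverse(out); // flip to best first
    return out;
  }
}
```

Sorting the drained hits by doc id, as the patch does, would replace the final `Collections.reverse` step; which of the two orders callers may rely on is exactly what the javadocs would need to state.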

##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextKnnVectorsReader.java
##########
@@ -140,7 +147,38 @@ public VectorValues getVectorValues(String field) throws IOException {
 
   @Override
   public TopDocs search(String field, float[] target, int k, Bits acceptDocs) throws IOException {
-    throw new UnsupportedOperationException();
+    VectorValues values = getVectorValues(field);

Review comment:
       Our general approach for these problems is that the 
`XXXReader`/`XXXProducer` classes can assume that their methods are only called 
on fields that have the feature enabled according to `FieldInfos`, and it's the 
responsibility of `CodecReader` to check `FieldInfos` before forwarding calls 
to `XXXReader`/`XXXProducer` classes. This seems to already be done correctly 
in `CodecReader#searchNearestVectors`.
   
   So I think we are good, and we should even remove the `if (values == null)` 
check below, which is not necessary and might hide bugs.
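
A rough sketch of that division of responsibility, using made-up classes rather than the real `CodecReader` and `KnnVectorsReader` APIs: the front end consults field metadata and only forwards calls for fields that have vectors, while the low-level reader makes no attempt to tolerate other fields. All names here (`FieldGateSketch`, `Frontend`, `VectorReader`, `count`) are hypothetical. The explicit exception is one possible way to surface a caller bug; simply dropping the null check, as suggested above, would surface it as a `NullPointerException` instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class FieldGateSketch {
  /** Hypothetical low-level reader: assumes it is only asked about fields that have vectors. */
  static final class VectorReader {
    private final Map<String, float[][]> vectorsByField = new HashMap<>();

    void put(String field, float[][] vectors) {
      vectorsByField.put(field, vectors);
    }

    int count(String field) {
      float[][] vectors = vectorsByField.get(field);
      if (vectors == null) {
        // Reaching this point means the caller skipped its metadata check.
        // Returning 0 or null here would hide that bug, so fail loudly.
        throw new IllegalStateException("no vectors for field \"" + field + "\"");
      }
      return vectors.length;
    }
  }

  /** Hypothetical front end: checks field metadata before delegating to the reader. */
  static final class Frontend {
    private final Set<String> vectorFields;
    private final VectorReader reader;

    Frontend(Set<String> vectorFields, VectorReader reader) {
      this.vectorFields = vectorFields;
      this.reader = reader;
    }

    int count(String field) {
      if (!vectorFields.contains(field)) {
        return 0; // field has no vectors: answer here, never reach the reader
      }
      return reader.count(field);
    }
  }
}
```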




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


