benwtrent commented on code in PR #13910:
URL: https://github.com/apache/lucene/pull/13910#discussion_r1803132931
##########
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java:
##########
@@ -1906,4 +1916,122 @@ public void testMismatchedFields() throws Exception {
     IOUtils.close(reader, w2, dir1, dir2);
   }
+
+  /**
+   * Test that the query is a viable approximation to exact search. This test is designed to uncover
+   * gross failures only, not to represent the true expected recall.
+   */
+  public void testRecall() throws IOException {
+    VectorSimilarityFunction vectorSimilarityFunction = VectorSimilarityFunction.EUCLIDEAN;
+    int dim = 16;
+    try (Directory indexStore = getKnownIndexStore("field", dim, vectorSimilarityFunction);
+        IndexReader reader = DirectoryReader.open(indexStore)) {
+      IndexSearcher searcher = newSearcher(reader);
+      float[] queryEmbedding = new float[dim];
+      String queryString = "Apache License";
+      computeLineEmbedding(queryString, queryEmbedding);
+      // computeLineEmbedding(" END OF TERMS AND CONDITIONS", queryEmbedding);
+      // pass match-all "filter" to force full traversal, bypassing graph
+      KnnFloatVectorQuery exactQuery =
+          new KnnFloatVectorQuery("field", queryEmbedding, 1000, new MatchAllDocsQuery());

Review Comment:
   Also, I think for more consistent runs, we may want to have multiple query embeddings that we test with and gather `min`, `max`, and `avg` recalls. But this can be a further refinement on this work. I just think having a single query might be very flaky in the long run.
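A rough sketch of the multi-query idea from the review comment, in case it helps the discussion. It is plain Java with no Lucene dependency: `RecallStats`, `recall`, and `summarize` are hypothetical names that do not exist in the PR or in Lucene, and the doc ID arrays stand in for whatever the approximate and exact (match-all filtered) queries would return for each query embedding.

```java
import java.util.DoubleSummaryStatistics;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Lucene or of this PR. */
final class RecallStats {

  /** Fraction of the exact top-k doc IDs that also appear in the approximate results. */
  static double recall(int[] approxIds, int[] exactIds) {
    if (exactIds.length == 0) {
      throw new IllegalArgumentException("exact result set must not be empty");
    }
    Set<Integer> exact = new HashSet<>();
    for (int id : exactIds) {
      exact.add(id);
    }
    Set<Integer> overlap = new HashSet<>();
    for (int id : approxIds) {
      if (exact.contains(id)) {
        overlap.add(id); // a set, so duplicate approximate hits are counted once
      }
    }
    return (double) overlap.size() / exact.size();
  }

  /** Collects per-query recall so that min, max, and average can be checked together. */
  static DoubleSummaryStatistics summarize(int[][] approxPerQuery, int[][] exactPerQuery) {
    if (approxPerQuery.length != exactPerQuery.length) {
      throw new IllegalArgumentException("one approximate and one exact result per query");
    }
    DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
    for (int q = 0; q < approxPerQuery.length; q++) {
      stats.accept(recall(approxPerQuery[q], exactPerQuery[q]));
    }
    return stats;
  }

  public static void main(String[] args) {
    // Made-up doc IDs for two queries; a real test would take them from the searcher.
    int[][] approx = {{1, 2, 3, 4}, {5, 6, 7, 8}};
    int[][] exact = {{1, 2, 3, 9}, {5, 6, 7, 8}};
    DoubleSummaryStatistics stats = summarize(approx, exact);
    System.out.println(
        "min=" + stats.getMin() + " max=" + stats.getMax() + " avg=" + stats.getAverage());
  }
}
```

A test could then assert on `getMin()` and `getAverage()` separately, for example a loose floor on the minimum and a tighter one on the average, so that one unlucky query does not fail the run. The actual thresholds would need to be chosen empirically.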