benwtrent commented on code in PR #14191: URL: https://github.com/apache/lucene/pull/14191#discussion_r1939664144
########## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ########## @@ -19,11 +19,7 @@ import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS; import java.io.IOException; -import java.util.ArrayList; -import java.util.Arrays; -import java.util.Comparator; -import java.util.List; -import java.util.Objects; +import java.util.*; Review Comment: lets not use `*` ########## lucene/core/src/java/org/apache/lucene/search/knn/TopKnnCollectorManager.java: ########## @@ -55,8 +57,13 @@ public KnnCollector newCollector( if (globalScoreQueue == null) { return new TopKnnCollector(k, visitedLimit, searchStrategy); } else { - return new MultiLeafKnnCollector( - k, globalScoreQueue, new TopKnnCollector(k, visitedLimit, searchStrategy)); + if (freeze.getAndSet(false)) { + return new MultiLeafKnnCollector( + k, globalScoreQueue, new TopKnnCollector(k, visitedLimit, searchStrategy), false); + } else { Review Comment: I don't think you need to do this "freeze" thing. The only issue was sharing information between threads. A single executor continuing to share information is completely ok. ########## lucene/core/src/java/org/apache/lucene/search/knn/TopKnnCollectorManager.java: ########## @@ -55,8 +57,13 @@ public KnnCollector newCollector( if (globalScoreQueue == null) { return new TopKnnCollector(k, visitedLimit, searchStrategy); } else { - return new MultiLeafKnnCollector( - k, globalScoreQueue, new TopKnnCollector(k, visitedLimit, searchStrategy)); + if (freeze.getAndSet(false)) { + return new MultiLeafKnnCollector( + k, globalScoreQueue, new TopKnnCollector(k, visitedLimit, searchStrategy), false); + } else { Review Comment: For this change, we simply make `MultiLeafKnnCollector` not thread safe at all. We can continue to have this `globalScoreQueue` that is unique per thread and shared between all the collectors, but all the code assumes its within a single thread. This will likely speed things up significantly. ########## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ########## @@ -88,11 +91,64 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException { getKnnCollectorManager(k, indexSearcher), indexSearcher.getTimeout()); TaskExecutor taskExecutor = indexSearcher.getTaskExecutor(); List<LeafReaderContext> leafReaderContexts = reader.leaves(); + List<Callable<TopDocs>> tasks = new ArrayList<>(leafReaderContexts.size()); - for (LeafReaderContext context : leafReaderContexts) { - tasks.add(() -> searchLeaf(context, filterWeight, knnCollectorManager)); + + TopDocs[] perLeafResults; + if (leafReaderContexts.size() > 1) { + if (true) { + /* sort LRCs by segment size */ + List<LeafReaderContext> sortedLeafReaderContexts = leafReaderContexts.stream() + .sorted(Comparator.comparingInt(o -> o.reader().numDocs())).toList(); + int noLRCs = sortedLeafReaderContexts.size(); Review Comment: I think this is OK for a POC, but we can easily sort actual number of vectors for real data. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org