benwtrent commented on code in PR #12962:
URL: https://github.com/apache/lucene/pull/12962#discussion_r1475057976

##########
lucene/core/src/java/org/apache/lucene/search/knn/KnnCollectorManager.java:
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.search.knn;
+
+import java.io.IOException;
+import org.apache.lucene.search.KnnCollector;
+import org.apache.lucene.util.BitSet;
+
+/**
+ * KnnCollectorManager responsible for creating {@link KnnCollector} instances. Useful to create
+ * {@link KnnCollector} instances that share global state across leaves, such a global queue of
+ * results collected so far.
+ */
+public abstract class KnnCollectorManager<C extends KnnCollector> {

Review Comment:
   Do we need these generics?

   This also seems like it should be an `interface`.

##########
lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenByteKnnVectorQuery.java:
##########
@@ -123,7 +124,16 @@ protected TopDocs exactSearch(LeafReaderContext context, DocIdSetIterator accept
   }
 
   @Override
-  protected TopDocs approximateSearch(LeafReaderContext context, Bits acceptDocs, int visitedLimit)
+  protected KnnCollectorManager<?> getKnnCollectorManager(int k, boolean supportsConcurrency) {
+    return new DiversifyingNearestChildrenKnnCollectorManager(k);

Review Comment:
   If we adjust the interface, this manager could know about `BitSetProducer parentsFilter;` and abstract that away from this query.

##########
lucene/core/src/java/org/apache/lucene/index/LeafReader.java:
##########
@@ -277,27 +273,24 @@ public final TopDocs searchNearestVectors(
    *
    * @param field the vector field to search
    * @param target the vector-valued query
-   * @param k the number of docs to return
    * @param acceptDocs {@link Bits} that represents the allowed documents to match, or {@code null}
    *     if they are all allowed to match.
-   * @param visitedLimit the maximum number of nodes that the search is allowed to visit
+   * @param knnCollector collector with settings for gathering the vector results.
    * @return the k nearest neighbor documents, along with their (searchStrategy-specific) scores.
    * @lucene.experimental
    */
   public final TopDocs searchNearestVectors(
-      String field, byte[] target, int k, Bits acceptDocs, int visitedLimit) throws IOException {

Review Comment:
   same here, this shouldn't be mutated at all.

##########
lucene/core/src/java/org/apache/lucene/search/AbstractKnnCollector.java:
##########
@@ -23,7 +23,7 @@
  */
 public abstract class AbstractKnnCollector implements KnnCollector {
 
-  private long visitedCount;
+  long visitedCount;

Review Comment:
   I think this should be protected, not package private. Only sub-classes should be able to read it.

##########
lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java:
##########
@@ -27,25 +29,78 @@
  */
 public final class TopKnnCollector extends AbstractKnnCollector {
 
+  // greediness of globally non-competitive search: (0,1]
+  private static final float DEFAULT_GREEDINESS = 0.9f;
+  // the local queue of the results with the highest similarities collected so far in the current
+  // segment

Review Comment:
   I think this should be a separate collector. Something like `MultiLeafTopKnnCollector`. So little code from the original collector is still around that it seems weird to me.

   We should have two, one that shares information, another that doesn't. This allows us to remove all the `null` values in the ctor.

##########
lucene/core/src/java/org/apache/lucene/search/knn/KnnCollectorManager.java:
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.search.knn;
+
+import java.io.IOException;
+import org.apache.lucene.search.KnnCollector;
+import org.apache.lucene.util.BitSet;
+
+/**
+ * KnnCollectorManager responsible for creating {@link KnnCollector} instances. Useful to create
+ * {@link KnnCollector} instances that share global state across leaves, such a global queue of
+ * results collected so far.
+ */
+public abstract class KnnCollectorManager<C extends KnnCollector> {
+
+  /**
+   * Return a new {@link KnnCollector} instance.
+   *
+   * @param visitedLimit the maximum number of nodes that the search is allowed to visit
+   * @param parentBitSet the parent bitset, {@code null} if not applicable
+   */
+  public abstract C newCollector(int visitedLimit, BitSet parentBitSet) throws IOException;

Review Comment:
   ```suggestion
     public abstract C newCollector(int visitedLimit, LeafReaderContext context) throws IOException;
   ```

   Also, I am not even sure `visitedLimit` should be there. It seems like something the manager should already know about (as in this instance it's static) and we just need to know about the context (the context is for `DiversifyingChildrenFloatKnnVectorQuery` so that its collector manager can create `BitSet parentBitSet` from its encapsulated `BitSetProducer`).

   I also think this method could return `null` if collection is not applicable for that given leaf context.

##########
lucene/core/src/java/org/apache/lucene/index/LeafReader.java:
##########
@@ -236,27 +235,24 @@ public final PostingsEnum postings(Term term) throws IOException {
    *
    * @param field the vector field to search
    * @param target the vector-valued query
-   * @param k the number of docs to return
    * @param acceptDocs {@link Bits} that represents the allowed documents to match, or {@code null}
    *     if they are all allowed to match.
-   * @param visitedLimit the maximum number of nodes that the search is allowed to visit
+   * @param knnCollector collector with settings for gathering the vector results.
    * @return the k nearest neighbor documents, along with their (searchStrategy-specific) scores.
    * @lucene.experimental
    */
   public final TopDocs searchNearestVectors(
-      String field, float[] target, int k, Bits acceptDocs, int visitedLimit) throws IOException {
+      String field, float[] target, Bits acceptDocs, KnnCollector knnCollector) throws IOException {

Review Comment:
   I agree with Jim, this should be `String field, float[] target, int k, Bits acceptDocs, int visitedLimit` at least for this PR, and not use a queue.

##########
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##########
@@ -79,24 +83,32 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException {
       filterWeight = null;
     }
 
+    final boolean supportsConcurrency = indexSearcher.getSlices().length > 1;

Review Comment:
   > because we see speedups even in sequential run

   Do you mean speedups without concurrency via sharing information? That is interesting, I wonder why that is.

##########
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##########
@@ -79,24 +83,32 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException {
       filterWeight = null;
     }
 
+    final boolean supportsConcurrency = indexSearcher.getSlices().length > 1;
+    KnnCollectorManager<?> knnCollectorManager = getKnnCollectorManager(k, supportsConcurrency);

Review Comment:
   I think this interface should accept the `indexSearcher` as the parameter and not `supportsConcurrency` or `multipleLeaves`. This way it can build whatever internal state it needs; this is particularly useful for `DiversifyingChildrenFloatKnnVectorQuery` etc.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
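
For illustration only, the shape suggested in the comments above (a non-generic `interface`, a manager that is built from the searcher and already knows `k` and the visit limit, and a per-leaf factory method that may return `null`) could be sketched roughly as follows. Every type here (`SketchCollector`, `SketchLeafContext`, `SketchSearcher`, `SketchCollectorManager`, `SketchTopKManager`, `SketchTopKCollector`) is a hypothetical stand-in written for this sketch, not a Lucene class, and the code that was eventually merged may look quite different.

```java
// Hypothetical stand-in for a per-leaf KNN collector; NOT Lucene's KnnCollector.
interface SketchCollector {
  boolean collect(int docId, float similarity);
  int size();
  float minSimilarity();
}

// Hypothetical stand-in for a leaf (segment) context.
final class SketchLeafContext {
  final int ord;    // leaf ordinal
  final int maxDoc; // number of documents in this leaf
  SketchLeafContext(int ord, int maxDoc) {
    this.ord = ord;
    this.maxDoc = maxDoc;
  }
}

// Hypothetical stand-in for the searcher that owns the leaves.
final class SketchSearcher {
  final SketchLeafContext[] leaves;
  SketchSearcher(SketchLeafContext[] leaves) {
    this.leaves = leaves;
  }
}

// The manager as a plain interface without generics. Only the leaf context is
// passed in; returning null means there is nothing to collect for that leaf.
interface SketchCollectorManager {
  SketchCollector newCollector(SketchLeafContext context);
}

// Assumption: k and the visit limit are fixed per query, so the manager holds them.
final class SketchTopKManager implements SketchCollectorManager {
  private final int k;
  private final int visitedLimit;
  private final boolean multipleLeaves;

  SketchTopKManager(int k, int visitedLimit, SketchSearcher searcher) {
    this.k = k;
    this.visitedLimit = visitedLimit;
    // Internal state is derived from the searcher instead of a boolean flag.
    this.multipleLeaves = searcher.leaves.length > 1;
  }

  // A fuller manager could use this to decide whether to share a global queue.
  boolean multipleLeaves() {
    return multipleLeaves;
  }

  @Override
  public SketchCollector newCollector(SketchLeafContext context) {
    if (context.maxDoc == 0) {
      return null; // empty leaf: collection is not applicable
    }
    return new SketchTopKCollector(k, visitedLimit);
  }
}

// Keeps the k highest similarities seen so far, using plain arrays for brevity.
final class SketchTopKCollector implements SketchCollector {
  private final int[] docs;
  private final float[] sims;
  private final int visitLimit;
  private int size;

  SketchTopKCollector(int k, int visitLimit) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive");
    }
    this.docs = new int[k];
    this.sims = new float[k];
    this.visitLimit = visitLimit;
  }

  int visitLimit() {
    return visitLimit;
  }

  @Override
  public boolean collect(int docId, float similarity) {
    if (size < docs.length) { // still room: always accept
      docs[size] = docId;
      sims[size] = similarity;
      size++;
      return true;
    }
    int worst = 0; // index of the lowest similarity currently held
    for (int i = 1; i < size; i++) {
      if (sims[i] < sims[worst]) {
        worst = i;
      }
    }
    if (similarity <= sims[worst]) {
      return false; // not competitive
    }
    docs[worst] = docId;
    sims[worst] = similarity;
    return true;
  }

  @Override
  public int size() {
    return size;
  }

  @Override
  public float minSimilarity() {
    float min = Float.NEGATIVE_INFINITY;
    for (int i = 0; i < size; i++) {
      if (i == 0 || sims[i] < min) {
        min = sims[i];
      }
    }
    return min;
  }
}
```

With this shape the query would build one manager per search and call `newCollector(leaf)` for each leaf, skipping leaves for which it returns `null`. A diversifying variant could keep its parent filter inside its own manager implementation rather than in the query.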