jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1320000915
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java:
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.hnsw;
+
+import java.io.IOException;
+import org.apache.lucene.index.VectorSimilarityFunction;
+
+/** A provider that creates {@link RandomVectorScorer} from an ordinal. */
+public interface RandomVectorScorerProvider {
+  /**
+   * This creates a {@link RandomVectorScorer} for scoring random nodes in batches against the given
+   * ordinal.
+   *
+   * @param ord the ordinal of the node to compare
+   * @return a new {@link RandomVectorScorer}
+   */
+  RandomVectorScorer scorer(int ord) throws IOException;
+
+  /**
+   * Creates a {@link RandomVectorScorerProvider} to compare float vectors.
+   *
+   * <p>WARNING: The {@link RandomAccessVectorValues} given can contain stateful buffers. Avoid
+   * using it after calling this function. If you plan to use it again outside the returned {@link
+   * RandomVectorScorer}, think about passing a copied version ({@link
+   * RandomAccessVectorValues#copy}).
+   *
+   * @param vectors the underlying storage for vectors
+   * @param similarityFunction the similarity function to score vectors
+   */
+  static RandomVectorScorerProvider createFloats(
+      final RandomAccessVectorValues<float[]> vectors,
+      final VectorSimilarityFunction similarityFunction)
+      throws IOException {
+    final RandomAccessVectorValues<float[]> vectorsCopy = vectors.copy();
+    return queryOrd ->
+        (RandomVectorScorer)
+            cand ->
+                similarityFunction.compare(
+                    vectors.vectorValue(queryOrd), vectorsCopy.vectorValue(cand));

Review Comment:
I'm trying to limit the number of copies we make. In this model, we make only one copy for the entire provider. We rely on the fact that when we call vectors.vectorValue twice in a row with the same ordinal, it reuses the previously loaded value instead of reading the vector again. Your suggestion would result in a copy every time the provider creates a scorer. Since we create a scorer every time we diversify a node, I believe this would have a noticeable impact on graph-building performance.
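To make the trade-off in the comment concrete, here is a minimal, self-contained sketch. It does not use Lucene: `VectorValues`, `Scorer`, `ScorerProvider`, `BufferedVectorValues`, `Stats` and `copyPerScorer` are hypothetical, simplified stand-ins for `RandomAccessVectorValues<float[]>`, `RandomVectorScorer` and `RandomVectorScorerProvider`, and a plain dot product stands in for `VectorSimilarityFunction#compare`. It assumes, as the comment describes, that `vectorValue` decodes into one reused buffer and skips the read when the same ordinal is requested twice in a row. `copyPerScorer` only models the cost the comment attributes to the suggestion (one copy per scorer); it is not the suggested code itself.

```java
import java.util.Arrays;

/** Hypothetical sketch, not Lucene code: one copy per provider vs. one copy per scorer. */
public class ScorerProviderSketch {
  /** Simplified stand-in for RandomAccessVectorValues<float[]>. */
  interface VectorValues {
    float[] vectorValue(int ord);
    VectorValues copy();
  }

  /** Simplified stand-in for RandomVectorScorer: scores candidates against one query ordinal. */
  interface Scorer {
    float score(int candidateOrd);
  }

  /** Simplified stand-in for RandomVectorScorerProvider. */
  interface ScorerProvider {
    Scorer scorer(int queryOrd);
  }

  /** Counters shared by an instance and all of its copies. */
  static final class Stats {
    int copies; // calls to copy()
    int loads; // times a vector was decoded into a buffer
  }

  /** Decodes into one reusable buffer; requesting the last ordinal again skips the load. */
  static final class BufferedVectorValues implements VectorValues {
    private final float[][] storage;
    private final float[] buffer;
    private final Stats stats;
    private int lastOrd = -1;

    BufferedVectorValues(float[][] storage, Stats stats) {
      this.storage = storage;
      this.buffer = new float[storage[0].length];
      this.stats = stats;
    }

    @Override
    public float[] vectorValue(int ord) {
      if (ord != lastOrd) { // cache miss: overwrite the shared buffer
        System.arraycopy(storage[ord], 0, buffer, 0, buffer.length);
        lastOrd = ord;
        stats.loads++;
      }
      return buffer; // the same array on every call
    }
    @Override
    public VectorValues copy() {
      stats.copies++;
      return new BufferedVectorValues(storage, stats); // same storage, independent buffer
    }
  }

  /** Plain dot product, standing in for VectorSimilarityFunction#compare. */
  static float dot(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
    return sum;
  }

  /** Same shape as createFloats in the diff: one copy for the whole provider. */
  static ScorerProvider copyOncePerProvider(VectorValues vectors) {
    VectorValues vectorsCopy = vectors.copy();
    // queryOrd is fixed per scorer, so after the first call vectors.vectorValue(queryOrd) is a cache hit.
    return queryOrd -> cand -> dot(vectors.vectorValue(queryOrd), vectorsCopy.vectorValue(cand));
  }

  /** Hypothetical variant that pays for a copy every time a scorer is created. */
  static ScorerProvider copyPerScorer(VectorValues vectors) {
    return queryOrd -> {
      VectorValues vectorsCopy = vectors.copy();
      return cand -> dot(vectors.vectorValue(queryOrd), vectorsCopy.vectorValue(cand));
    };
  }

  public static void main(String[] args) {
    float[][] data = {{1, 0}, {0, 1}, {1, 1}, {2, 0}};
    // The javadoc WARNING: a second read overwrites the array returned by the first.
    VectorValues shared = new BufferedVectorValues(data, new Stats());
    float[] first = shared.vectorValue(0);
    shared.vectorValue(1);
    System.out.println("first now holds " + Arrays.toString(first));
    for (boolean perProvider : new boolean[] {true, false}) {
      Stats stats = new Stats();
      VectorValues vectors = new BufferedVectorValues(data, stats);
      ScorerProvider provider = perProvider ? copyOncePerProvider(vectors) : copyPerScorer(vectors);
      // Simulate diversifying three nodes: each step creates a scorer and scores every node.
      for (int node = 0; node < 3; node++) {
        Scorer scorer = provider.scorer(node);
        for (int cand = 0; cand < data.length; cand++) scorer.score(cand);
      }
      String label = perProvider ? "per provider" : "per scorer";
      System.out.println(label + ": copies=" + stats.copies + ", loads=" + stats.loads);
    }
  }
}
```

Running `main` prints `first now holds [0.0, 1.0]`, then `per provider: copies=1, loads=15` and `per scorer: copies=3, loads=15`. Both variants decode the same number of vectors, because each scorer loads its query vector once and then hits the cache; the difference is the number of copies, which in the per-scorer variant grows with every diversified node. The flip side is the javadoc WARNING: the provider keeps reading from the caller's original instance, so the caller has to hand it over and not use it again.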