msokolov commented on a change in pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#discussion_r506955463
##########
File path: lucene/core/src/java/org/apache/lucene/index/VectorValues.java
##########
@@ -0,0 +1,273 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.index;
+
+import java.io.IOException;
+
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * This class provides access to per-document floating point vector values indexed as {@link
+ * org.apache.lucene.document.VectorField}.
+ */
+public abstract class VectorValues extends DocIdSetIterator {
+
+  /** The maximum length of a vector */
+  public static int MAX_DIMENSIONS = 1024;
+
+  /** Sole constructor */
+  protected VectorValues() {}
+
+  /**
+   * Return the dimension of the vectors
+   */
+  public abstract int dimension();
+
+  /**
+   * TODO: should we use cost() for this? We rely on its always being exactly the number
+   * of documents having a value for this field, which is not guaranteed by the cost() contract,
+   * but in all the implementations so far they are the same.
+   * @return the number of vectors returned by this iterator
+   */
+  public abstract int size();
+
+  /**
+   * Return the score function used to compare these vectors
+   */
+  public abstract ScoreFunction scoreFunction();
+
+  /**
+   * Return the vector value for the current document ID.
+   * It is illegal to call this method when the iterator is not positioned: before advancing, or after failing to advance.
+   * The returned array may be shared across calls, re-used, and modified as the iterator advances.
+   * @return the vector value
+   */
+  public abstract float[] vectorValue() throws IOException;

Review comment:
Thinking about this some more: why do we have the random access interface? It lets a vector consumer build efficient data structures keyed by vector ordinal, so it can skip the extra step of mapping from docid to vector ordinal and back. Such a consumer is expected to maintain its own mapping; the point is that we don't want to force it to map constantly (e.g. by defining this API in terms of docids). Nor do we want to provide a less-than-ideal mapping function, or spend extra effort maintaining a mapping that may never be used.

We could try hiding this behind a more opaque class structure, but since this interface needs to be accessed from both o.a.l.index and o.a.l.codecs, I think it would have to be public, and I'm not sure what the benefit would be. We can always *add* `ordinal(int docId)` and `docId(int ordinal)` methods as they become useful. I think that would be a good API, and nothing here prevents adding it later, so I propose to commit this as is.
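Here is a minimal sketch of the consumer-side pattern described in the comment, written in plain Java with no Lucene dependency. `SimpleVectorIterator` and `OrdinalIndex` are hypothetical stand-ins and are not part of this PR. The iterator mimics the documented contract: docs are visited in increasing docID order, end-of-iteration is signalled by a sentinel like `DocIdSetIterator.NO_MORE_DOCS`, and `vectorValue()` may return a reused buffer. The consumer assigns its own ordinals, copies each vector, and derives `ordinal(int docId)` / `docId(int ordinal)` from a sorted `int[]`.

```java
import java.util.Arrays;

// Hypothetical stand-in for a VectorValues-style iterator: docs are visited in
// increasing docID order, and vectorValue() returns one buffer that is
// overwritten on every advance (the Javadoc above allows this).
class SimpleVectorIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same sentinel value Lucene uses
  private final int[] docs;
  private final float[][] values;
  private final float[] buffer;
  private int pos = -1;

  SimpleVectorIterator(int[] docs, float[][] values) {
    this.docs = docs;
    this.values = values;
    this.buffer = new float[values.length == 0 ? 0 : values[0].length];
  }

  int size() { return docs.length; }

  int nextDoc() {
    pos++;
    if (pos >= docs.length) return NO_MORE_DOCS;
    System.arraycopy(values[pos], 0, buffer, 0, buffer.length); // reuse the shared buffer
    return docs[pos];
  }

  float[] vectorValue() { return buffer; }
}

// Hypothetical consumer that keys its data by vector ordinal (0..size-1)
// and keeps its own ordinal <-> docID mapping.
class OrdinalIndex {
  final float[][] vectors; // vectors[ord], copied because the iterator reuses its buffer
  final int[] ordToDoc;    // sorted ascending, since docs arrive in docID order

  private OrdinalIndex(float[][] vectors, int[] ordToDoc) {
    this.vectors = vectors;
    this.ordToDoc = ordToDoc;
  }

  static OrdinalIndex build(SimpleVectorIterator it) {
    int n = it.size(); // assumes size() is exact, as the TODO above notes
    float[][] vectors = new float[n][];
    int[] ordToDoc = new int[n];
    int ord = 0;
    for (int doc = it.nextDoc(); doc != SimpleVectorIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
      vectors[ord] = it.vectorValue().clone(); // copy: the returned array may be overwritten
      ordToDoc[ord] = doc;
      ord++;
    }
    return new OrdinalIndex(vectors, ordToDoc);
  }

  int docId(int ordinal) { return ordToDoc[ordinal]; }

  // Binary search is valid because ordinals were assigned in increasing docID order.
  // Returns -1 if the document has no vector.
  int ordinal(int docId) {
    int i = Arrays.binarySearch(ordToDoc, docId);
    return i >= 0 ? i : -1;
  }
}

public class OrdinalIndexDemo {
  public static void main(String[] args) {
    SimpleVectorIterator it = new SimpleVectorIterator(
        new int[] {2, 5, 9}, new float[][] {{1f, 0f}, {0f, 1f}, {1f, 1f}});
    OrdinalIndex idx = OrdinalIndex.build(it);
    System.out.println(idx.ordinal(5));                  // prints 1
    System.out.println(idx.docId(2));                    // prints 9
    System.out.println(Arrays.toString(idx.vectors[0])); // prints [1.0, 0.0]
  }
}
```

In this sketch the mapping lives entirely on the consumer side. A consumer that never calls `ordinal(int docId)` pays only for the `int[]`, which fits the argument above against building a mapping into the API that may go unused.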