uschindler commented on code in PR #12436: URL: https://github.com/apache/lucene/pull/12436#discussion_r1276162592
########## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsFormat.java: ########## @@ -76,6 +79,19 @@ public static KnnVectorsFormat forName(String name) { /** Returns a {@link KnnVectorsReader} to read the vectors from the index. */ public abstract KnnVectorsReader fieldsReader(SegmentReadState state) throws IOException; + /** + * Returns the maximum number of vector dimensions supported by this codec for the given field + * name + * + * <p>Codecs should override this method to specify the maximum number of dimensions they support. + * + * @param fieldName the field name + * @return the maximum number of vector dimensions. + */ + public int getMaxDimensions(String fieldName) { Review Comment: In main branch this should be abstract, only in 9.x we should have a default impl. ########## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java: ########## @@ -478,18 +473,42 @@ public void testIllegalDimensionTooLarge() throws Exception { try (Directory dir = newDirectory(); IndexWriter w = new IndexWriter(dir, newIndexWriterConfig())) { Document doc = new Document(); - expectThrows( - IllegalArgumentException.class, - () -> - doc.add( - new KnnFloatVectorField( - "f", - new float[FloatVectorValues.MAX_DIMENSIONS + 1], - VectorSimilarityFunction.DOT_PRODUCT))); + doc.add( + new KnnFloatVectorField( + "f", + new float[getVectorsMaxDimensions("f") + 1], + VectorSimilarityFunction.DOT_PRODUCT)); + Exception exc = expectThrows(IllegalArgumentException.class, () -> w.addDocument(doc)); Review Comment: So basically the check is now delayed to `addDocument()`? Great! I was afraid that the check comes delayed while indexing of flushing is happening. So to me this looks good. ########## lucene/core/src/java/org/apache/lucene/index/IndexingChain.java: ########## @@ -621,6 +621,12 @@ private void initializeFieldInfo(PerField pf) throws IOException { final Sort indexSort = indexWriterConfig.getIndexSort(); validateIndexSortDVType(indexSort, pf.fieldName, s.docValuesType); } + if (s.vectorDimension != 0) { + validateMaxVectorDimension( + pf.fieldName, + s.vectorDimension, + indexWriterConfig.getCodec().knnVectorsFormat().getMaxDimensions()); + } Review Comment: Yes this looks fine. The hashtable lookup is no issue at all. As getMaxDimensions() is a public codec method, we can let us do the check here. A separate validateFieldDims() is not needed. I only don't like the long call chain to actually get the vectors format from the indexWriterConfig. But I can live with that. ########## lucene/core/src/java/org/apache/lucene/codecs/perfield/PerFieldKnnVectorsFormat.java: ########## @@ -80,6 +80,11 @@ public KnnVectorsReader fieldsReader(SegmentReadState state) throws IOException return new FieldsReader(state); } + @Override Review Comment: is this the only subclass of KnnVectorsFormat that we have? In addition, we should also add explicit number of dimensions in backwards codecs, because at the time when it was implemented 1024 was their default. In backwards codec the method should be final, IMHO. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org