Tim-Brooks commented on code in PR #15990:
URL: https://github.com/apache/lucene/pull/15990#discussion_r3185899482


##########
lucene/core/src/java/org/apache/lucene/index/IndexingChain.java:
##########
@@ -681,6 +693,435 @@ private void oversizeDocFields() {
     docFields = newDocFields;
   }
 
+  /**
+   * Process a column-oriented batch of documents. Iterates the batch's columns, validates each
+   * column's field type, and feeds values to the appropriate DocValuesWriter.
+   *
+   * @param baseDocID the segment-level doc ID for the first document in the batch (batch-local doc
+   *     0 maps to this value)
+   * @param columnBatch the column-oriented batch
+   */
+  void processBatch(int baseDocID, ColumnBatch columnBatch) throws IOException {
+    final int numDocs = columnBatch.numDocs();
+    boolean hasRowColumns = false;
+
+    // First pass: validate all column schemas and initialize field infos
+    for (Column column : columnBatch.columns()) {
+      final String fieldName = column.name();
+      final IndexableFieldType fieldType = column.fieldType();
+
+      ColumnValidation.validateColumnHasIndexingFeature(fieldName, fieldType);
+
+      if (column instanceof BinaryColumn bc) {
+        ColumnValidation.validateBinaryColumn(bc, fieldType);
+      } else if (column instanceof LongColumn lc) {
+        ColumnValidation.validateLongColumn(lc, fieldType);
+      } else if (column instanceof VectorColumn<?> vc) {
+        ColumnValidation.validateVectorColumn(vc, fieldType);
+      }

Review Comment:
   I agree that this is not something I will address here, as I think 
embedding schemas into Lucene would probably be a discussion of its own.
   
   My use case does demonstrate significant overhead for field validation using 
the document API. I was indexing something like 150 fields per document, and 
the time spent in processDocument was greater than the actual indexing. 
   
   Moving to batches of 1000 documents with 150 columns made all the validation 
disappear from any sampled threads. That doesn't mean it is not there, though.
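
   As a rough illustration of why the cost drops, here is a minimal sketch that 
only counts schema checks. It assumes validation cost is proportional to the 
number of checks performed, and every name in it (ValidationCostSketch, 
rowOrientedChecks, columnOrientedChecks) is hypothetical and not part of Lucene 
or this PR.

```java
// Hypothetical sketch: none of these names are Lucene APIs.
final class ValidationCostSketch {

  /** Row-oriented indexing: every field of every document is checked. */
  static long rowOrientedChecks(int numDocs, int numFields) {
    long checks = 0;
    for (int doc = 0; doc < numDocs; doc++) {
      for (int field = 0; field < numFields; field++) {
        checks++; // stands in for one per-field schema validation
      }
    }
    return checks;
  }

  /** Column-oriented indexing: each column is checked once per batch. */
  static long columnOrientedChecks(int numDocs, int numFields, int batchSize) {
    long batches = ((long) numDocs + batchSize - 1) / batchSize; // ceiling division
    return batches * numFields;
  }

  public static void main(String[] args) {
    int numDocs = 1000;
    int numFields = 150;
    System.out.println("row-oriented checks:    " + rowOrientedChecks(numDocs, numFields));
    System.out.println("column-oriented checks: " + columnOrientedChecks(numDocs, numFields, 1000));
  }
}
```

   With the numbers from my use case, the count goes from 150,000 checks to 150 
for the same 1000 documents, so the work is still there but amortized over the 
batch.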




