huaxingao commented on code in PR #9841: URL: https://github.com/apache/iceberg/pull/9841#discussion_r1899171737
########## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnVectorBuilder.java: ##########

@@ -46,8 +46,8 @@ public ColumnVector build(VectorHolder holder, int numRows) {
      } else {
        throw new IllegalStateException("Unknown dummy vector holder: " + holder);
      }
-    } else if (rowIdMapping != null) {
-      return new ColumnVectorWithFilter(holder, rowIdMapping);
+    } else if (withDelete) {

Review Comment:
   The new approach is to load all data vectors first, and then apply the delete logic to the ColumnarBatch by mutating all of its ColumnVectors in place via `setRowIdMapping`. Initially every rowIdMapping is null; the mapping is computed and set on the ColumnVectors only after they are built. Because the mapping is set later, it is still null at this point, so `rowIdMapping != null` can no longer decide whether to construct a `ColumnVectorWithFilter` or a regular `IcebergArrowColumnVector`. Instead, `withDelete` (computed as `deletes != null` in `ColumnarBatchReader.readDataToColumnVectors`) is used.
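   To make the ordering concrete, here is a minimal sketch of the two-phase pattern the comment describes. The names `ColumnVectorWithFilter`, `IcebergArrowColumnVector`, and `setRowIdMapping` come from the discussion above; the stub types and the `applyDeletes` helper are illustrative stand-ins, not the actual Iceberg implementation.

   ```java
   import java.util.List;

   class TwoPhaseSketch {

     interface ColumnVector {}

     // Plain vector: rows are read through directly, no delete filtering.
     static class IcebergArrowColumnVector implements ColumnVector {
       final Object holder; // stands in for VectorHolder

       IcebergArrowColumnVector(Object holder) {
         this.holder = holder;
       }
     }

     // Filtered vector: constructed without a mapping, mutated in place later.
     static class ColumnVectorWithFilter implements ColumnVector {
       final Object holder;
       int[] rowIdMapping; // null until the delete logic runs

       ColumnVectorWithFilter(Object holder) {
         this.holder = holder;
       }

       void setRowIdMapping(int[] rowIdMapping) {
         this.rowIdMapping = rowIdMapping;
       }
     }

     // Phase 1: build every vector. The mapping does not exist yet, so the
     // choice is driven by withDelete (deletes != null), not by the mapping.
     static ColumnVector build(Object holder, boolean withDelete) {
       return withDelete
           ? new ColumnVectorWithFilter(holder)
           : new IcebergArrowColumnVector(holder);
     }

     // Phase 2: after all data vectors are loaded, compute the mapping from
     // the delete positions and push it into every filtered vector.
     static void applyDeletes(List<ColumnVector> batch, int[] rowIdMapping) {
       for (ColumnVector v : batch) {
         if (v instanceof ColumnVectorWithFilter) {
           ((ColumnVectorWithFilter) v).setRowIdMapping(rowIdMapping);
         }
       }
     }
   }
   ```

   The design point is that phase 1 is mapping-agnostic: every vector in the batch is created before any delete position is inspected, which is exactly why the constructor-time check had to move from `rowIdMapping != null` to `withDelete`.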