singhpk234 commented on code in PR #11551:
URL: https://github.com/apache/iceberg/pull/11551#discussion_r1847025083


##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchReader.java:
##########
@@ -45,11 +45,23 @@ public class ColumnarBatchReader extends BaseBatchReader<ColumnarBatch> {
   private final boolean hasIsDeletedColumn;
   private DeleteFilter<InternalRow> deletes = null;
   private long rowStartPosInBatch = 0;
+  // In the case of Equality Delete, we have also built ColumnarBatchReader for the equality delete
+  // filter columns to read the value to find out which rows are deleted. If these deleted filter
+  // columns are not in the requested schema, then these are the extra columns that we want to
+  // remove before return the ColumnBatch to Spark.
+  // Supposed table schema is C1, C2, C3, C4, C5, The query is:
+  // SELECT C5 FROM table, and the equality delete Filter is on C3, C4,
+  // We read the values of C3, C4 to figure out which rows are deleted, but we don't want to include
+  // these values in the ColumnBatch that we return to Spark. In this example, the numOfExtraColumns
+  // is 2. Since when creating the DeleteFilter, we append these extra columns in the end of the
+  // requested schema, we can just remove them from the end of the ColumnVector.

Review Comment:
   Thank you for the response!
   
   Applying equality deletes already does another projection on top of the schema returned from DeleteFilter.[fileProjection](https://github.com/apache/iceberg/blob/06dc721498d6ad95c86f0f884b8ad30f807ef321/data/src/main/java/org/apache/iceberg/data/DeleteFilter.java#L306):
   
   https://github.com/apache/iceberg/blob/bf8d25fe1578ef199d64fb609c0299728ec58910/data/src/main/java/org/apache/iceberg/data/DeleteFilter.java#L196
   
   Could we add another parameter to fileProjection, as was done here, so that the additional fields are included only when a boolean flag is set?
   
   https://github.com/apache/iceberg/blob/06dc721498d6ad95c86f0f884b8ad30f807ef321/data/src/main/java/org/apache/iceberg/data/DeleteFilter.java#L272
   
   That way we would get only the columns we actually need in the first place, and avoid removing the extra columns after the filter is evaluated.
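   To make the two options concrete, here is a minimal sketch that uses plain column-name lists instead of Iceberg `Schema` objects. `fileProjection` and `trimExtraColumns` below are hypothetical stand-ins written for illustration; they are not the actual `DeleteFilter` or `ColumnarBatchReader` signatures, and the `includeDeleteColumns` flag is an assumed name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ProjectionSketch {

  // Hypothetical stand-in for a flag-controlled projection: the equality
  // delete columns are appended to the requested columns only when the
  // flag is set. Columns already requested are not duplicated.
  public static List<String> fileProjection(
      List<String> requested, List<String> eqDeleteColumns, boolean includeDeleteColumns) {
    Set<String> projected = new LinkedHashSet<>(requested);
    if (includeDeleteColumns) {
      projected.addAll(eqDeleteColumns);
    }
    return new ArrayList<>(projected);
  }

  // Stand-in for the approach described in the diff comment: the extra
  // columns sit at the end of what was read, so they are dropped from the tail.
  public static List<String> trimExtraColumns(List<String> readColumns, int numOfExtraColumns) {
    return new ArrayList<>(readColumns.subList(0, readColumns.size() - numOfExtraColumns));
  }

  public static void main(String[] args) {
    List<String> requested = List.of("C5");           // SELECT C5 FROM table
    List<String> eqDeleteColumns = List.of("C3", "C4");

    // Flag set: the projection carries the delete columns.
    System.out.println(fileProjection(requested, eqDeleteColumns, true));
    // Flag unset: only the requested columns are projected.
    System.out.println(fileProjection(requested, eqDeleteColumns, false));
    // Trimming after the fact, as in the diff, with numOfExtraColumns = 2.
    System.out.println(
        trimExtraColumns(fileProjection(requested, eqDeleteColumns, true), 2));
  }
}
```

   With the example from the diff comment (query on C5, equality delete on C3 and C4), both paths end with only C5. The difference is whether the extra columns are excluded by the projection itself or removed after the delete filter has been evaluated.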
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

