[PR] Spark: Remove extra columns for ColumnBatch [iceberg]

via GitHub Thu, 14 Nov 2024 09:07:46 -0800


huaxingao opened a new pull request, #11551:
URL: https://github.com/apache/iceberg/pull/11551


   To apply equality deletes, we build a `ColumnarBatchReader` for the equality 
delete filter columns, read their values, and determine which rows are deleted. 
If these filter columns are not among the requested columns, they are extra and 
should be removed before the `ColumnarBatch` is returned to Spark.
   
   Suppose the table schema includes C1, C2, C3, C4, C5. If the query is 
`SELECT C5 FROM table` and the equality delete filter is on C3 and C4, we read 
the values of C3 and C4 to identify which rows are deleted. However, we do not 
want to include these values in the `ColumnarBatch` that we return to Spark.
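
   The C3/C4/C5 example can be sketched in plain Java as follows. This is only 
an illustration of the idea, not the code in this PR: it models a batch as a 
map of column name to values instead of using Spark's `ColumnarBatch`, and the 
class `ExtraColumnSketch`, the method `readBatch`, and the sample values are 
all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ExtraColumnSketch {

  // Hypothetical helper, not an Iceberg or Spark API.
  // columnsRead holds the requested columns plus the equality delete columns.
  static Map<String, List<Integer>> readBatch(
      Map<String, int[]> columnsRead,
      List<String> requested,
      List<String> deleteColumns,
      Set<List<Integer>> deletedKeys) {
    int numRows = columnsRead.values().iterator().next().length;

    // Step 1: use the delete columns (C3, C4) to decide which rows survive.
    boolean[] live = new boolean[numRows];
    for (int row = 0; row < numRows; row++) {
      List<Integer> key = new ArrayList<>();
      for (String col : deleteColumns) {
        key.add(columnsRead.get(col)[row]);
      }
      live[row] = !deletedKeys.contains(key);
    }

    // Step 2: copy only the requested columns (C5) into the result,
    // so the extra delete columns never reach the caller.
    Map<String, List<Integer>> result = new LinkedHashMap<>();
    for (String col : requested) {
      int[] source = columnsRead.get(col);
      List<Integer> values = new ArrayList<>();
      for (int row = 0; row < numRows; row++) {
        if (live[row]) {
          values.add(source[row]);
        }
      }
      result.put(col, values);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, int[]> columnsRead = new LinkedHashMap<>();
    columnsRead.put("C5", new int[] {50, 51, 52});
    columnsRead.put("C3", new int[] {1, 2, 3});
    columnsRead.put("C4", new int[] {10, 20, 30});

    // One equality delete: remove rows where C3 = 2 and C4 = 20.
    Set<List<Integer>> deletedKeys = Set.of(List.of(2, 20));

    Map<String, List<Integer>> batch =
        readBatch(columnsRead, List.of("C5"), List.of("C3", "C4"), deletedKeys);
    System.out.println(batch); // prints {C5=[50, 52]}
  }
}
```

   The result contains only C5, with the deleted row dropped. C3 and C4 are 
read solely to evaluate the delete filter.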



