leeoren opened a new pull request, #14138: URL: https://github.com/apache/iceberg/pull/14138
Summary of Proposed Changes

1. Improve readability of DeltaWrite in Spark event logs

Currently, when event logs contain the DeltaWrite action, the plan description is printed as:

```
(1) WriteDelta
Input [1]: [_col#1]
Arguments: org.apache.iceberg.spark.source.SparkPositionDeltaWrite@5234f6c5
```

By comparison, ReplaceData (implemented in [SparkWrite](https://github.com/apache/iceberg/blob/8353ac8f80799495cfdc32dd37222ed1b8d8070f/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java#L263-L266)) is rendered more informatively:

```
(1) ReplaceData
Input [1]: [col#1]
Arguments: IcebergWrite(table=iceberg_table, format=PARQUET)
```

This change introduces a `toString` implementation for `SparkPositionDeltaWrite` to produce more descriptive and user-friendly plan output.

2. Expose IcebergScan details in physical plans

In Spark event logs, IcebergScan does not currently appear in the physical plan description. Instead, only the generic BatchScan is shown:

```
== Physical Plan ==
AppendData ...
+- *(1) ColumnarToRow
   +- BatchScan glue_catalog.namespace.table_name[#col1] glue_catalog.namespace.table_name (branch=null) [filters=, groupedBy=] RuntimeFilters: []
```

This change adds a `description` method to `SparkBatchQueryScan`, allowing query plans to include Iceberg-specific scan information and making event logs more informative.
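For illustration, here is a minimal, self-contained Java sketch of the idea behind both changes. It is not the code in this PR: `PlanDescriptionSketch`, `OpaqueWrite`, `DescriptiveWrite`, and `DescriptiveScan` are hypothetical stand-ins, and the scan description format is an assumption made up for the example. Only the `IcebergWrite(table=..., format=...)` string is taken from the ReplaceData output quoted above. The sketch shows why a class without a `toString` override prints as `ClassName@hash`, and how an override (or a `description()` method) yields readable plan text.

```java
// Hypothetical sketch: none of these classes exist in Iceberg or Spark.
class PlanDescriptionSketch {

  // Stand-in for a write class that does not override toString().
  // Printing it falls back to Object.toString(): "<class name>@<hex hash>".
  static final class OpaqueWrite {}

  // Stand-in for a write class with a descriptive toString() override.
  static final class DescriptiveWrite {
    private final String tableName;
    private final String fileFormat;

    DescriptiveWrite(String tableName, String fileFormat) {
      this.tableName = tableName;
      this.fileFormat = fileFormat;
    }

    @Override
    public String toString() {
      // Mirrors the format shown for ReplaceData in the plan output above.
      return String.format("IcebergWrite(table=%s, format=%s)", tableName, fileFormat);
    }
  }

  // Stand-in for a scan that exposes a description() method.
  // The output format here is an assumption chosen for illustration only.
  static final class DescriptiveScan {
    private final String tableName;
    private final String branch;

    DescriptiveScan(String tableName, String branch) {
      this.tableName = tableName;
      this.branch = branch;
    }

    String description() {
      return String.format("IcebergScan(table=%s, branch=%s)", tableName, branch);
    }
  }

  public static void main(String[] args) {
    // Without an override: prints the class name followed by '@' and a hash.
    System.out.println("Arguments: " + new OpaqueWrite());
    // With an override: prints the table name and file format.
    System.out.println("Arguments: " + new DescriptiveWrite("iceberg_table", "PARQUET"));
    // Scan description from the hypothetical description() method.
    System.out.println(new DescriptiveScan("glue_catalog.namespace.table_name", null).description());
  }
}
```

The same pattern applies to any object that ends up in a plan node's `Arguments:` line: whatever string the object produces for itself is what appears in the event log.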
