akshayakp97 commented on issue #9268: URL: https://github.com/apache/iceberg/issues/9268#issuecomment-1850779601
Thanks for your response. I am looking at the physical plan of TPC-DS q16 for Iceberg on EMR.

Query q16: https://github.com/apache/spark/blob/a78d6ce376edf2a8836e01f47b9dff5371058d4c/sql/core/src/test/resources/tpcds/q16.sql
Physical plan: https://gist.github.com/akshayakp97/102715c66eee44bc6f72493f427528f8

Line 46 of the plan projects only two columns, `Project [cs_warehouse_sk#54840, cs_order_number#54843L]`, but the Iceberg scan of the `catalog_sales` table on line 47 appears to read all columns.

Digging further, I found that the `ColumnPruning` [rule](https://github.com/apache/spark/blob/7db85642600b1e3b39ca11e41d4e3e0bf1c8962b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L859) adds the new `Project [cs_warehouse_sk#54840, cs_order_number#54843L]` operator, but the corresponding BatchScanExec still reads all columns.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org