akshayakp97 commented on issue #9268:
URL: https://github.com/apache/iceberg/issues/9268#issuecomment-1850779601

   Thanks for your response. 
   
   I am looking at TPCDS q16 physical plan for Iceberg on EMR. 
   
   Link to q16 - 
https://github.com/apache/spark/blob/a78d6ce376edf2a8836e01f47b9dff5371058d4c/sql/core/src/test/resources/tpcds/q16.sql
   
   The physical plan looks like - 
https://gist.github.com/akshayakp97/102715c66eee44bc6f72493f427528f8
   
   Line 46 of the plan is `Project [cs_warehouse_sk#54840, 
cs_order_number#54843L]`, which keeps only two columns. However, the Iceberg 
scan of the `catalog_sales` table on Line 47 still appears to read all columns. 
   
   Upon further digging, I found that the `ColumnPruning` 
[rule](https://github.com/apache/spark/blob/7db85642600b1e3b39ca11e41d4e3e0bf1c8962b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L859)
 adds the new `Project [cs_warehouse_sk#54840, cs_order_number#54843L]` 
operator, but the corresponding BatchScanExec still reads all columns. 
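
   To make the comparison above easier to repeat, here is a minimal sketch that lists the columns a scan line outputs but the `Project` above it does not keep. It only parses plan text and does not call Spark or Iceberg. The helper names (`columns_in`, `unused_scan_columns`) are hypothetical, and the `scan` string is a made-up, shortened stand-in rather than the real line from the gist. A `Filter` between the two operators may legitimately need extra columns, so treat the result as a hint only.

```python
import re

# Matches Spark attribute references such as cs_order_number#54843L
# and captures the column name before the '#'.
ATTR = re.compile(r"([A-Za-z_]\w*)#\d+L?")


def columns_in(plan_line):
    """Return the column names referenced in one line of a physical plan."""
    return ATTR.findall(plan_line)


def unused_scan_columns(project_line, scan_line):
    """Return columns the scan outputs that the Project above it does not keep."""
    kept = set(columns_in(project_line))
    return [c for c in columns_in(scan_line) if c not in kept]


# The Project line is quoted from this comment.
project = "Project [cs_warehouse_sk#54840, cs_order_number#54843L]"

# ASSUMPTION: illustrative scan line only; the attribute id for
# cs_ship_date_sk is invented and the column list is shortened.
scan = (
    "BatchScan catalog_sales[cs_ship_date_sk#54828, "
    "cs_warehouse_sk#54840, cs_order_number#54843L]"
)

print(unused_scan_columns(project, scan))
```

   Pasting the actual Line 46 and Line 47 from the gist into `project` and `scan` would show which `catalog_sales` columns are read but never projected.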


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

