akshayakp97 opened a new issue, #9268:
URL: https://github.com/apache/iceberg/issues/9268

   ### Query engine
   
   Query Engine: Spark 3.5.0
   Apache Iceberg: 1.4.2
   
   ### Question
   
   Hi, 
   
   My understanding is that the Spark optimizer can add a new `Project` operator even 
after the V2 relation has been created. For example, the `ColumnPruning` 
optimizer rule appears to run after `V2ScanRelationPushDown` 
[here](https://github.com/apache/spark/blob/bacdb3b5fec9783f46042764eeee80eb2a0f5702/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L207-L240).
   
   If that's the case, I would expect the columns referenced by the newly added 
`Project` operator to prune the scan schema, the same way 
[`V2ScanRelationPushDown#pruneColumns`](https://github.com/apache/spark/blob/bacdb3b5fec9783f46042764eeee80eb2a0f5702/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L50)
 does. However, I don't see any schema pruning happening after `V2ScanRelationPushDown` 
for DataSourceV2. For DataSourceV1, by contrast, schema pruning happens 
in 
[`FileSourceStrategy#apply`](https://github.com/apache/spark/blob/bacdb3b5fec9783f46042764eeee80eb2a0f5702/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala#L283)
 before the `FileSourceScanExec` physical node is created.
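For reference, the V1 behavior described above roughly amounts to: collect every attribute referenced by the `Project` and `Filter` nodes sitting on top of the relation, then read only those columns. The snippet is a hypothetical, simplified model of that idea, not Spark's actual code; `prune_read_schema` is an illustrative name, expressions are reduced to the set of column names they reference, and nested-field pruning and partition columns are ignored.

```python
# Hypothetical, simplified model of the top-level column pruning that
# FileSourceStrategy performs for DataSourceV1 before building FileSourceScanExec.
# Each expression is modeled only by the set of column names it references;
# nested-field (struct) pruning and partition columns are ignored.
from typing import Iterable, List, Set


def prune_read_schema(
    relation_output: List[str],
    project_refs: Iterable[Set[str]],
    filter_refs: Iterable[Set[str]],
) -> List[str]:
    """Return the relation columns referenced by any Project or Filter
    expression, preserving the relation's original column order."""
    required: Set[str] = set()
    for refs in list(project_refs) + list(filter_refs):
        required |= refs
    return [col for col in relation_output if col in required]


# A relation with four columns; a Project added late in optimization
# references only `id` and `name`, and a Filter references `ts`.
relation = ["id", "name", "ts", "payload"]
projects = [{"id"}, {"name"}]
filters = [{"ts"}]

print(prune_read_schema(relation, projects, filters))  # ['id', 'name', 'ts']
```

In this model `payload` is dropped from the read schema because no remaining `Project` or `Filter` references it. The question is whether `DataSourceV2Strategy` has an equivalent step for columns that become unreferenced only after `V2ScanRelationPushDown` has already built the scan.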
   
   I don't see similar logic in `DataSourceV2Strategy` that prunes the 
relation's schema using the latest `Attribute`s referenced by the `Project` and 
`Filter` nodes before `BatchScanExec` is 
[created](https://github.com/apache/spark/blob/bacdb3b5fec9783f46042764eeee80eb2a0f5702/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala#L137-L148).
   
   Is this a known gap in `DataSourceV2`?
   
   Thanks in advance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

