akshayakp97 opened a new issue, #9268: URL: https://github.com/apache/iceberg/issues/9268
### Query engine

Query Engine: Spark 3.5.0
Apache Iceberg: 1.4.2

### Question

Hi,

My understanding is that the Spark optimizer can add a new `Project` operator even after the V2 relation has been created. For example, the `ColumnPruning` optimizer rule appears to run after `V2ScanRelationPushDown` [here](https://github.com/apache/spark/blob/bacdb3b5fec9783f46042764eeee80eb2a0f5702/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L207-L240). If that's the case, I would expect the columns projected by the newly added `Project` operator to prune the scan schema, the same way [`V2ScanRelationPushDown#pruneColumns`](https://github.com/apache/spark/blob/bacdb3b5fec9783f46042764eeee80eb2a0f5702/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L50) does.

However, I don't see any schema pruning happening after `V2ScanRelationPushDown` for DataSource V2. For DataSource V1, schema pruning does happen in the [`FileSourceStrategy#apply`](https://github.com/apache/spark/blob/bacdb3b5fec9783f46042764eeee80eb2a0f5702/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala#L283) method, before the `FileSourceScanExec` physical node is created. I don't see similar logic in `DataSourceV2Strategy` that prunes the relation's schema using the latest `Attribute`s from `Project`s and `Filter`s before `BatchScanExec` is [created](https://github.com/apache/spark/blob/bacdb3b5fec9783f46042764eeee80eb2a0f5702/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala#L137-L148).

Is this a known gap in DataSource V2?

Thanks in advance!
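To make the expected behavior concrete, here is a toy Python model of the second pruning pass the question describes for DataSource V1. The scan schema is pruned once at push-down time. It is then pruned again using the column paths referenced by a `Project` and `Filter` that exist only later in the plan.

This is a sketch under stated assumptions, not Spark's or Iceberg's implementation:

- `Field`, `prune`, `referenced_paths` and `names` are hypothetical helpers invented for illustration.
- The table schema is made up.
- Nested fields are represented as tuples of child fields.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


# Hypothetical, simplified schema model: a column is a leaf unless it has children.
@dataclass(frozen=True)
class Field:
    name: str
    children: Tuple["Field", ...] = ()


Path = Tuple[str, ...]  # e.g. ("location", "city") for location.city


def referenced_paths(project_refs: Iterable[Path], filter_refs: Iterable[Path]) -> Set[Path]:
    """Union of the column paths referenced by a Project and a Filter."""
    return set(project_refs) | set(filter_refs)


def prune(schema: Tuple[Field, ...], paths: Iterable[Path]) -> Tuple[Field, ...]:
    """Keep only the fields (and nested sub-fields) that some path references."""
    paths = list(paths)
    kept = []
    for f in schema:
        matching = [p for p in paths if p and p[0] == f.name]
        if not matching:
            continue  # not referenced anywhere: drop the column
        if not f.children or any(len(p) == 1 for p in matching):
            kept.append(f)  # leaf, or the whole struct is referenced: keep as-is
        else:
            # only some nested fields are referenced: recurse into the struct
            kept.append(Field(f.name, prune(f.children, [p[1:] for p in matching])))
    return tuple(kept)


def names(schema: Tuple[Field, ...]) -> list:
    """Flatten a schema into dotted leaf names for display."""
    out = []
    for f in schema:
        if f.children:
            out.extend(f"{f.name}.{n}" for n in names(f.children))
        else:
            out.append(f.name)
    return out


table = (
    Field("id"),
    Field("data"),
    Field("location", (Field("lat"), Field("lon"), Field("city"))),
)

# First pass: pruning done once, at push-down time, against the plan as it looked then.
scan_schema = prune(table, [("id",), ("data",), ("location",)])

# Later in optimization: a Project references only id, a Filter only location.city.
late_refs = referenced_paths([("id",)], [("location", "city")])

# Second pass (the V1-style step the question asks about): re-prune with the late refs.
final_schema = prune(scan_schema, late_refs)
print(names(scan_schema))   # ['id', 'data', 'location.lat', 'location.lon', 'location.city']
print(names(final_schema))  # ['id', 'location.city']
```

In this model, a V2 scan that skipped the second pass would still read every column in `scan_schema`. That is the difference the question points out between `FileSourceStrategy` and `DataSourceV2Strategy`.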