huaxingao commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2451241576
@szehon-ho Thanks for the comment. We do also use the [requiredSchema](https://github.com/apache/iceberg/blob/fda2b3a5706fd580b0371e8a7c4b31d536eac0a3/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java#L90), which is the schema that includes the `_pos` column. In ReadConf#[generateOffsetToStartPos](https://github.com/apache/iceberg/blob/main/parquet/src/main/java/org/apache/iceberg/parquet/ReadConf.java#L185), we need to know whether position deletes exist. We could pass a flag to `SparkDeleteFilter` so that it does not add the `_pos` column, but then I think we would need another flag to pass the hasPosDelete information to the Parquet `ReaderBuilder`, and from there to `ReadConf`. ORC uses [expectedSchema()](https://github.com/apache/iceberg/blob/fda2b3a5706fd580b0371e8a7c4b31d536eac0a3/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java#L125), the schema without the `_pos` column, to build its vectorized readers.
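To make the trade-off concrete, here is a rough, self-contained sketch of the flag plumbing described above. It is not Iceberg code: `PosDeleteFlagSketch`, `requiredSchema`, `needsRowGroupStartPositions`, and the `addPosColumn` / `hasPosDeletesFlag` parameters are all hypothetical stand-ins, and schemas are modeled as plain lists of column names. It also assumes that the presence of `_pos` in the read schema is what currently tells the Parquet side that position deletes exist.

```java
import java.util.ArrayList;
import java.util.List;

public class PosDeleteFlagSketch {

  // Hypothetical stand-in for the projection done by the delete filter.
  // `addPosColumn` models the proposed flag; it is an assumption, not an existing option.
  static List<String> requiredSchema(
      List<String> expectedSchema, boolean hasPosDeletes, boolean addPosColumn) {
    List<String> columns = new ArrayList<>(expectedSchema);
    if (hasPosDeletes && addPosColumn) {
      columns.add("_pos");
    }
    return columns;
  }

  // Hypothetical stand-in for the decision made while building the Parquet read
  // configuration: row-group start positions are needed if position deletes exist.
  // The signal comes either from `_pos` in the schema or from an explicit flag.
  static boolean needsRowGroupStartPositions(List<String> readSchema, boolean hasPosDeletesFlag) {
    return readSchema.contains("_pos") || hasPosDeletesFlag;
  }

  public static void main(String[] args) {
    List<String> expected = List.of("id", "data");

    // `_pos` is part of the schema, so the schema itself carries the signal.
    List<String> withPos = requiredSchema(expected, true, true);
    System.out.println(withPos + " -> " + needsRowGroupStartPositions(withPos, false));

    // `_pos` is not added, so the signal has to travel as a second, explicit flag.
    List<String> withoutPos = requiredSchema(expected, true, false);
    System.out.println(withoutPos + " -> " + needsRowGroupStartPositions(withoutPos, true));

    // Dropping `_pos` without the second flag loses the information entirely.
    System.out.println(withoutPos + " -> " + needsRowGroupStartPositions(withoutPos, false));
  }
}
```

The last case is the concern: once the column is no longer added, nothing downstream can tell that position deletes exist unless a second flag is threaded through every layer between the filter and the read configuration.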