huaxingao commented on PR #11390:
URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2438965356

@pvary Thank you for your suggestion! You're correct that adding such a test would help prevent future changes from inadvertently affecting this behavior without notice. Currently, Spark doesn't check the schema when processing batch data, which is why an extra Arrow vector in `ColumnarBatch` doesn't cause an error. However, Comet allocates arrays in a pre-allocated list and relies on the requested schema to determine how many columns are in the batch, so it will fail if extra columns are returned. While we currently don't have a test that fails due to extra columns, the integration of Comet will change this: once Comet is integrated, the tests involving the Comet reader will fail if extra columns are present. I believe those Comet reader tests will serve as the tests you've suggested we add.
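
For illustration, here is a minimal sketch of the kind of schema-width check such a test could make. The class and method names are hypothetical (not existing Iceberg or Comet test code), and the use of AssertJ is an assumption:

```java
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.vectorized.ColumnarBatch;

import static org.assertj.core.api.Assertions.assertThat;

// Hypothetical helper: fails fast if a batch carries more (or fewer) vectors
// than the requested read schema. Comet sizes its pre-allocated column array
// from the requested schema, so an extra Arrow vector would otherwise only
// surface as a failure deep inside Comet.
public class BatchSchemaAssertions {

  private BatchSchemaAssertions() {}

  public static void assertBatchWidthMatchesSchema(ColumnarBatch batch, StructType requestedSchema) {
    assertThat(batch.numCols())
        .as("batch column count should match the requested read schema")
        .isEqualTo(requestedSchema.fields().length);
  }
}
```

A check along these lines could be called from any vectorized-read test that has both the returned `ColumnarBatch` and the requested schema in hand, which would catch an extra column even before Comet is wired in.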
@pvary Thank you for your suggestion! You're correct that adding such a test would help prevent future changes from inadvertently affecting this behavior without notice. Currently, Spark doesn't check the schema when processing batch data, which is why an extra Arrow vector in `ColumnarBatch` doesn't cause error. However, Comet allocates arrays in a pre-allocated list and relies on the requested schema to determine how many columns are in the batch. If extra columns are returned to Comet, it will fail. While we currently don't have a test that fails due to extra columns, the integration of Comet will change this. Once Comet is integrated, the tests involving the Comet reader will fail if extra columns are present. I believe these Comet reader tests will serve as the tests you've suggested we add. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org