huaxingao commented on PR #11390:
URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2438965356

@pvary Thank you for your suggestion! You're correct that adding such a test would help prevent future changes from inadvertently affecting this behavior without notice. Currently, Spark doesn't check the schema when processing batch data, which is why an extra Arrow vector in `ColumnarBatch` doesn't cause an error. However, Comet allocates arrays in a pre-allocated list and relies on the requested schema to determine how many columns are in the batch, so it will fail if extra columns are returned. While we currently don't have a test that fails due to extra columns, the integration of Comet will change this: once Comet is integrated, the tests involving the Comet reader will fail if extra columns are present. I believe those Comet reader tests will serve as the tests you've suggested we add.
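
For illustration, here is a minimal sketch of the kind of schema-width check such a test could make. The class and method names are hypothetical (not existing Iceberg or Comet test code), and the use of AssertJ is an assumption:

```java
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.vectorized.ColumnarBatch;

import static org.assertj.core.api.Assertions.assertThat;

// Hypothetical helper: fails fast if a batch carries more (or fewer) vectors
// than the requested read schema. Comet sizes its pre-allocated column array
// from the requested schema, so an extra Arrow vector would otherwise only
// surface as a failure deep inside Comet.
public class BatchSchemaAssertions {

  private BatchSchemaAssertions() {}

  public static void assertBatchWidthMatchesSchema(ColumnarBatch batch, StructType requestedSchema) {
    assertThat(batch.numCols())
        .as("batch column count should match the requested read schema")
        .isEqualTo(requestedSchema.fields().length);
  }
}
```

A check along these lines could be called from any vectorized-read test that has both the returned `ColumnarBatch` and the requested schema in hand, which would catch an extra column even before Comet is wired in.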
@pvary Thank you for your suggestion! You're correct that adding such a test would help prevent future changes from inadvertently affecting this behavior without notice. Currently, Spark doesn't check the schema when processing batch data, which is why an extra Arrow vector in `ColumnarBatch` doesn't cause error. However, Comet allocates arrays in a pre-allocated list and relies on the requested schema to determine how many columns are in the batch. If extra columns are returned to Comet, it will fail. While we currently don't have a test that fails due to extra columns, the integration of Comet will change this. Once Comet is integrated, the tests involving the Comet reader will fail if extra columns are present. I believe these Comet reader tests will serve as the tests you've suggested we add. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org