RussellSpitzer commented on PR #13938:
URL: https://github.com/apache/iceberg/pull/13938#issuecomment-3234519592

   Ah, I think the issue is that our library code assumes the Parquet reader 
already has a projection that selects only the columns that need to be read, 
and that this projection is set up prior to opening the file. We have to do 
this anyway because we have to map the names in the schema to the names in the 
file based on field IDs. 
   
   
https://github.com/apache/iceberg/blob/cf74b65230f7275654221bf87eb92c1c78248cdc/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedReaderBuilder.java#L171-L174
   
   It seems like we aren't doing a similar thing with the Arrow reader?
   
   Is that on the right track? I'm still trying to figure this out, but I 
think ideally we just don't try to read null vectors at all, and handle that 
at a higher level?
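
   To make the projection idea above concrete, here is a rough sketch of 
resolving columns by field ID before any data is read. This is not the actual 
Iceberg code: `FieldIdProjection`, its two methods, and the plain 
`Map<Integer, String>` stand-ins for schemas (field ID to column name) are all 
hypothetical and only illustrate the general approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Iceberg API.
// Both maps go from field ID to column name. Iteration order of the
// expected map (e.g. a LinkedHashMap) determines the output order.
final class FieldIdProjection {
  private FieldIdProjection() {}

  // File column names to read, matched to the expected schema by field ID.
  // A renamed column is still found because the ID, not the name, is the key.
  static List<String> columnsToRead(
      Map<Integer, String> expectedById, Map<Integer, String> fileById) {
    List<String> result = new ArrayList<>();
    for (Integer id : expectedById.keySet()) {
      String fileName = fileById.get(id);
      if (fileName != null) {
        result.add(fileName);
      }
    }
    return result;
  }

  // Expected columns whose field ID is absent from the file. These would be
  // produced as all-null values by the caller rather than read from the file.
  static List<String> columnsMissingFromFile(
      Map<Integer, String> expectedById, Map<Integer, String> fileById) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<Integer, String> field : expectedById.entrySet()) {
      if (!fileById.containsKey(field.getKey())) {
        result.add(field.getValue());
      }
    }
    return result;
  }
}
```

   With a split like this, the reader would only ever be asked for the 
columns returned by `columnsToRead`, and the missing ones could be handled at 
a higher level without touching the file.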


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

