amogh-jahagirdar commented on issue #14043:
URL: https://github.com/apache/iceberg/issues/14043#issuecomment-3315144693

I poked into this a bit more, and I take back what I said about the produced projection being incorrect. I do think it's expected that, after `PruneColumns` is invoked, it only returns the `id` column for this file. The [model created](https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetReaders.java#L79) for reading into Spark's internal row for this case, where it's a `map<some_key, some_value_struct>` and a field which does not exist in the file is projected
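
To make the expected pruning concrete, here is a heavily simplified sketch. The `PruneSketch`, `Primitive`, `Struct`, and `MapField` names, the field ids, and `new_field` are hypothetical; this is not Iceberg's `Types` or `PruneColumns` API. It assumes a simplified rule: a leaf survives only if it is selected and present in the file, a struct survives only if some child survives, and a map survives only if its value does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified schema model; these are NOT Iceberg's Types or PruneColumns classes.
final class PruneSketch {
  interface Field {
    int id();
    String name();
  }

  record Primitive(int id, String name) implements Field {}

  record Struct(int id, String name, List<Field> fields) implements Field {}

  record MapField(int id, String name, Primitive key, Struct value) implements Field {}

  // Returns the pruned field, or null if nothing selected under it exists in the file.
  static Field prune(Field field, Set<Integer> selected, Set<Integer> fileIds) {
    if (field instanceof Primitive p) {
      // A leaf survives only if it was selected AND is physically present in the file.
      return selected.contains(p.id()) && fileIds.contains(p.id()) ? p : null;
    } else if (field instanceof Struct s) {
      List<Field> kept = new ArrayList<>();
      for (Field child : s.fields()) {
        Field pruned = prune(child, selected, fileIds);
        if (pruned != null) {
          kept.add(pruned);
        }
      }
      // A struct with no surviving children is dropped entirely.
      return kept.isEmpty() ? null : new Struct(s.id(), s.name(), kept);
    } else if (field instanceof MapField m) {
      // Assumption: the map survives only if its value struct does; the key then comes along.
      Field value = prune(m.value(), selected, fileIds);
      return value == null ? null : new MapField(m.id(), m.name(), m.key(), (Struct) value);
    }
    throw new IllegalArgumentException("unsupported field: " + field);
  }

  // Names of the top-level columns that remain after pruning against one data file.
  static List<String> prunedTopLevelNames(List<Field> schema, Set<Integer> selected, Set<Integer> fileIds) {
    List<String> names = new ArrayList<>();
    for (Field field : schema) {
      Field pruned = prune(field, selected, fileIds);
      if (pruned != null) {
        names.add(pruned.name());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    // Table schema: id, m: map<key, struct<a, new_field>>. The data file predates new_field (id 6).
    Primitive id = new Primitive(1, "id");
    Struct value = new Struct(4, "value", List.of(new Primitive(5, "a"), new Primitive(6, "new_field")));
    MapField m = new MapField(2, "m", new Primitive(3, "key"), value);
    Set<Integer> fileIds = Set.of(1, 2, 3, 4, 5);
    Set<Integer> selected = Set.of(1, 6); // project id and m.value.new_field
    System.out.println(prunedTopLevelNames(List.of(id, m), selected, fileIds)); // prints [id]
  }
}
```

Running `main` prints `[id]`: the only selected field under `m` is missing from the file, so the value struct, and with it the map, is pruned away, leaving just `id`.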
   
I think the real issue is a mismatch between the model created in Spark for this case and the internal page readers. The internal page readers will just project the `id` column, based on pruning for that particular file. The Spark model produces an `InternalRowReader` for the ID column and then a map reader, where the underlying struct field reader in the value of the map is a null reader. Then, when setting the page source on the model, the page reader for the ID column, as expected, cannot be set on the reader trying to read the whole map. I think if a nested struct field in a map is being projected that does not exist in the file, we should ideally create the null or default value reader instead of creating the whole map reader, but I need to check this further.
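
The mismatch and the tentative alternative can be illustrated with a toy model in which every reader reports the file columns it needs page sources for. All names here (`ReaderSketch`, `MapOfStruct`, `ColumnReader`, `ConstantReader`, `MapReader`, `buildMapReaderOrConstant`) and the column ids are hypothetical stand-ins, not the classes in `SparkParquetReaders`; `ConstantReader(null)` plays the role of the null or default value reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; these are NOT Iceberg's ParquetValueReader / SparkParquetReaders classes.
final class ReaderSketch {
  // Expected-schema node for map<key, struct<valueFields...>>, identified by field ids.
  record MapOfStruct(String name, int keyId, List<Integer> valueFieldIds) {}

  // Each reader reports the file column ids it must be bound to (its page sources).
  interface Reader {
    List<Integer> requiredColumns();
  }

  record ColumnReader(int id) implements Reader {
    public List<Integer> requiredColumns() {
      return List.of(id);
    }
  }

  // Stands in for a null or default value reader: it reads nothing from the file.
  record ConstantReader(Object value) implements Reader {
    public List<Integer> requiredColumns() {
      return List.of();
    }
  }

  record MapReader(int keyId, List<Reader> valueReaders) implements Reader {
    public List<Integer> requiredColumns() {
      List<Integer> cols = new ArrayList<>();
      cols.add(keyId); // the map reader always needs the key column
      for (Reader r : valueReaders) {
        cols.addAll(r.requiredColumns());
      }
      return cols;
    }
  }

  // Behavior as described: always a map reader, with null readers for missing value fields.
  static Reader buildMapReader(MapOfStruct map, Set<Integer> fileIds) {
    List<Reader> values = new ArrayList<>();
    for (int id : map.valueFieldIds()) {
      values.add(fileIds.contains(id) ? new ColumnReader(id) : new ConstantReader(null));
    }
    return new MapReader(map.keyId(), values);
  }

  // Tentative alternative: if no projected value field exists in the file, skip the map reader.
  static Reader buildMapReaderOrConstant(MapOfStruct map, Set<Integer> fileIds) {
    boolean anyPresent = map.valueFieldIds().stream().anyMatch(fileIds::contains);
    return anyPresent ? buildMapReader(map, fileIds) : new ConstantReader(null);
  }

  // A reader can only be bound if every column it needs survived pruning.
  static boolean canBind(Reader reader, Set<Integer> prunedColumns) {
    return prunedColumns.containsAll(reader.requiredColumns());
  }

  public static void main(String[] args) {
    // m: map<key (id 3), struct<new_field (id 6)>>; the file predates new_field.
    MapOfStruct m = new MapOfStruct("m", 3, List.of(6));
    Set<Integer> fileIds = Set.of(1, 3, 5); // columns physically in the file
    Set<Integer> pruned = Set.of(1);        // after pruning, only id remains
    System.out.println(canBind(buildMapReader(m, fileIds), pruned));           // false
    System.out.println(canBind(buildMapReaderOrConstant(m, fileIds), pruned)); // true
  }
}
```

Under these assumptions, the map reader still asks for the key column that pruning removed, so it cannot be bound, while the constant reader needs no page source at all. Note that in this sketch the constant replaces the whole map value, not just the missing struct field.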


-- 
This is an automated message from the Apache Git Service.