Re: [PR] feat: Read Parquet data file with projection [iceberg-rust]

via GitHub Tue, 19 Mar 2024 18:56:00 -0700


viirya commented on PR #245:
URL: https://github.com/apache/iceberg-rust/pull/245#issuecomment-2008556920


   @liurenjie1024 Thanks for providing some references to #251, #252.
   
   I took a look at the Python read projection in 
https://github.com/apache/iceberg-python/blob/6c8ea0effac0942ad4e880e5eef627473a354040/pyiceberg/io/pyarrow.py#L939.
 I'm wondering if we actually need #251 and #252 for pruning columns here.
   
   The arrow Parquet reader only requires us to identify the columns to 
read through a `ProjectionMask`. The mask can be built from the field ids of 
the columns selected in `TableScan`.
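   
   A rough sketch of the mapping I have in mind, using only the standard library. `LeafColumn` and `projected_leaf_indices` are hypothetical names for illustration, not types from iceberg-rust or the parquet crate, and the sketch assumes each leaf column in the Parquet file schema carries an optional field id:

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one leaf column of a Parquet file schema.
struct LeafColumn {
    /// Position of the leaf in the file schema (what a projection mask indexes).
    leaf_index: usize,
    /// Field id recorded in the Parquet schema, if the writer stored one.
    field_id: Option<i32>,
}

/// Returns the sorted leaf indices whose field id is among the selected ids.
/// Leaves without a field id are never selected.
fn projected_leaf_indices(leaves: &[LeafColumn], selected_field_ids: &[i32]) -> Vec<usize> {
    let wanted: HashSet<i32> = selected_field_ids.iter().copied().collect();
    let mut indices: Vec<usize> = leaves
        .iter()
        .filter(|leaf| leaf.field_id.map_or(false, |id| wanted.contains(&id)))
        .map(|leaf| leaf.leaf_index)
        .collect();
    indices.sort_unstable();
    indices
}
```

   The resulting indices are the kind of input a leaf-based `ProjectionMask` would be built from. How nested fields should map to leaves is left open here.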
   
   In the Python implementation, it requires #251 because it calls the scanner 
API that needs the pruned schema. For us, I don't see where we need the pruned 
schema.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

