Re: [PR] feat: Read Parquet data file with projection [iceberg-rust]

via GitHub Wed, 20 Mar 2024 01:06:13 -0700


liurenjie1024 commented on PR #245:
URL: https://github.com/apache/iceberg-rust/pull/245#issuecomment-2008989780


   > @liurenjie1024 Thanks for providing some references to #251, #252.
   > 
   > I took a look at the Python read projection in 
https://github.com/apache/iceberg-python/blob/6c8ea0effac0942ad4e880e5eef627473a354040/pyiceberg/io/pyarrow.py#L939.
 I'm wondering if we actually need #251 and #252 for column pruning here.
   > 
   > The Arrow Parquet reader only requires us to identify the columns 
to read through a `ProjectionMask`. The mask can be built from the field ids of 
the columns selected in `TableScan`.
   > 
   > The Python implementation requires #251 because it calls the 
scanner API, which needs the pruned schema. For us, I don't see where we need the 
pruned schema.
   > 
   > I updated the PR to build the `ProjectionMask` from field ids and fixed 
the previous approach, which didn't look correct. Please take another look. Thanks.
   
   Cool, I'll take a look later. Maybe Java's version is similar to this one.
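
   For reference, the field-id approach described in the quoted comment can be sketched roughly as below. This is not the code from the PR. It is a minimal, std-only illustration that assumes each Parquet leaf column carries an optional Iceberg field id; the function name `leaf_indices_for_field_ids` and the `Option<i32>` slice representation are hypothetical.

```rust
use std::collections::HashSet;

/// Given the field id recorded on each Parquet leaf column (in file order,
/// `None` when the column has no field id) and the field ids selected by the
/// scan, return the positions of the leaf columns that should be read.
///
/// Hypothetical helper for illustration only; not part of iceberg-rust.
fn leaf_indices_for_field_ids(leaf_field_ids: &[Option<i32>], selected: &[i32]) -> Vec<usize> {
    // Collect the selected ids into a set for constant-time membership checks.
    let wanted: HashSet<i32> = selected.iter().copied().collect();

    leaf_field_ids
        .iter()
        .enumerate()
        // Keep only leaves that have a field id and whose id was selected.
        .filter(|(_, id)| id.map_or(false, |id| wanted.contains(&id)))
        .map(|(idx, _)| idx)
        .collect()
}
```

   The indices come back in file order regardless of the order of the selected ids. In a real reader they could presumably be passed to `ProjectionMask::leaves` from the `parquet` crate, which takes the schema descriptor and an iterator of leaf column indices. How nested fields should be handled is left out of this sketch.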


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

