kevinjqliu commented on issue #1032: URL: https://github.com/apache/iceberg-python/issues/1032#issuecomment-2285046586
> I have one more question regarding the read_parquet from awswrangler. Do you know why it's faster than the other methods? Is there any optimization on their end or something?

I was also surprised by the performance difference. It's hard for me to say why without looking into the implementation details (in [awswrangler/s3/_read_parquet.py](https://github.com/aws/aws-sdk-pandas/blob/8d0c071649fb9e603a2ab2846307f902fafeabf5/awswrangler/s3/_read_parquet.py#L318)).

There's definitely room for optimization on the PyIceberg side. Another engine like [daft](https://www.getdaft.io/projects/docs/en/latest/user_guide/basic_concepts/read-and-write.html#from-files), which is optimized for reading Parquet on S3, is a good target for the performance gains that might be possible. There's also a future opportunity to integrate with iceberg-rust, which might speed up reading files.
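To make the comparison between read paths concrete, here is a minimal timing sketch. The helper names `time_reader` and `compare_readers` are hypothetical and not part of awswrangler or PyIceberg. The stand-in readers exist only so the snippet runs without S3 access. For a real measurement, swap in your own callables, for example one wrapping `awswrangler.s3.read_parquet` and one wrapping a PyIceberg `table.scan().to_pandas()` call, on the assumption that both read the same data.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_reader(read: Callable[[], object], repeats: int = 3) -> List[float]:
    """Call `read` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        read()  # the result is discarded; only the elapsed time matters here
        durations.append(time.perf_counter() - start)
    return durations


def compare_readers(
    readers: Dict[str, Callable[[], object]], repeats: int = 3
) -> Dict[str, float]:
    """Return the median duration in seconds for each named reader."""
    return {
        name: statistics.median(time_reader(read, repeats))
        for name, read in readers.items()
    }


if __name__ == "__main__":
    # Stand-in readers (assumption: replace these with real read calls
    # against the same table or files before drawing any conclusions).
    readers = {
        "stand_in_small": lambda: sum(range(1_000)),
        "stand_in_large": lambda: sum(range(1_000_000)),
    }
    for name, seconds in compare_readers(readers).items():
        print(f"{name}: {seconds:.4f}s")
```

Using the median over a few repeats reduces the effect of a cold first read, but it does not explain where the time goes. That would still need profiling or a look at the implementation.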