kevinjqliu commented on issue #1032: URL: https://github.com/apache/iceberg-python/issues/1032#issuecomment-2285052713
> you're benchmarking the fsspec FileIO path in pyiceberg, which if I understand correctly is using fsspec s3fs directly with a lot of defaults. Probably it keeps the default block size (5mb).

There are two FileIO implementations, [fsspec](https://github.com/apache/iceberg-python/blob/f05b1aedee8451d981188adf68be5e8b360a9ca1/pyiceberg/io/fsspec.py#L250) and [pyarrow](https://github.com/apache/iceberg-python/blob/f05b1aedee8451d981188adf68be5e8b360a9ca1/pyiceberg/io/pyarrow.py). In the case above, I believe pyarrow is used, since it's preferred over fsspec ([source](https://github.com/apache/iceberg-python/blob/f05b1aedee8451d981188adf68be5e8b360a9ca1/pyiceberg/io/__init__.py#L289-L291)).

It looks like the pyarrow default buffer size is 1 MB: https://github.com/apache/iceberg-python/blob/f05b1aedee8451d981188adf68be5e8b360a9ca1/pyiceberg/io/pyarrow.py#L417
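
To make the "pyarrow is preferred over fsspec" point concrete, here is a rough sketch of preference-ordered selection. It is not the pyiceberg code itself: `PREFERRED_FILE_IO`, `pick_file_io` and `module_importable` are hypothetical names made up for illustration, and the real lookup lives in the linked `pyiceberg/io/__init__.py`.

```python
import importlib
from urllib.parse import urlparse

# Hypothetical preference table, modelled on the comment above: for s3 paths
# the pyarrow-backed FileIO is tried first, then the fsspec-backed one.
PREFERRED_FILE_IO = {
    "s3": [
        "pyiceberg.io.pyarrow.PyArrowFileIO",
        "pyiceberg.io.fsspec.FsspecFileIO",
    ],
}


def module_importable(dotted_path):
    """Return True if the module that holds the named class can be imported."""
    module_name, _, _ = dotted_path.rpartition(".")
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def pick_file_io(location, is_available=module_importable):
    """Return the first usable implementation for the location's URI scheme."""
    scheme = urlparse(location).scheme
    for candidate in PREFERRED_FILE_IO.get(scheme, []):
        if is_available(candidate):
            return candidate
    return None  # no known implementation could be loaded for this scheme
```

With both backends installed, `pick_file_io("s3://bucket/key")` returns the pyarrow entry, so a benchmark run with default settings would exercise the pyarrow path and not fsspec. As far as I know, the pyiceberg configuration docs also describe a `py-io-impl` property for pinning one implementation explicitly, which would be the way to benchmark the fsspec path on purpose.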