kevinjqliu commented on issue #1032: URL: https://github.com/apache/iceberg-python/issues/1032#issuecomment-2278578863
okay, this doesn't look like an issue with reading many metadata files. I wonder if the `limit` is respected for table scans. Things I want to compare:

* reading the raw parquet file with pyarrow
* reading the entire iceberg table, without a limit
* reading the iceberg table, with a limit of 1
* reading the iceberg table with duckdb
* reading the iceberg table with duckdb, with a limit of 1

I think this will give us some insight into read performance in pyiceberg. Sketches for the pyiceberg and duckdb reads follow the pyarrow example below.

For reading the raw parquet file, you can do something like this:

```
import time

import pyarrow.parquet as pq

# Path to one of the table's underlying data files
parquet_file_path = ""

start_time = time.time()
table = pq.read_table(parquet_file_path)
end_time = time.time()

time_taken = end_time - start_time
print(f"Time taken to read the Parquet file: {time_taken} seconds")
```
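For the two pyiceberg reads, a minimal sketch along these lines should work. The catalog name `default` and the table identifier `db.events` are placeholders for whatever your setup uses:

```
import time

from pyiceberg.catalog import load_catalog

# Placeholder catalog and table names -- substitute your own
catalog = load_catalog("default")
table = catalog.load_table("db.events")

# Full table scan, no limit
start_time = time.time()
full = table.scan().to_arrow()
print(f"Full scan: {time.time() - start_time} seconds, {len(full)} rows")

# Same scan with limit=1
start_time = time.time()
first = table.scan(limit=1).to_arrow()
print(f"Scan with limit=1: {time.time() - start_time} seconds, {len(first)} rows")
```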
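For the duckdb comparison, one option is duckdb's `iceberg` extension, roughly as sketched below. This assumes the extension can be installed in your environment, and the metadata path is a placeholder pointing at the table's current metadata file:

```
import time

import duckdb

con = duckdb.connect()
con.execute("INSTALL iceberg; LOAD iceberg;")

# Placeholder path to the table's metadata file -- substitute your own
metadata_path = "s3://bucket/warehouse/db/events/metadata/v1.metadata.json"

# Full read through duckdb
start_time = time.time()
rows = con.execute(f"SELECT * FROM iceberg_scan('{metadata_path}')").fetchall()
print(f"duckdb full read: {time.time() - start_time} seconds, {len(rows)} rows")

# Same query with LIMIT 1
start_time = time.time()
row = con.execute(f"SELECT * FROM iceberg_scan('{metadata_path}') LIMIT 1").fetchall()
print(f"duckdb LIMIT 1: {time.time() - start_time} seconds")
```

Comparing these five timings should tell us whether the slowdown is in pyiceberg's scan planning, in the `limit` handling, or in the parquet read itself.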