jkleinkauff opened a new issue, #1032: URL: https://github.com/apache/iceberg-python/issues/1032
### Question

Hey, thanks for this very convenient library.

This is not a bug; I just want to better understand something. I have a question about the performance (i.e. the time it takes to query the table) of the methods used below.

```python
from pyiceberg.catalog.sql import SqlCatalog

if __name__ == "__main__":
    # Catalog metadata lives in a local Postgres instance.
    catalog = SqlCatalog(
        "default",
        **{
            "uri": "postgresql+psycopg2://postgres:Password1@localhost/postgres",
        },
    )
    table = catalog.load_table("bronze.curitiba_starts_june")
    df = table.scan(limit=100)
    pa_table = df.to_arrow()  # this call takes around 50 s
```

The code above runs fine. My question is about the last call: `to_arrow()` takes around 50 s to execute. I believe this is mostly because of the network itself? The execution time stays roughly the same with different row limits (10, 100, 1000).

Querying the same table in MotherDuck, using `iceberg_scan`, is faster:

<img width="836" alt="image" src="https://github.com/user-attachments/assets/21a05d45-ebcd-4323-ba31-2689d2d12fe7">

When running the same query locally, without MotherDuck, the execution time is similar to what pyiceberg takes; actually, it is a little bit slower. That's why I think this is mostly a network "issue".

Can you help me understand what's happening?

Thank you!

#### Table Data

The table has two Parquet files (110 MB, 127 MB).
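
One way to narrow down where the 50 s goes is to time metadata planning and data materialization separately, and to repeat that for each row limit. The sketch below is only an illustration under assumptions: `timed` and `profile_scan` are hypothetical helper names that are not part of pyiceberg, and the only pyiceberg calls relied on are `table.scan(limit=...)`, `plan_files()` and `to_arrow()`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def profile_scan(table, limit):
    """Time planning and materialization separately for one limited scan.

    Hypothetical helper: `table` is assumed to be a pyiceberg Table, but any
    object with a compatible `scan(limit=...)` method works.
    """
    results = {}
    scan = table.scan(limit=limit)

    with timed("plan_files_s", results):
        # Metadata only: reads manifests to decide which data files to open.
        tasks = list(scan.plan_files())

    with timed("to_arrow_s", results):
        # Reads the actual Parquet data. As far as I know, to_arrow() does
        # its own planning too, so this figure includes that cost again.
        arrow_table = scan.to_arrow()

    results["data_files"] = len(tasks)
    results["rows"] = arrow_table.num_rows
    return results
```

Calling `profile_scan(table, limit)` for limits of 10, 100 and 1000 would show whether `to_arrow_s` dominates and whether it changes with the limit. If it stays flat while `plan_files_s` is small, that would be consistent with the time going to fetching the data files rather than to planning.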