sungwy commented on issue #1806: URL: https://github.com/apache/iceberg-python/issues/1806#issuecomment-2735096521
Hi @HungYangChang - thanks for posting the logs! A couple of things to unpack here:

`_dataframe_to_data_files` produces an `Iterator`, which means the actual Parquet writes have not happened yet at the point where you log the 0.000 seconds output:

https://github.com/apache/iceberg-python/blob/a294257e6dfe6298640d377e2c96a40400c38950/pyiceberg/io/pyarrow.py#L2563-L2569

Instead, the Parquet files are written when the iterator's elements are consumed and appended via `append_files`, which in your logs happens here (see the sketch at the end of this comment for an illustration of the lazy behavior):

```
[2025-03-18T18:35:20.342Z] append_data_file 0.838 seconds
```

From what I can see in the logs, your commit does seem to be taking:

```
[2025-03-18T18:35:22.413Z] Table append operation took 2.950 seconds
```

Do you have access to the Lakekeeper logs that show how long the REST catalog takes to process the commit request? Once it accepts the commit request, the REST catalog must write the metadata on its end and then return an HTTP response back to PyIceberg. It would be good to compare the number above against the request->response wall time Lakekeeper is reporting for your specific commit request.
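
To illustrate why the 0.000 seconds measurement can be misleading, here is a minimal, self-contained sketch (plain Python, not PyIceberg code; `lazy_writes` is a hypothetical stand-in for `_dataframe_to_data_files`) showing that a generator's work only runs when its elements are consumed:

```python
import time

def lazy_writes():
    """Hypothetical stand-in for _dataframe_to_data_files: a generator whose
    expensive work (the Parquet write) only runs when each element is pulled."""
    for i in range(3):
        time.sleep(0.3)  # stands in for writing one Parquet data file
        yield f"data-file-{i}"

start = time.perf_counter()
files = lazy_writes()  # returns immediately -- nothing has been "written" yet
print(f"iterator created in {time.perf_counter() - start:.3f}s")   # ~0.000s

start = time.perf_counter()
data_files = list(files)  # the "writes" actually happen here, on consumption
print(f"iterator consumed in {time.perf_counter() - start:.3f}s")  # ~0.9s
```

So timing the call that creates the iterator measures almost nothing; the write cost shows up wherever the iterator is drained, which in your case is the `append_data_file` step.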