sungwy commented on issue #1806:
URL: https://github.com/apache/iceberg-python/issues/1806#issuecomment-2735096521

   Hi @HungYangChang - thanks for posting the logs!
   
   
   A couple of things to unpack here: `_dataframe_to_data_files` returns an `Iterator`, so it is evaluated lazily. The Parquet files haven't actually been written yet when you log the 0.000 seconds for that step.
   
   
https://github.com/apache/iceberg-python/blob/a294257e6dfe6298640d377e2c96a40400c38950/pyiceberg/io/pyarrow.py#L2563-L2569
   
   Instead, the Parquet files are written as the iterator is consumed, i.e. as each element is passed to `append_files.append_data_file`. In your logs, that happens at this point:
   
   ```
   [2025-03-18T18:35:20.342Z] append_data_file 0.838 seconds
   ```
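To illustrate the lazy behavior, here is a minimal, self-contained sketch with no PyIceberg dependency. `write_data_files` is a hypothetical stand-in for `_dataframe_to_data_files`, and `time.sleep` stands in for the cost of writing one Parquet file.

```python
import time
from typing import Iterator, List


def write_data_files(num_files: int) -> Iterator[str]:
    """Hypothetical lazy writer: each file is only "written" when the caller
    asks the generator for its next element."""
    for i in range(num_files):
        time.sleep(0.05)  # simulate the cost of writing one Parquet file
        yield f"data-{i}.parquet"


start = time.perf_counter()
files = write_data_files(3)  # creating the generator runs none of its body
create_elapsed = time.perf_counter() - start

start = time.perf_counter()
written: List[str] = list(files)  # consuming it is where the work happens
consume_elapsed = time.perf_counter() - start

print(f"create: {create_elapsed:.3f}s, consume: {consume_elapsed:.3f}s")
```

The first number should be close to zero, just like the 0.000 seconds in your logs, while all of the simulated write time shows up in the second one.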
   
   From what I can see in the logs, though, the commit does seem to take up a large share of the total append time:
   ```
   [2025-03-18T18:35:22.413Z] Table append operation took 2.950 seconds
   ```
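As a rough check of how much of that total remains once the files are written, you can subtract the two timestamps. This sketch assumes the `append_data_file` line is logged right after the data files have been written, and that the append finishes shortly after the commit response arrives.

```python
from datetime import datetime

LOG_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Timestamps copied from the two log lines above.
files_written = datetime.strptime("2025-03-18T18:35:20.342Z", LOG_FORMAT)
append_done = datetime.strptime("2025-03-18T18:35:22.413Z", LOG_FORMAT)

remaining = (append_done - files_written).total_seconds()
print(f"{remaining:.3f} s after the files were written")  # prints: 2.071 s after the files were written
```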
   
   Do you have access to the Lakekeeper logs? They should tell you how long the REST Catalog takes to process the commit request. Once it accepts the commit request, the REST Catalog must write the metadata on its end and then return an HTTP response to PyIceberg. It would be good to compare the commit time PyIceberg observes against the request-to-response wall time Lakekeeper reports for your specific commit request.
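To get a client-side number to compare against, one option is to split the append into its two phases with an explicit transaction. This is a minimal sketch, assuming `table` is a PyIceberg `Table` loaded from your REST catalog and `df` is a `pyarrow.Table` matching its schema; `time_append_phases` is a hypothetical helper name. It relies on `Table.transaction()` staging changes until `Transaction.commit_transaction()` is called.

```python
import time
from typing import Any, Dict


def time_append_phases(table: Any, df: Any) -> Dict[str, float]:
    """Hypothetical helper: time file writing and the catalog commit separately.

    Assumes `table` is a pyiceberg Table and `df` a pyarrow.Table.
    """
    # A transaction only sends its changes to the catalog on commit_transaction().
    tx = table.transaction()

    start = time.perf_counter()
    # Writes the Parquet data files (consuming the lazy iterator) and the
    # manifests; the new snapshot is only staged inside the transaction.
    tx.append(df)
    staged = time.perf_counter()

    # Sends the commit request to the REST catalog and blocks until the
    # HTTP response comes back.
    tx.commit_transaction()
    done = time.perf_counter()

    return {"write_files": staged - start, "commit_request": done - staged}
```

Usage would be something like `time_append_phases(catalog.load_table("db.events"), arrow_table)`, where `"db.events"` and `arrow_table` are placeholders. If `commit_request` is close to what Lakekeeper reports, the time is being spent on the catalog side; if it is much larger, that would point to the network or the client.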


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

