fengguangyuan commented on issue #9736:
URL: https://github.com/apache/iceberg/issues/9736#issuecomment-1956058615

   >The tmp_data is literally the same data of warehouse.data, when running 
this code I would expected no changes in the dataset because it didn't match 
anything. However my parquet files are being duplicated.
   
   The data file names suggest that these files were produced by the same Spark 
task instance, but without logs I can't confirm the real cause.
   
   I suggest the following checks:
   1. Check the status of these data files in the historical snapshots.
   2. Check whether task 19 failed and was retried.
   3. Confirm whether these data files were generated by the same action.
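
   For checks 2 and 3, a rough first pass is to compare the file names 
themselves. The sketch below assumes the data files follow a 
`<partitionId>-<taskId>-<operationId>-<fileCount>.<ext>` naming layout; 
please verify that against your Iceberg version. The helper names 
(`parse_data_file`, `group_by_operation`) are hypothetical, not part of 
any Iceberg API. Files that share an operation ID were most likely written 
by the same action, while files with the same task ID but different 
operation IDs point to separate writes.

```python
import re
from collections import defaultdict

# Assumed layout: <partitionId>-<taskId>-<operationId>-<fileCount>.<ext>
# The operation ID may itself contain hyphens (e.g. a UUID), so it is
# matched greedily and the trailing "-<fileCount>.<ext>" anchors the end.
FILE_NAME = re.compile(
    r"^(?P<partition>\d+)-(?P<task>\d+)-(?P<operation>.+)-(?P<count>\d+)\.(?P<ext>\w+)$"
)


def parse_data_file(path):
    """Split a data file path into its assumed name components.

    Returns None when the file name does not match the assumed layout.
    """
    name = path.rsplit("/", 1)[-1]
    match = FILE_NAME.match(name)
    if match is None:
        return None
    return {
        "partition": int(match.group("partition")),
        "task": int(match.group("task")),
        "operation": match.group("operation"),
        "count": int(match.group("count")),
        "ext": match.group("ext"),
    }


def group_by_operation(paths):
    """Group file paths by operation ID; unparseable names go under None."""
    groups = defaultdict(list)
    for path in paths:
        parsed = parse_data_file(path)
        key = parsed["operation"] if parsed else None
        groups[key].append(path)
    return dict(groups)
```

   If the duplicated files fall into different groups, they came from 
different write actions rather than from a retry within one action. 
This only narrows things down; the snapshot history and the Spark logs 
are still needed to confirm the cause.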


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

