fengguangyuan commented on issue #9736: URL: https://github.com/apache/iceberg/issues/9736#issuecomment-1956058615
> The tmp_data is literally the same data of warehouse.data, when running this code I would expected no changes in the dataset because it didn't match anything. However my parquet files are being duplicated.

The data file names suggest that these files may have been produced by the same Spark task instance, but I can't confirm the real cause without logs. I suggest running these checks:

1. Check the status of these data files in the historical snapshots.
2. Check whether task 19 failed and was retried.
3. Confirm whether these data files were generated by the same action.
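To make checks 2 and 3 a bit more concrete, here is a rough sketch that groups data file names by writer. It assumes the files follow the `<partitionId>-<taskId>-<operationId>-<fileCount>.parquet` layout that Iceberg's Spark writer commonly uses, with the operation ID being a UUID; please verify this against your Iceberg version. The helper names and the sample paths in the usage note are hypothetical. Under that assumption, two files that share an operation ID and partition ID but carry different task IDs would hint at a retried or speculative task attempt, while different operation IDs would hint at separate write actions.

```python
import re
from collections import defaultdict

# ASSUMPTION: file names look like
#   <partitionId:5 digits>-<taskId>-<operationId:UUID>-<fileCount:5 digits>.parquet
# This layout is not confirmed by the issue; check it against your own files.
_UUID = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
_FILE_NAME = re.compile(
    r"^(?P<partition_id>\d{5})-(?P<task_id>\d+)-"
    r"(?P<operation_id>" + _UUID + r")-(?P<file_count>\d{5})\.parquet$"
)


def parse_data_file_name(path):
    """Split a data file path into its assumed name components.

    Returns None when the name does not match the assumed layout.
    """
    name = path.rsplit("/", 1)[-1]
    match = _FILE_NAME.match(name)
    if match is None:
        return None
    return {
        "partition_id": int(match.group("partition_id")),
        "task_id": int(match.group("task_id")),
        "operation_id": match.group("operation_id").lower(),
        "file_count": int(match.group("file_count")),
    }


def find_multi_attempt_partitions(paths):
    """Map (operation_id, partition_id) to its task IDs, keeping only
    the pairs that were written under more than one task ID."""
    task_ids = defaultdict(set)
    for path in paths:
        parsed = parse_data_file_name(path)
        if parsed is None:
            continue  # skip names that do not follow the assumed layout
        key = (parsed["operation_id"], parsed["partition_id"])
        task_ids[key].add(parsed["task_id"])
    return {key: ids for key, ids in task_ids.items() if len(ids) > 1}
```

For example, passing the `file_path` values from the table's file listing to `find_multi_attempt_partitions` returns only the writer/partition pairs that appear under more than one task ID. An empty result does not rule out a retry; it only means the file names alone do not show one.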