amogh-jahagirdar commented on issue #9172: URL: https://github.com/apache/iceberg/issues/9172#issuecomment-1848220358
Thanks for the details. One key thing stands out to me:

```
I also tested with latest version, iceberg-spark-runtime-3.4_2.12-1.4.2.jar as well, I could see that the second number, part of the file name, is continuously increasing 00001-3200-11773075-523f-4667-936b-88702fe9860c-00001.parquet, however after around 200 execution of stream, the file name got reset 00001-3166-11773075-523f-4667-936b-88702fe9860c-00001.parquet and files were started getting overwritten.
```

This aligns with the suspicion in the other issue that task IDs can be reused across epochs (I'm reading "after around 200 executions of stream" as 200 micro-batch intervals). I think that makes sense (and anyway, that's probably intentional in the DSV2 API to surface the writer).

I'll put up a draft for adding the epochID to the output path.
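To make the suspected collision concrete, here is a minimal sketch of how a reused task ID could produce the same file name twice, and how an epoch component would separate the two. It is not Iceberg's actual naming code: the layout (partition ID, task ID, a UUID, a file counter) is only inferred from the file names quoted above, and `data_file_name`, its `epoch_id` parameter, the position of the epoch in the name, and the epoch values 10 and 210 are all hypothetical.

```python
from typing import Optional

# UUID taken from the file names quoted in the issue. It stays the same in both
# quoted names, so it does not help tell two micro-batches apart.
QUERY_ID = "11773075-523f-4667-936b-88702fe9860c"


def data_file_name(partition_id: int, task_id: int, operation_id: str,
                   file_count: int, epoch_id: Optional[int] = None) -> str:
    """Build a data file name in the layout seen in the issue (hypothetical helper).

    When epoch_id is given it is appended to the operation ID. That placement is
    an assumption made for illustration, not the actual fix.
    """
    op = operation_id if epoch_id is None else f"{operation_id}-{epoch_id}"
    return f"{partition_id:05d}-{task_id}-{op}-{file_count:05d}.parquet"


# Two different micro-batches in which the same task ID (3166) shows up again.
early = data_file_name(1, 3166, QUERY_ID, 1)
late = data_file_name(1, 3166, QUERY_ID, 1)
print(early == late)  # identical names, so the later file overwrites the earlier one

# With a (hypothetical) epoch ID in the name, the two writes no longer collide.
early_with_epoch = data_file_name(1, 3166, QUERY_ID, 1, epoch_id=10)
late_with_epoch = data_file_name(1, 3166, QUERY_ID, 1, epoch_id=210)
print(early_with_epoch != late_with_epoch)
```

The point of the sketch is only that every component of the current name can repeat across epochs, so some per-epoch value has to enter the path for the names to stay unique.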