Re: [I] Duplicate file name in Iceberg's metadata [iceberg]

via GitHub Thu, 07 Dec 2023 22:44:42 -0800


amogh-jahagirdar commented on issue #8953:
URL: https://github.com/apache/iceberg/issues/8953#issuecomment-1846628756


   OK, I've now actually looked at the history of these changes. 
https://github.com/apache/iceberg/pull/5214 was never merged, but it was 
followed by https://github.com/apache/iceberg/pull/6569/files, which actually 
applied the change and would've been released in 1.2.0.
   
   The goal of including the query ID looks to be identifying which Spark job 
actually performed the write. Previously there would've been a new UUID per 
write, and we would've avoided files stepping on each other. 
   
   Let me try to put together a reproducible example (we'd want one anyway to 
verify that whatever fix we make actually works). Ideally we can get the best 
of both worlds: I think some combination of the query ID + the hostname + the 
thread ID would be truly unique and enable better debugging (at the cost of a 
really long filename :) ).
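
   As a rough sketch of what that combination could look like (the 
`UniqueFileNames` class, its methods, and the name format are all hypothetical 
and for illustration only; this is not Iceberg's actual file naming code):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, NOT part of Iceberg: builds a data file name from the
// query ID, the writer's hostname, and the ID of the writing thread.
public class UniqueFileNames {

  // Best-effort hostname lookup; falls back to a placeholder if it fails.
  public static String hostname() {
    try {
      return InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      return "unknown-host";
    }
  }

  // Pure formatting step, kept separate so it can be checked in isolation.
  // The "<queryId>-<host>-<threadId>.<ext>" layout is an assumed format.
  public static String fileName(String queryId, String host, long threadId, String ext) {
    return String.format("%s-%s-%d.%s", queryId, host, threadId, ext);
  }

  // Convenience wrapper that fills in the local hostname and current thread.
  public static String fileNameForCurrentThread(String queryId, String ext) {
    return fileName(queryId, hostname(), Thread.currentThread().getId(), ext);
  }

  public static void main(String[] args) {
    // The printed name depends on the machine and thread it runs on.
    System.out.println(fileNameForCurrentThread("query-1234", "parquet"));
  }
}
```

   This only illustrates the shape of the name. A real fix would probably 
still need something like a per-file counter, since one thread can write more 
than one file for the same query.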


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


