Re: [I] Duplicate file name in Iceberg's metadata [iceberg]

via GitHub Wed, 29 Nov 2023 00:48:53 -0800


github-raphael-douyere commented on issue #8953:
URL: https://github.com/apache/iceberg/issues/8953#issuecomment-1831463021


   @Fokko  I don't know how to build a simple, reproducible setup. We hit 
the issue at a rate of ~10 files per week with an app producing hundreds of 
files per hour. 
   
   @amogh-jahagirdar  And yes, I know that the file name is not only the query 
id. But I think the other elements (`taskId` and `partitionId`) can definitely 
repeat. What I'm not sure of is the `fileCount` part. I think it is 
kept in memory but resets when the app is restarted (i.e. it is not part of the 
state). So my point is: with a UUID this can't happen (barring a UUID 
collision), as any collision on the other parts of the file name is handled 
by the unique part. 
   Another fix could be to keep the `operationId` but add a UUID as well. This 
would extend the file names a little, but that is probably fine to avoid data 
loss issues.
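
   To make the scenario concrete, here is a minimal sketch of what I think is 
happening. It is not Iceberg's actual code: the `FileNameFactory` class, the 
name layout, and the `.parquet` suffix are assumptions for illustration. Only 
the component names (`partitionId`, `taskId`, `operationId`, `fileCount`) come 
from the discussion above. Building a new factory stands in for an app restart.

```python
import itertools
import uuid


class FileNameFactory:
    """Hypothetical stand-in for a writer's file-name generator."""

    def __init__(self, partition_id, task_id, operation_id, add_uuid=False):
        self.partition_id = partition_id
        self.task_id = task_id
        self.operation_id = operation_id
        self.add_uuid = add_uuid
        # In-memory counter: it starts from 1 again whenever a new factory
        # is built, which models the counter being lost on restart.
        self._file_count = itertools.count(1)

    def new_file_name(self):
        parts = [
            f"{self.partition_id:05d}",
            str(self.task_id),
            self.operation_id,
            f"{next(self._file_count):05d}",
        ]
        if self.add_uuid:
            # Proposed fix: keep the operation id and append a random UUID.
            parts.append(str(uuid.uuid4()))
        return "-".join(parts) + ".parquet"


def names_across_restart(add_uuid):
    """Return the first file name produced before and after a simulated restart."""
    before = FileNameFactory(0, 1, "op-1", add_uuid).new_file_name()
    after = FileNameFactory(0, 1, "op-1", add_uuid).new_file_name()
    return before, after
```

   If the same `partitionId`, `taskId` and `operationId` show up again after a 
restart, `names_across_restart(False)` returns two identical names, while 
`names_across_restart(True)` returns two different ones (barring a UUID 
collision).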


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


