ForeverAngry commented on PR #2205: URL: https://github.com/apache/iceberg-python/pull/2205#issuecomment-3322452187
> thanks for your patience with the reviews!
>
> im +1 to what jayce mentioned, i think we'd want to use the existing table retry mechanisms instead of adding retry to a specific user-facing table function.

Yeah, @jayceslesar, I think using the table properties to set or seed the retry arguments is fine. That being said, I don't think that addresses the issue.

The problem here is a bit different, at least from my perspective. I do think retry logic for **this** function alone would actually make sense. The `add_files` method can be used to commit large batches of files. The overhead of collecting metadata from every file in the batch, just to get to the point of trying to commit, is what the PR seeks to address, since a failed commit means redoing all of that work.

There isn't a good or clean way to pull that metadata and store it before calling `add_files` _(besides using pyarrow directly)_, and even if there were, I don't think there is a public API for `add_datafiles` in pyiceberg _(though I know that function is one of the private methods used in the call stack of `add_files`)_.

From my perspective, the way `append` is used in a common data pipeline is likely much different in scope and magnitude from what `add_files` typically sees.

@kevinjqliu does this context help, and is it persuasive, or is there maybe something I'm not seeing?
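As a rough illustration of the argument (not the PR's actual implementation, and not pyiceberg's API), the sketch below shows the general pattern of doing the expensive metadata collection once and retrying only the commit step. Every name here (`CommitConflict`, `add_files_with_retry`, `collect`, `try_commit`) is hypothetical, and a real retry would presumably also need to refresh the table state before each new attempt.

```python
import time


class CommitConflict(Exception):
    """Hypothetical stand-in for a commit that lost an optimistic-concurrency race."""


def add_files_with_retry(paths, collect, try_commit, max_attempts=3, base_delay=0.0):
    """Collect per-file metadata once, then retry only the commit.

    `collect` and `try_commit` are caller-supplied callables (assumptions for
    this sketch); they are not functions provided by pyiceberg.
    """
    # Expensive step: runs exactly once, no matter how many commits fail.
    metadata = collect(paths)

    for attempt in range(1, max_attempts + 1):
        try:
            return try_commit(metadata)
        except CommitConflict:
            if attempt == max_attempts:
                raise
            # Exponential backoff between attempts (no wait when base_delay is 0).
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With this shape, a conflict on the first or second attempt costs only another commit call, not another pass over every file in the batch.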
