amogh-jahagirdar commented on PR #10784:
URL: https://github.com/apache/iceberg/pull/10784#issuecomment-2316733589

   > Makes sense @amogh-jahagirdar !
   > 
   > > @szehon-ho I actually had a question on the snapshot repair, based on 
the description the goal of that is to repair snapshot summary stats which may 
have been corrupted. Doesn't that necessarily mean we must mutate the existing 
snapshot (and subsequent snapshots) to correct it?
   > 
   > i was just thinking we make a new snapshot with correct summary stats 
(only the totals). But yes you are right, it is open to interpretation, in this 
case you cant go back and fix wrong stats, so this particular feature probably 
does need more thought.
   
   Really sorry for the delayed response @szehon-ho, I forgot I had this open. 
I was discussing this with @rdblue and I am now more convinced of your point 
that we may as well have a unified `RepairTable` action with different 
configuration methods. The compelling argument, for me at least, is that it 
follows the existing API patterns. For example, `ManageSnapshots` has different 
options but serves as a useful entry point. Similarly, `RepairTable` would be a 
useful entry point for a user: there can be sane defaults on what to repair, 
and power users can specify which operations they want to run if they know 
what's broken. Another example of this came up in 
https://github.com/apache/iceberg/pull/10755/files#r1696094932 for removing 
unused specs, where we were thinking of having a single entry-point maintenance 
API for removing both unused specs and schemas. So from an API consistency 
perspective, I think it's good to have the same pattern for this action.
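   
   As a rough illustration of the pattern described above (a hypothetical 
sketch only: the class `RepairTableSketch`, its `Operation` values, and the 
method names are placeholders invented for this example, not the actual or 
proposed Iceberg API), a single entry point with sane defaults and opt-in 
operations could look something like this:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: none of these names come from Iceberg itself.
final class RepairTableSketch {
  // Placeholder repair operations, assumed for illustration only.
  enum Operation { SNAPSHOT_SUMMARIES, MANIFESTS }

  // Stays empty until the caller opts in to specific operations.
  private final Set<Operation> selected = EnumSet.noneOf(Operation.class);

  // Configuration method: opt in to repairing snapshot summary stats.
  RepairTableSketch repairSnapshotSummaries() {
    selected.add(Operation.SNAPSHOT_SUMMARIES);
    return this;
  }

  // Configuration method: opt in to repairing manifests.
  RepairTableSketch repairManifests() {
    selected.add(Operation.MANIFESTS);
    return this;
  }

  // Default: if nothing was selected, plan every known operation.
  Set<Operation> plannedOperations() {
    return selected.isEmpty()
        ? EnumSet.allOf(Operation.class)
        : EnumSet.copyOf(selected);
  }
}
```

   With this shape, `new RepairTableSketch().plannedOperations()` plans every 
operation, while a power user who knows what is broken can call only, say, 
`repairManifests()` to narrow the run to that one operation.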
   
   Furthermore, the procedure traverses the entire metadata tree, so in 
practice there's probably overlap across the different repair operations.
   
   I can update the PR based on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

