fuzing opened a new issue, #12883:
URL: https://github.com/apache/iceberg/issues/12883

   ### Feature Request / Improvement
   
   
   At the moment, once a data file goes missing or becomes corrupted, table 
functionality is degraded or lost entirely because of cascading errors caused 
by the missing/corrupted files (the exact impact depends on the query engine).
   
   In the event of data file loss or corruption, it would be useful to have a 
procedure that generates a new snapshot and metadata excluding the 
missing/corrupted files, and that reports which files were excluded.
   
   This procedure might be extended to include those circumstances where 
metadata and/or snapshot files are corrupted (with varying degrees of rebuild 
success depending on the damage).
   
   One could imagine multiple strategies for such a tool - e.g.:
   - Perform a simple data file existence check and exclude those that are 
missing (cheap, because data files don't need to be read)
   - Perform a complete sanity check of the table structure (expensive, as each 
data file would need to be decompressed/ingested and checked for integrity)
   - etc.
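
A rough sketch of the first (cheap) strategy, to make the idea concrete. This is 
not an existing Iceberg API: `check_data_files` and `ExistenceReport` are 
hypothetical names, and the list of data file paths is assumed to have already 
been read from the table's manifests. The `exists` callable is a stand-in for 
whatever existence check the table's file system offers.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class ExistenceReport:
    """Hypothetical result type: which referenced data files were found."""
    present: List[str] = field(default_factory=list)
    missing: List[str] = field(default_factory=list)


def check_data_files(
    paths: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> ExistenceReport:
    """Split referenced data file paths into present and missing.

    Only an existence check is performed; no file is opened or read,
    which is what keeps this strategy cheap.
    """
    report = ExistenceReport()
    for path in paths:
        if exists(path):
            report.present.append(path)
        else:
            report.missing.append(path)
    return report
```

The more expensive strategy would swap the `exists` check for one that actually 
opens and validates each file.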
   
   Like other Spark procedures, this one might have a "dry_run" flag so that 
issues are identified and the repair plan is reported before any changes are 
made.
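
To illustrate the intended "dry_run" behaviour, here is a minimal sketch under 
stated assumptions. `repair_table`, `RepairPlan`, `is_healthy` and `commit` are 
all hypothetical names, not part of Iceberg or Spark; `commit` stands in for 
whatever would write the new snapshot.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class RepairPlan:
    """Hypothetical report of what a repair would do (or did)."""
    files_to_keep: List[str]
    files_to_drop: List[str]
    applied: bool


def repair_table(
    current_files: Sequence[str],
    is_healthy: Callable[[str], bool],
    commit: Callable[[List[str]], None],
    dry_run: bool = True,
) -> RepairPlan:
    """Build a repair plan and apply it only when dry_run is False."""
    keep: List[str] = []
    drop: List[str] = []
    for path in current_files:
        (keep if is_healthy(path) else drop).append(path)

    # Nothing is committed in dry-run mode, or when there is nothing to drop.
    should_apply = bool(drop) and not dry_run
    if should_apply:
        commit(keep)
    return RepairPlan(files_to_keep=keep, files_to_drop=drop, applied=should_apply)
```

Defaulting `dry_run` to `True` in this sketch means a caller has to opt in 
explicitly before anything is rewritten.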
   
   
   
   ### Query engine
   
   Spark
   
   ### Willingness to contribute
   
   - [ ] I can contribute this improvement/feature independently
   - [ ] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
