hgromer commented on code in PR #7523:
URL: https://github.com/apache/hbase/pull/7523#discussion_r2594061417


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java:
##########


Review Comment:
   I'd like to push back here a bit, and reconsider the tradeoffs between the 
complexities and what we're risking by having a more complicated design for WAL 
retention. 
   
   I would prefer to have a solution that is easier to reason about, and store 
WAL files for longer than necessary rather than an implementation that is 
harder to debug and can lead to complicated edge cases.
   
   I'm open to suggestions, but I don't think solving this at a RS level is 
simple, I explored that path and quickly foudn myself having to re-architect 
the backup system table. 
   
   For example, the obvious solution is to [retain WALs for addresses we aren't 
storing](https://github.com/apache/hbase/blob/master/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java#L224).
 But what if the server joined was removed from the cluster in between backups, 
how do we make sure we _do_ end up cleaning up the WAL files eventually?
   
   We could have the backup add the host to it's boundary map, but what's the 
timestamp to set? The backup system table does _not_ include the host start 
code, so we can't set the entry to a ts of `Long.MAX_VALUE` because if the host 
does eventually rejoin the cluster, that'd be problematic. We can also have a 
race where the host joins the cluster exactly as we're adding to the boundary 
mapping.
   
   To be quite frank, after spending much time with the modern backup system as 
a whole, I'd much rather err on the side of simplicity, choosing to retain WALs 
for a bit longer than is necessary rather than continue to make an already 
complicated system more complicated. 
   
   Happy to hear your thoughts, but I think (with a fix for the case of missing 
timestamps) is a non-trivial statement from what I've seen.
   
   > Do we have a way to know whether a file offered to the cleaner is part of 
a table covered by backups
   
   We don't, other than by reading the WAL. A WAL file can contain edits from 
mulitple tables.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to