hgromer commented on code in PR #7523: URL: https://github.com/apache/hbase/pull/7523#discussion_r2594061417
########## hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java: ########## Review Comment: I'd like to push back here a bit, and reconsider the tradeoffs between the complexities and what we're risking by having a more complicated design for WAL retention. I would prefer to have a solution that is easier to reason about, and store WAL files for longer than necessary rather than an implementation that is harder to debug and can lead to complicated edge cases. I'm open to suggestions, but I don't think solving this at a RS level is simple, I explored that path and quickly foudn myself having to re-architect the backup system table. For example, the obvious solution is to [retain WALs for addresses we aren't storing](https://github.com/apache/hbase/blob/master/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java#L224). But what if the server joined was removed from the cluster in between backups, how do we make sure we _do_ end up cleaning up the WAL files eventually? We could have the backup add the host to it's boundary map, but what's the timestamp to set? The backup system table does _not_ include the host start code, so we can't set the entry to a ts of `Long.MAX_VALUE` because if the host does eventually rejoin the cluster, that'd be problematic. We can also have a race where the host joins the cluster exactly as we're adding to the boundary mapping. To be quite frank, after spending much time with the modern backup system as a whole, I'd much rather err on the side of simplicity, choosing to retain WALs for a bit longer than is necessary rather than continue to make an already complicated system more complicated. Happy to hear your thoughts, but I think (with a fix for the case of missing timestamps) is a non-trivial statement from what I've seen. > Do we have a way to know whether a file offered to the cleaner is part of a table covered by backups We don't, other than by reading the WAL. A WAL file can contain edits from mulitple tables. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
