[ https://issues.apache.org/jira/browse/HBASE-28919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890469#comment-17890469 ]

Duo Zhang commented on HBASE-28919:
-----------------------------------

IIRC when I was at Xiaomi, we had a 'soft deletion' feature in HBase. When 
deleting a table, we would first take a snapshot of it (since a table can only 
be dropped after it is disabled, taking the snapshot does not require the 
time-consuming flush operation). You can configure the TTL for these snapshots, 
and IIRC there is also a background task that deletes these snapshots before 
their TTL is reached if space runs low.

I cannot recall whether they pushed this feature to the community version 
of HBase...
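
For reference, the manual equivalent of that workflow with the public Admin API 
would look roughly like the sketch below. The table and snapshot names are made 
up, and the per-snapshot TTL property assumes the snapshot-properties support 
added by HBASE-22648 (without it, the cluster-wide default 
hbase.master.snapshot.ttl would apply).

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SoftDropSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    TableName table = TableName.valueOf("my_table");   // hypothetical table name
    String snapshotName = "my_table_predrop";          // hypothetical snapshot name

    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // A table must be disabled before it can be dropped; snapshotting a
      // disabled table does not need the expensive memstore flush.
      if (!admin.isTableDisabled(table)) {
        admin.disableTable(table);
      }

      // Take the safety snapshot. The "TTL" property (seconds) is an
      // assumption based on the snapshot TTL support from HBASE-22648.
      Map<String, Object> snapshotProps = new HashMap<>();
      snapshotProps.put("TTL", 7L * 24 * 60 * 60);      // keep for 7 days
      admin.snapshot(snapshotName, table, snapshotProps);

      // Only now perform the destructive action.
      admin.deleteTable(table);
    }
  }
}
{code}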

> Soft drop for destructive table actions
> ---------------------------------------
>
>                 Key: HBASE-28919
>                 URL: https://issues.apache.org/jira/browse/HBASE-28919
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, snapshots
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>         Attachments: Soft Drop for Destructive Table Actions.pdf
>
>
> When we administratively drop a table column family or an entire table, or 
> truncate a table, the process begins rapidly. Procedures are scheduled for 
> immediate execution that modify or remove descriptors and state in META and 
> on disk, and take unrecoverable actions at the HDFS layer. Although HFiles 
> are copied to the archive during a destructive action, recovery is not 
> automatic and involves some operator labor to reconstruct the table and 
> re-import the archived data. If the HFileCleaner is not properly configured 
> to facilitate such recovery, some data becomes unrecoverable soon after 
> procedure execution commences, and all affected data becomes unrecoverable 
> within minutes. A customer faced with such an accident will be unhappy, 
> because the recovery scenarios available to them involve either a restore 
> from backup or a restore from an earlier snapshot, and any changes committed 
> after the last backup or last snapshot will be lost. 
>
> An effective solution is very simple: we can prevent the deletion of the 
> HFiles of a dropped table or column family by taking a snapshot of the table 
> immediately before taking any destructive action. We set a TTL on the 
> snapshot so housekeeping of truly unwanted HFiles remains no-touch. Because 
> we take a table snapshot, all table structure and metadata are also captured 
> and saved. For as long as the snapshot is retained, fast recovery is 
> straightforward: either restore the table from the snapshot or clone the 
> snapshot to a new table, at the operator’s discretion. 
>
> No manual actions are required to see the table or column family (or 
> families) truly dropped. Once the snapshot TTL expires, all HFiles related 
> to the dropped table or column family become eligible for deletion, and when 
> the HFileCleaner chore next executes the HDFS-level file deletes will 
> commence, with the associated reduction in storage requirements. 
>
> A design document is attached. 
> I have a *working implementation* of this proposal based on a fork of 
> branch-2.5. 
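
For as long as such a pre-drop snapshot is retained, the recovery paths the 
description mentions are each a single Admin call. A rough illustrative sketch, 
with hypothetical table and snapshot names:

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SoftDropRecoverySketch {
  public static void main(String[] args) throws IOException {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      String snapshotName = "my_table_predrop";   // hypothetical snapshot name

      // Option 1: clone the snapshot into a brand-new table, leaving the
      // snapshot itself untouched.
      admin.cloneSnapshot(snapshotName, TableName.valueOf("my_table_recovered"));

      // Option 2: if the table still exists (e.g. after a truncate or a
      // dropped column family), restore it in place. restoreSnapshot requires
      // the target table to be disabled first.
      // admin.disableTable(TableName.valueOf("my_table"));
      // admin.restoreSnapshot(snapshotName);
      // admin.enableTable(TableName.valueOf("my_table"));
    }
  }
}
{code}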


