[
https://issues.apache.org/jira/browse/HBASE-29136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Junegunn Choi reassigned HBASE-29136:
-------------------------------------
Assignee: Junegunn Choi
> Make ALTER operations safer
> ---------------------------
>
> Key: HBASE-29136
> URL: https://issues.apache.org/jira/browse/HBASE-29136
> Project: HBase
> Issue Type: Improvement
> Reporter: Junegunn Choi
> Assignee: Junegunn Choi
> Priority: Major
>
> I'd like to start a discussion on how we can make ALTER operations safer in
> HBase.
> h1. Problem Statement
> A small mistake can lead to a catastrophic failure.
> {code:ruby}
> create 't', 'd', { NUMREGIONS => 10, SPLITALGO => 'HexStringSplit' }
> # Wanted to specify 512KB, but mistakenly put an extra *, ended up passing an
> astronomical value
> alter 't', { CONFIGURATION => { 'hbase.storescanner.pread.max.bytes' => 1024
> ** 512 } }
> # Mistakenly thinking that HBase would understand this notation
> alter 't', { CONFIGURATION => { 'hbase.storescanner.pread.max.bytes' => '2MB'
> } }
> {code}
> What then happens is that all the regions of the table go down and never come
> back up. HBase retries and retries as if more retries would make
> {{NumberFormatException}} go away.
> {noformat}
> assignment.TransitRegionStateProcedure: Retry=9 of max=2147483647; pid=46,
> ppid=38, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true;
> TransitRegionStateProcedure table=t, region=3347e7a93b19edda40c2ce08fe3cbf27,
> REOPEN/MOVE; state=OPENING, location=localhost,16020,1739865304886
> assignment.TransitRegionStateProcedure: Failed transition, suspend 32secs
> pid=46, ppid=38, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE,
> locked=true; TransitRegionStateProcedure table=t,
> region=3347e7a93b19edda40c2ce08fe3cbf27, REOPEN/MOVE; state=OPENING,
> location=null; waiting on rectified condition fixed by other Procedure or
> operator intervention
> org.apache.hadoop.hbase.HBaseIOException: Failed confirm OPEN of
> state=OPENING, location=null, table=t,
> region=3347e7a93b19edda40c2ce08fe3cbf27 (remote log may yield more detail on
> why).{noformat}
> How do you get the service back up? Here's what an ordinary operator would do:
> * First, you kill the HBase shell that is stuck waiting for the response and
> restart it.
> * Then you try to undo the change with a correct {{alter}} command, only to
> learn that the previous operation is holding the lock.
> {code:ruby}
> alter 't', { CONFIGURATION => { 'hbase.storescanner.pread.max.bytes' => 1024
> * 512 } }
> {code}
> * Restart the shell again and try to {{{}disable{}}}. But it's stuck again.
> {code:ruby}
> disable 't'
> {code}
> * You realize that normal operations are not going to work. You reach for
> {{hbck}} and "bypass" the procedures.
> {code:ruby}
> hbck = @hbase.instance_variable_get(:@connection).hbck
> hbck.bypassProcedure([30, 153], 10000, true, true)
> {code}
> * Now the procedures are gone, but the regions are still stuck in
> {{OPENING}} state, and the table is in {{DISABLING}} state.
> * Try {{alter}} again. It seems to work!
> {code:ruby}
> alter 't', { CONFIGURATION => { 'hbase.storescanner.pread.max.bytes' => 1024
> * 512 } }
> # Updating all regions with the new schema...
> # All regions updated.
> # Done.
> # Took 1.1817 seconds
> {code}
> * But you can't {{enable}} it.
> {code:ruby}
> enable 't'
> # ERROR: Table tableName=t, state=DISABLING should be disabled!
> {code}
> * You need to modify the table state using {{hbck}} again
> {code:ruby}
> hbck.setTableStateInMeta(
> org.apache.hadoop.hbase.client.TableState.new(
> TableName.valueOf('t'),
> org.apache.hadoop.hbase.client.TableState::State::DISABLED
> )
> )
> {code}
> * Now you can finally enable it.
> {code:ruby}
> enable 't'
> {code}
> This is a painful process for such a small mistake. If you're not familiar
> with {{{}hbck{}}}, you might have to spend hours figuring out how to get the
> table back up.
> We should try to make the alter operation safer.
> h1. Current ways to alleviate the problem
> * {{hbase.reopen.table.regions.progressive.batch.size.max}} helps a lot in
> this scenario. With the config, only one region will be out of service.
> ** However, you still have to do the whole dance to get the region back up.
> * Having a small {{hbase.assignment.maximum.attempts}} value can help as the
> procedure will fail fast, and we can avoid the "bypass" step.
> ** But it's a global configuration of the HBase master, and setting it to a
> very small number is probably not a good idea.
> h1. Client effort to avoid the problem
> Ideally, the operator should verify the alter command beforehand on a test
> cluster, preferably with a small {{hbase.assignment.maximum.attempts}} value,
> and proceed with the production cluster only if it works as expected.
> This is probably what everyone should be doing, but mistakes are made in the
> real world, and it shouldn't be this easy to unintentionally bring the table
> down.
> h1. Proposals to make alter operation safer
> h2. Proposal 1. Make it easier to verify the alter command on a temporary
> table
> To avoid such mistakes, the operator would want to verify an alter command on
> a temporary table first. But it's surprisingly not easy to do it properly.
> I'll explain why. This proposal aims to make the process easier.
> h3. Proposal 1-1. Table-wise {{hbase.assignment.maximum.attempts}}
> The default value of {{hbase.assignment.maximum.attempts}} is
> {{{}INT_MAX{}}}. So HBase will endlessly retry even on non-recoverable errors
> like {{{}NumberFormatException{}}}. And the operator has to rely on {{hbck}}
> to "bypass" the retrying procedure.
> If we can make the configuration table-wise, it would be easier to verify an
> alter command on a temporary table and clean up the mess if something goes
> wrong. (i.e. no need for {{{}bypass{}}})
> {code:ruby}
> create 'temp', 'd', { MAX_ASSIGNMENT_ATTEMPTS => 1 }
> alter 'temp', { CONFIGURATION => { 'hbase.hregion.majorcompaction' => 'x' } }
> {code}
> h3. Proposal 1-2. Allowing disabling regions in {{FAILED_OPEN}} state
> So with the above configuration, we can avoid using {{hbck}} for bypassing
> the procedure. But there's another problem. The region will be in
> {{FAILED_OPEN}} state, and because of that, we can't disable the table. When
> you try to disable the table, HBase will try to reassign the region.
> [https://github.com/apache/hbase/blob/e00f492a216a8827cb1d47ded6e259ddc691d764/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java#L400-L404]
> So to disable and drop the table, we would have to use {{hbck}} to alter the
> state in the meta table.
> I don't see why we shouldn't allow disabling (closing) regions in
> {{FAILED_OPEN}} state. They are not open, so we just need to update the meta
> table and mark them as {{{}CLOSED{}}}.
> h3. Proposal 1-3. Provide safe alter command in the shell (optional)
> With the above enhancements, users can implement a safe alter command that
> works as follows:
> # Create a temp table with the same schema as the target table, but with
> {{MAX_ASSIGNMENT_ATTEMPTS}} set to 1.
> # Apply the alter command to the temp table.
> ## If it succeeds, drop the temp table and apply the alter command to the
> target table.
> ## If it fails, drop the temp table. Abort the operation.
> But, we can consider providing the official implementation of the scenario
> above as a new shell command, something like {{{}alter_safe{}}}.
> h2. Proposal 2. Add type validation to the configuration
> While the method described above can help in many cases, I realized there are
> some configuration parameters that can outright crash HBase server.
> Here are some examples:
> {code:ruby}
> alter 't', { CONFIGURATION => {
> 'hbase.regionserver.optionalcacheflushinterval' => 'x' } }
> alter 't', { CONFIGURATION => { 'hbase.hregion.max.filesize.jitter' => 'x' } }
> alter 't', { CONFIGURATION => { 'hbase.hstore.open.and.close.threads.max' =>
> 'x' } }
> alter 't', { CONFIGURATION => { 'hbase.hregion.memstore.block.multiplier' =>
> 'x' } }
> {code}
> {noformat}
> handler.AssignRegionHandler: Fatal error occurred while opening region
> t,,1739868021543.c2a35d28432c8869c40d3fe9f464c52e., aborting...
> java.lang.IllegalStateException: Could not instantiate a region
> instance.{noformat}
> So testing an alter command on a temp table is clearly not enough of a
> safeguard.
> To prevent fatal errors like this, we have to properly validate the values of
> these configuration parameters. We have {{TableDescriptorChecker}} for the
> purpose, but it is far from complete. We can start adding checks to the
> class, but it's a tedious and time-consuming process, and it's easy to miss
> updating it when a new configuration is added. To increase the validation
> coverage without too much effort, I propose adding type definition to the
> configuration keys as below:
> {code:java}
> public static final String HBASE_HSTORE_COMPACTION_RATIO_KEY =
> ConfigKey.FLOAT("hbase.hstore.compaction.ratio");
> public static final String HBASE_HSTORE_COMPACTION_MIN_KEY =
> ConfigKey.INT("hbase.hstore.compaction.min");
> public static final String HBASE_HSTORE_COMPACTION_MIN_SIZE_KEY =
> ConfigKey.LONG("hbase.hstore.compaction.min.size");
> // With additional predicates
> public static final String HREGION_MEMSTORE_BLOCK_MULTIPLIER =
> ConfigKey.INT("hbase.hregion.memstore.block.multiplier", v -> v > 0);
> {code}
> Then {{TableDescriptorChecker}} will perform basic type validation for these
> parameters before modifying the table.
> The advantages of this approach compared to manually addings checks to
> {{TableDescriptorChecker}} are as follows.
> * Easier to increase the coverage.
> ** Because it's much easier and faster to add simple type checks this way,
> than to add a separate code block to {{{}TableDescriptorChecker{}}}.
> * Easier to read and understand the code.
> ** {{ConfigKey.TYPE}} serves as a documentation of the expected type of the
> configuration key.
> * Easier to keep track of the configuration keys that need validation.
> But this approach has a limitation:
> * If a class is not loaded, the config keys listed in it are not going to be
> validated. We should track if these classes are loaded by the master server.
> ** {{{}HRegion{}}}, {{{}HStore{}}}, {{{}ScanInfo{}}}, and
> {{CompactionConfiguration}} classes are all in {{regionserver}} namespace,
> but they are loaded by the master during startup while initializing
> {{{}MasterRegion{}}}.
> *If we can make the validation coverage high enough with this approach, the
> first proposal might not be necessary.*
> h3. Alternative considered
> I thought about using annotations for the purpopse, but because the
> configuration keys are scattered across the codebase, it would be hard to
> track them all using reflection. It also has the same limitation as the
> suggested approach.
> h1. Summary
> We have patches for each proposal. Except for proposal 1-3, they are
> independent of each other, so I can open subtasks and submit patches. You can
> find them here: [https://github.com/junegunn/hbase/commits/alter-safe/]
> But considering that proposal 1 has a fundamental limitation (cannot deal
> with server crashes), I think we should go with proposal 2 and try to
> increase the validation coverage as much as possible. And once we have
> reached the point where we are confident that the validation coverage is high
> enough, we are not going need what's proposed in proposal 1.
> So I'm going to open a pull request for proposal 2 first. I have added type
> validations to some of the configuration keys that are used in alter commands.
> I'd like to hear your thoughts.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)