[
https://issues.apache.org/jira/browse/HBASE-29308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950819#comment-17950819
]
Kadir Ozdemir commented on HBASE-29308:
---------------------------------------
[~zhangduo], [~vjasani], after an internal discussion and positive feedback
from [~apurtell] and [~dmanning], I decided to create this Jira. Please let me
know your thoughts on this.
> Reducing region unavailability during region movement
> -----------------------------------------------------
>
> Key: HBASE-29308
> URL: https://issues.apache.org/jira/browse/HBASE-29308
> Project: HBase
> Issue Type: Improvement
> Reporter: Kadir Ozdemir
> Priority: Major
>
> Region movement is the process of transferring a region from one RegionServer
> to another where the region on the source RegionServer is closed and this
> region is opened on the target RegionServer. In the current design, the
> region is unavailable for the period of closing the region on the source
> RegionServer and then opening it on the target RegionServer.
> The main operations during region close include flushing MemStore, waiting
> for in-progress operations to complete (by acquiring the region operation
> lock exclusively), removing compacting files, and evicting the blocks in the
> block cache for the stores of the region. The operations for opening a region
> include reading the region info file, checking if there are any WAL files to
> replay, and opening store files to read their metadata and possibly bloom filters.
> It is clear that executing these steps sequentially can take some time and
> prolong the region's unavailability.
> Most of the above operations can be done outside (before or after) the
> region’s unavailability window. As described below, we actually need to
> include only flushing MemStore on the source RegionServer, and then loading
> the store files generated during this MemStore flush on the target
> RegionServer in the unavailability window.
> The region unavailability time can be reduced by introducing two new region
> states, WARMING and MOVING, as follows:
> # A new copy region is opened on the target RegionServer. This copy of the
> region is not visible to HMaster and clients yet. The region is set to be in
> state WARMING. In this state, it is not ready to serve reads or writes. The
> WARMING state is an in-memory state and not recorded in the meta table. The
> WARMING regions need to be cleaned up if the region move operation fails. If
> a region remains in the WARMING state longer than a specified timeout period,
> this cleanup can be executed locally on the target RegionServer after the
> timeout.
> # The next step is to put the region on the source RegionServer in the
> MOVING state. This will trigger MemStore flushing. In the MOVING state, the
> region will not accept new (read or write) operations but will continue
> serving in-progress read operations (gets and scans). Please note that these
> in-progress operations are allowed as part of snapshot isolation. This is
> essentially the initial part of the region CLOSING state where the MemStore
> is flushed.
> # When the region completes MemStore flushing, the target region is notified
> that new HFiles have been created for the region. The target region loads
> these files, meaning that it opens them and reads their metadata. Then the
> region state (for the region in the target RegionServer) will change to OPEN
> and its location info will be updated with the target RegionServer in the
> meta table, and the HMaster node will be notified about this change. Thus,
> the region on the target RegionServer will be visible to the clients.
> # Finally, the region on the source RegionServer will be closed.
> With this design, the region will be unavailable for new operations only for
> the period of flushing MemStore, loading store files generated by MemStore
> flushes, updating the meta table, and notifying HMaster.
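
The four steps in the description above can be read as a small state machine
over two copies of the region. The following is only a minimal sketch of that
reading, not HBase code: the class RegionMoveSketch, the RegionCopy type, the
OFFLINE and CLOSED placeholder states, and the timeout check are all
hypothetical names introduced for illustration. Only WARMING, MOVING, and OPEN
come from the proposal itself.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class RegionMoveSketch {
    // WARMING and MOVING are the proposed states; OFFLINE and CLOSED are
    // hypothetical placeholders and do not mirror HBase's real state enum.
    enum State { OFFLINE, WARMING, OPEN, MOVING, CLOSED }

    // Transitions assumed by this sketch.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.OFFLINE, EnumSet.of(State.WARMING));
        // WARMING -> CLOSED models cleanup after a failed move or a timeout.
        ALLOWED.put(State.WARMING, EnumSet.of(State.OPEN, State.CLOSED));
        ALLOWED.put(State.OPEN, EnumSet.of(State.MOVING));
        ALLOWED.put(State.MOVING, EnumSet.of(State.CLOSED));
        ALLOWED.put(State.CLOSED, EnumSet.noneOf(State.class));
    }

    static final class RegionCopy {
        final String server;
        State state;
        long warmingSinceMillis = -1L;

        RegionCopy(String server, State initial) {
            this.server = server;
            this.state = initial;
        }

        void transition(State next, long nowMillis) {
            if (!ALLOWED.get(state).contains(next)) {
                throw new IllegalStateException(state + " -> " + next + " on " + server);
            }
            state = next;
            if (next == State.WARMING) {
                warmingSinceMillis = nowMillis;
            }
        }

        // Only an OPEN copy accepts new reads or writes.
        boolean acceptsNewOperations() {
            return state == State.OPEN;
        }

        // A MOVING copy still finishes in-progress gets and scans.
        boolean servesInProgressReads() {
            return state == State.OPEN || state == State.MOVING;
        }

        // Local check the target RegionServer could use to clean up a stale copy.
        boolean warmingTimedOut(long nowMillis, long timeoutMillis) {
            return state == State.WARMING && nowMillis - warmingSinceMillis > timeoutMillis;
        }
    }

    static void move(RegionCopy source, RegionCopy target, long nowMillis) {
        // Step 1: open an invisible copy on the target; the source keeps serving.
        target.transition(State.WARMING, nowMillis);
        // Step 2: the source stops accepting new operations and flushes its MemStore.
        source.transition(State.MOVING, nowMillis);
        // Step 3: the target loads the flushed HFiles and becomes visible
        // (meta table update and HMaster notification are omitted here).
        target.transition(State.OPEN, nowMillis);
        // Step 4: the source copy is closed outside the unavailability window.
        source.transition(State.CLOSED, nowMillis);
    }

    public static void main(String[] args) {
        RegionCopy source = new RegionCopy("rs-source", State.OPEN);
        RegionCopy target = new RegionCopy("rs-target", State.OFFLINE);
        move(source, target, 0L);
        System.out.println(source.server + "=" + source.state + ", "
            + target.server + "=" + target.state);
    }
}
```

In this sketch the region is unavailable for new operations only between the
source entering MOVING and the target entering OPEN, which corresponds to the
window described in the last paragraph of the issue.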