[
https://issues.apache.org/jira/browse/HBASE-29308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17951008#comment-17951008
]
Andrew Kyle Purtell edited comment on HBASE-29308 at 5/12/25 4:10 PM:
----------------------------------------------------------------------
This is especially useful to handle the case where a client performing a long
running scan holds up the close lock in today's code. Today, after HBASE-25212,
we wait for the RPC handlers to complete. Meanwhile, the region is closing and
will not accept new RPCs, so from the client's point of view it is unavailable,
and we abort the regionserver if this goes on for too long. Once this proposal
is implemented we can let those scans run for as long as they need without
availability concerns. No need to interrupt them or abort the regionserver.
[~kadir] Coprocessors might override scan behavior and mutate the region data
after ownership has effectively transferred, bypassing checks, causing
additional flushes. Today we are guaranteed no activity on the region except at
either the closing or opening location. After implementing this proposal,
concurrent activity on the region in two locations (reads finishing up in one,
opening in another) is likely. If the reads that are finishing up involve
coprocessors, there could be issues.
We can't prevent such issues, because coprocessors can be arbitrary code. I am
not suggesting that. However, ways to help coprocessor implementors avoid
problems should be considered. Such advice should be provided to the Phoenix
project so they can check they don't develop any new concurrency issues.
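As one possible shape for such advice, a coprocessor hook could check whether
this location still owns the region before it performs a side-effecting write.
The sketch below is purely illustrative: RegionView, Mutation, and
mutateIfOwned are hypothetical names, not HBase APIs, and the State enum is a
stand-in for whatever state the proposal ends up exposing.

```java
// Hypothetical sketch; these types are illustrative and are not HBase APIs.
final class MoveAwareHookSketch {
    enum State { OPEN, MOVING, CLOSED }

    /** Hypothetical read-only view of region state exposed to a coprocessor. */
    interface RegionView { State state(); }

    /** Hypothetical side effect a scan hook might perform, e.g. a repair write. */
    interface Mutation { void apply(); }

    /**
     * Applies the mutation only while the region is OPEN at this location.
     * Returns true if the mutation was applied, false if it was skipped.
     * Note: this is check-then-act. A real design would need the check to be
     * atomic with respect to the OPEN -> MOVING transition.
     */
    static boolean mutateIfOwned(RegionView region, Mutation mutation) {
        if (region.state() != State.OPEN) {
            return false; // ownership is transferring or gone; do not write
        }
        mutation.apply();
        return true;
    }
}
```

A guard like this only narrows the window. It does not remove the need for the
design itself to define what a coprocessor may do on a region that is MOVING.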
> Reducing region unavailability during region movement
> -----------------------------------------------------
>
> Key: HBASE-29308
> URL: https://issues.apache.org/jira/browse/HBASE-29308
> Project: HBase
> Issue Type: Improvement
> Reporter: Kadir Ozdemir
> Priority: Major
>
> Region movement is the process of transferring a region from one RegionServer
> to another where the region on the source RegionServer is closed and this
> region is opened on the target RegionServer. In the current design, the
> region is unavailable for the period of closing the region on the source
> RegionServer and then opening it on the target RegionServer.
> The main operations during region close include flushing MemStore, waiting
> for in-progress operations to complete (by acquiring the region operation
> lock exclusively), removing compacting files, and evicting the blocks in the
> block cache for the stores of the region. The operations for opening a region
> include reading the region info file, checking if there are any WAL files to
> replay, opening store files and reading metadata and possibly bloom filters.
> It is clear that executing these steps sequentially can take some time and
> prolong the region's unavailability.
> Most of the above operations can be done outside (before or after) the
> region’s unavailability window. As described below, we actually need to
> include only flushing MemStore on the source RegionServer, and then loading
> the store files generated during this MemStore flush on the target
> RegionServer in the unavailability window.
> The region unavailability time can be reduced by introducing two new region
> states, WARMING and MOVING, as follows:
> # A new copy of the region is opened on the target RegionServer. This copy of
> the region is not visible to HMaster and clients yet. The region is set to be
> in state WARMING. In this state, it is not ready to serve reads or writes. The
> WARMING state is an in-memory state and is not recorded in the meta table.
> WARMING regions need to be cleaned up if the region move operation fails. If
> a region remains in the WARMING state longer than a specified timeout period,
> this cleanup can be executed locally on the target RegionServer after the
> timeout.
> # The next step is to put the region on the source RegionServer in the
> MOVING state. This will trigger MemStore flushing. In the MOVING state, the
> region will not accept new (read or write) operations but will continue
> serving in-progress read (get and scan) operations. Please note that these
> operations are allowed as part of snapshot isolation. This is essentially the
> initial part of the region CLOSING state where the MemStore is flushed.
> # When the region completes MemStore flushing, the target region is notified
> that new HFiles have been created for the region. The target region loads these
> files, meaning that it opens these files and reads their metadata. Then the
> region state (for the region in the target RegionServer) will change to OPEN
> and its location info will be updated with the target RegionServer in the
> meta table, and the HMaster node will be notified about this change. Thus,
> the region on the target RegionServer will be visible to the clients.
> # Finally, the region on the source RegionServer will be closed.
> With this design, the region will be unavailable for new operations only for
> the period of flushing MemStore, loading store files generated by MemStore
> flushes, updating the meta table, and notifying HMaster.
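The four steps in the description above can be read as a small state machine.
The following is a minimal sketch of that reading, not HBase code: every class
and method name is hypothetical, OPEN and CLOSED stand in for the existing
region states, and the rule that the source closes only after its in-progress
reads have drained is an assumption taken from the discussion, not from the
description itself.

```java
// Hypothetical model of the proposed move sequence. None of these types are
// HBase classes; OPEN and CLOSED stand in for the existing region states.
final class RegionMoveSketch {
    enum State { OPEN, WARMING, MOVING, CLOSED }

    /** One copy of the region hosted on one RegionServer. */
    static final class RegionCopy {
        State state;
        int inProgressReads; // reads admitted before the move started

        RegionCopy(State state) { this.state = state; }

        /** Only an OPEN copy admits new reads or writes. */
        boolean acceptsNewOperations() { return state == State.OPEN; }

        /** A MOVING copy keeps serving reads that were already in progress. */
        boolean servesInProgressReads() {
            return state == State.OPEN || state == State.MOVING;
        }
    }

    final RegionCopy source = new RegionCopy(State.OPEN);
    RegionCopy target; // null until step 1 has run

    /** Step 1: open a hidden, in-memory WARMING copy on the target. */
    void warmTarget() {
        require(source.state == State.OPEN, "source must be OPEN");
        target = new RegionCopy(State.WARMING);
    }

    /** Step 2: source enters MOVING; the MemStore flush itself is not modeled. */
    void startMove() {
        require(target != null && target.state == State.WARMING, "target must be WARMING");
        source.state = State.MOVING;
    }

    /** Step 3: target loads the flushed files and becomes OPEN and visible. */
    void flushCompleted() {
        require(source.state == State.MOVING, "source must be MOVING");
        target.state = State.OPEN;
    }

    /** Step 4: source closes (assumed here: only after in-progress reads drain). */
    void closeSource() {
        require(target != null && target.state == State.OPEN, "target must be OPEN");
        require(source.inProgressReads == 0, "in-progress reads must drain first");
        source.state = State.CLOSED;
    }

    /** True if some copy of the region currently admits new operations. */
    boolean availableForNewOperations() {
        return source.acceptsNewOperations()
            || (target != null && target.acceptsNewOperations());
    }

    private static void require(boolean condition, String message) {
        if (!condition) throw new IllegalStateException(message);
    }
}
```

In this model the region is unavailable for new operations only between
startMove() and flushCompleted(), which corresponds to the window described
above. Warming the target and closing the source both fall outside it.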