ConfX created HBASE-29804:
-----------------------------
Summary: NullPointerException in WorkerAssigner.serverAdded during Master Shutdown and Restart
Key: HBASE-29804
URL: https://issues.apache.org/jira/browse/HBASE-29804
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 2.6.4, 2.6.3
Reporter: ConfX
h2. Summary
`WorkerAssigner.serverAdded()` throws a NullPointerException when it is called during or after Master shutdown. It dereferences the result of `master.getMasterProcedureExecutor()` without a null check, but `HMaster` explicitly sets its `procedureExecutor` field to `null` during the shutdown process.
h2. Affected Component
- **File:**
`hbase-server/src/main/java/org/apache/hadoop/hbase/master/WorkerAssigner.java`
- **Method:** `serverAdded(ServerName worker)` at line 83
h2. Root Cause Analysis
h3. The Bug
The `WorkerAssigner` class implements `ServerListener` and registers itself with `ServerManager` to receive server event notifications. Its `serverAdded()` callback dereferences a chain of calls without any null checks:
{code:java}
@Override
public void serverAdded(ServerName worker) {
  this.wake(master.getMasterProcedureExecutor().getEnvironment().getProcedureScheduler());
}{code}
h3. Why NPE Occurs
During Master shutdown, the `procedureExecutor` is explicitly set to `null` in
`HMaster.stopProcedureExecutor()`:
{code:java}
// HMaster.java:1849-1856
private void stopProcedureExecutor() {
  if (procedureExecutor != null) {
    configurationManager.deregisterObserver(procedureExecutor.getEnvironment());
    procedureExecutor.getEnvironment().getRemoteDispatcher().stop();
    procedureExecutor.stop();
    procedureExecutor.join();
    procedureExecutor = null; // <-- Set to null here
  }
  // ...
}{code}
However, `WorkerAssigner` is **never unregistered** from `ServerManager` during shutdown. If a `serverAdded()` event fires during or after shutdown (for example, while a RegionServer's `regionServerReport` RPC is still being handled), the still-registered `WorkerAssigner` receives the callback, `getMasterProcedureExecutor()` returns `null`, and the chained call throws an NPE.
h3. Call Chain When Failure Occurs
1. `ServerManager.regionServerReport()` (line 295)
2. `ServerManager.checkAndRecordNewServer()` (line 377)
3. `ServerListener.serverAdded()` callback for all registered listeners
4. `WorkerAssigner.serverAdded()` (line 83) - **NPE occurs here**
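To make the failure sequence above concrete, here is a minimal, self-contained model of it. All class names (`Master`, `ProcExecutor`, `Scheduler`, `Registry`, `UnguardedAssigner`, `Repro`) are hypothetical simplifications, not HBase classes: `Registry.regionServerReport()` stands in for the `regionServerReport()` -> `checkAndRecordNewServer()` -> `serverAdded()` chain, and `Master.stopProcedureExecutor()` only models the `procedureExecutor = null` assignment.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, simplified stand-ins for HMaster, ServerManager and
// WorkerAssigner. None of these are real HBase classes.
interface Listener {
  void serverAdded(String worker);
}

class Scheduler {
  int wakeups = 0;
  void wake() { wakeups++; }
}

class ProcExecutor {
  private final Scheduler scheduler = new Scheduler();
  Scheduler getScheduler() { return scheduler; }
}

class Master {
  // Models HMaster.procedureExecutor, which shutdown sets to null.
  private ProcExecutor procedureExecutor = new ProcExecutor();
  ProcExecutor getMasterProcedureExecutor() { return procedureExecutor; }
  void stopProcedureExecutor() { procedureExecutor = null; }
}

class Registry {
  final List<Listener> listeners = new CopyOnWriteArrayList<>();

  // Models regionServerReport -> checkAndRecordNewServer -> serverAdded.
  void regionServerReport(String worker) {
    for (Listener listener : listeners) {
      listener.serverAdded(worker);
    }
  }
}

class UnguardedAssigner implements Listener {
  private final Master master;

  UnguardedAssigner(Master master) { this.master = master; }

  @Override
  public void serverAdded(String worker) {
    // Same shape as the buggy callback: no null check on the executor.
    master.getMasterProcedureExecutor().getScheduler().wake();
  }
}

class Repro {
  /** Returns true if a report that arrives after executor shutdown throws an NPE. */
  static boolean reportAfterShutdownThrows() {
    Master master = new Master();
    Registry registry = new Registry();
    registry.listeners.add(new UnguardedAssigner(master));
    registry.regionServerReport("rs1"); // executor still present: no error
    master.stopProcedureExecutor();     // executor nulled, listener still registered
    try {
      registry.regionServerReport("rs2");
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }
}
```

In this model `Repro.reportAfterShutdownThrows()` returns `true`: the first report succeeds, and the second, arriving after the executor is nulled but while the listener is still registered, throws the NPE.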
h3. Stacktrace
{code:java}
java.lang.NullPointerException
    at org.apache.hadoop.hbase.master.WorkerAssigner.serverAdded(WorkerAssigner.java:83)
    at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:377)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:295)
    at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:573)
{code}
h3. Potential Fix
h4. Option 1: Add Null Check in serverAdded() (Recommended)
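{code:java}
@Override
public void serverAdded(ServerName worker) {
  // Read the executor once; HMaster sets it to null in stopProcedureExecutor().
  ProcedureExecutor<MasterProcedureEnv> executor = master.getMasterProcedureExecutor();
  if (executor != null && executor.getEnvironment() != null) {
    this.wake(executor.getEnvironment().getProcedureScheduler());
  }
  // Otherwise the master is shutting down and there is no scheduler to wake.
}{code}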
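A sketch of Option 1 in a simplified model (the classes below are hypothetical, not HBase's). As in the proposed patch, the executor is read once into a local variable, so the null check and the subsequent call see the same reference even if shutdown nulls the field in between.

```java
// Hypothetical, simplified stand-ins; not real HBase classes.
class Scheduler {
  int wakeups = 0;
  void wake() { wakeups++; }
}

class ProcExecutor {
  private final Scheduler scheduler = new Scheduler();
  Scheduler getScheduler() { return scheduler; }
}

class Master {
  // Models HMaster.procedureExecutor, which shutdown sets to null.
  private ProcExecutor procedureExecutor = new ProcExecutor();
  ProcExecutor getMasterProcedureExecutor() { return procedureExecutor; }
  void stopProcedureExecutor() { procedureExecutor = null; }
}

class GuardedAssigner {
  private final Master master;

  GuardedAssigner(Master master) { this.master = master; }

  // Option 1: read the executor once, then null-check before using it.
  void serverAdded(String worker) {
    ProcExecutor executor = master.getMasterProcedureExecutor();
    if (executor != null) {
      executor.getScheduler().wake();
    }
    // else: the master is shutting down; there is nothing to wake.
  }
}
```

After `stopProcedureExecutor()`, the guarded callback becomes a no-op instead of throwing.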
h4. Option 2: Unregister WorkerAssigner During Shutdown
Add a `close()` or `stop()` method to `WorkerAssigner` that unregisters it from `ServerManager`, and call it from the shutdown paths of `SplitWALManager` and `SnapshotManager`, which both use a `WorkerAssigner`.
{code:java}
public void stop() {
  // Stop receiving ServerListener callbacks once the owner shuts down.
  ServerManager sm = this.master.getServerManager();
  if (sm != null) {
    sm.unregisterListener(this);
  }
}{code}
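A sketch of Option 2 under the same simplifying assumptions. `Registry.registerListener()` and `Registry.unregisterListener()` loosely mirror `ServerManager`'s listener registration and the `unregisterListener()` call in the snippet above; `Assigner` and its `notifications` counter are hypothetical and exist only to show that no callbacks arrive after `stop()`.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, simplified stand-ins; not real HBase classes.
interface Listener {
  void serverAdded(String worker);
}

class Registry {
  // Copy-on-write: unregistering during a notification cannot throw
  // ConcurrentModificationException.
  private final List<Listener> listeners = new CopyOnWriteArrayList<>();

  void registerListener(Listener l) { listeners.add(l); }

  boolean unregisterListener(Listener l) { return listeners.remove(l); }

  void regionServerReport(String worker) {
    for (Listener l : listeners) {
      l.serverAdded(worker);
    }
  }
}

class Assigner implements Listener {
  private final Registry registry;
  int notifications = 0; // demo-only counter

  Assigner(Registry registry) {
    this.registry = registry;
    registry.registerListener(this); // registers itself with the registry
  }

  @Override
  public void serverAdded(String worker) { notifications++; }

  // Option 2: hypothetical stop() that detaches from the registry.
  void stop() { registry.unregisterListener(this); }
}
```

Note that with a copy-on-write list, a notification already in progress iterates over the old snapshot and can still reach a listener that has just unregistered, which is one reason Option 3 keeps the null check as well.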
h4. Option 3: Both (Most Robust)
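Implement both the null check (defensive programming) and the unregistration (proper lifecycle management). Unregistration stops callbacks once the assigner's owner has shut down, while the null check still covers any callback that races with `stopProcedureExecutor()` before the listener is removed.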
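Combining both options in one hypothetical model: the guarded callback tolerates a report that arrives after the executor is gone, and `stop()` removes the listener so later reports never reach it. All names are illustrative, not HBase API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, simplified stand-ins; not real HBase classes.
interface Listener {
  void serverAdded(String worker);
}

class Scheduler {
  int wakeups = 0;
  void wake() { wakeups++; }
}

class ProcExecutor {
  final Scheduler scheduler = new Scheduler();
}

class Registry {
  private final List<Listener> listeners = new CopyOnWriteArrayList<>();
  void registerListener(Listener l) { listeners.add(l); }
  boolean unregisterListener(Listener l) { return listeners.remove(l); }
  int listenerCount() { return listeners.size(); }
  void regionServerReport(String worker) {
    for (Listener l : listeners) {
      l.serverAdded(worker);
    }
  }
}

class Master {
  // Models HMaster.procedureExecutor, which shutdown sets to null.
  private ProcExecutor procedureExecutor = new ProcExecutor();
  final Registry serverManager = new Registry();
  ProcExecutor getMasterProcedureExecutor() { return procedureExecutor; }
  void stopProcedureExecutor() { procedureExecutor = null; }
}

class RobustAssigner implements Listener {
  private final Master master;

  RobustAssigner(Master master) {
    this.master = master;
    master.serverManager.registerListener(this);
  }

  @Override
  public void serverAdded(String worker) {
    ProcExecutor executor = master.getMasterProcedureExecutor();
    if (executor != null) { // defensive: tolerate a report racing with shutdown
      executor.scheduler.wake();
    }
  }

  // Lifecycle: detach from the registry when the owner shuts down.
  void stop() { master.serverManager.unregisterListener(this); }
}
```

The test sequence deliberately stops the executor before unregistering, which is the ordering that triggers the original NPE; with the null check in place it is harmless.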