[ 
https://issues.apache.org/jira/browse/HBASE-29804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ConfX updated HBASE-29804:
--------------------------
    Description: 
h2. Summary

 
`WorkerAssigner.serverAdded()` throws a `NullPointerException` when it is called 
during or after Master shutdown. The method dereferences the result of 
`master.getMasterProcedureExecutor()` without a null check, but the Master sets 
its `procedureExecutor` field to `null` during the shutdown process.
h2. Affected Component
 - *File:* `hbase-server/src/main/java/org/apache/hadoop/hbase/master/WorkerAssigner.java`
 - *Method:* `serverAdded(ServerName worker)` at line 83

h2. Root Cause Analysis
h3. The Bug

 
The `WorkerAssigner` class implements `ServerListener` and registers itself 
with `ServerManager` to receive server event notifications. In the 
`serverAdded()` callback method, it accesses a chain of objects without null 
checks:
{code:java}
@Override
public void serverAdded(ServerName worker) {
  this.wake(master.getMasterProcedureExecutor().getEnvironment().getProcedureScheduler());
}{code}
h3. Why NPE Occurs

During Master shutdown, the `procedureExecutor` is explicitly set to `null` in 
`HMaster.stopProcedureExecutor()`:
{code:java}
// HMaster.java:1849-1856
private void stopProcedureExecutor() {
  if (procedureExecutor != null) {
    configurationManager.deregisterObserver(procedureExecutor.getEnvironment());
    procedureExecutor.getEnvironment().getRemoteDispatcher().stop();
    procedureExecutor.stop();
    procedureExecutor.join();
    procedureExecutor = null;  // <-- Set to null here
  }
  // ...
}{code}
However, `WorkerAssigner` is *never unregistered* from `ServerManager` during 
shutdown. If a `serverAdded()` event fires during or after shutdown, while 
`WorkerAssigner` is still registered as a listener, the callback throws an NPE 
because `getMasterProcedureExecutor()` returns `null`.
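
The failure mode can be reproduced in isolation. The following is a minimal sketch, not HBase code: `DemoMaster`, `DemoExecutor`, `DemoListener`, and `UncheckedAssigner` are hypothetical stand-ins that only mimic the shape of the classes discussed above (a nullable executor field plus a listener list that outlives it).

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-ins for illustration; none of these are HBase classes.
interface DemoListener {
  void serverAdded(String serverName);
}

class DemoExecutor {
  String getScheduler() {
    return "scheduler";
  }
}

class DemoMaster {
  private volatile DemoExecutor executor = new DemoExecutor();
  private final List<DemoListener> listeners = new CopyOnWriteArrayList<>();

  DemoExecutor getExecutor() {
    return executor;
  }

  void registerListener(DemoListener listener) {
    listeners.add(listener);
  }

  // Same shape as the shutdown path above: the field is set to null,
  // but the listener list is left untouched.
  void stopExecutor() {
    executor = null;
  }

  // Notifies every registered listener, whether or not the executor is alive.
  void reportNewServer(String serverName) {
    for (DemoListener listener : listeners) {
      listener.serverAdded(serverName);
    }
  }
}

class UncheckedAssigner implements DemoListener {
  private final DemoMaster master;

  UncheckedAssigner(DemoMaster master) {
    this.master = master;
  }

  @Override
  public void serverAdded(String serverName) {
    // Dereferences the executor without a null check.
    master.getExecutor().getScheduler();
  }
}

class ShutdownNpeDemo {
  // Returns true if a callback delivered after shutdown throws an NPE.
  static boolean lateCallbackThrows() {
    DemoMaster master = new DemoMaster();
    master.registerListener(new UncheckedAssigner(master));
    master.reportNewServer("rs1"); // executor still alive, no exception
    master.stopExecutor();
    try {
      master.reportNewServer("rs2"); // listener is still registered
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("NPE after shutdown: " + lateCallbackThrows());
  }
}
```

The sketch is single-threaded. In the reported scenario the late callback arrives on an RPC handler thread, so the window is a race between shutdown and incoming region server reports.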
h3. Call Chain When Failure Occurs

1. `ServerManager.regionServerReport()` (line 295)
2. `ServerManager.checkAndRecordNewServer()` (line 377)
3. `ServerListener.serverAdded()` callback for all registered listeners
4. `WorkerAssigner.serverAdded()` (line 83) - *NPE occurs here*
h3. Stacktrace
{code:java}
java.lang.NullPointerException
  at org.apache.hadoop.hbase.master.WorkerAssigner.serverAdded(WorkerAssigner.java:83)
  at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:377)
  at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:295)
  at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:573)
{code}
h3. Potential Fix
h4. Option 1: Add Null Check in serverAdded() (Recommended)

 
{code:java}
@Override
public void serverAdded(ServerName worker) {
  // Read the executor once into a local; it is null once Master shutdown has begun.
  ProcedureExecutor<MasterProcedureEnv> executor = master.getMasterProcedureExecutor();
  if (executor != null && executor.getEnvironment() != null) {
    this.wake(executor.getEnvironment().getProcedureScheduler());
  }
}{code}
h4. Option 2: Unregister WorkerAssigner During Shutdown

 
Add a `close()` or `stop()` method to `WorkerAssigner` that unregisters it from 
`ServerManager`, and call it during master shutdown in `SplitWALManager` and 
`SnapshotManager`.
 
{code:java}
public void stop() {
  ServerManager sm = this.master.getServerManager();
  if (sm != null) {
    sm.unregisterListener(this);
  }
}{code}
h4. Option 3: Both (Most Robust)

Implement both the null check (defensive programming) and unregistration during 
shutdown (proper lifecycle management).
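
As a rough illustration of how the two measures complement each other, here is a self-contained sketch. It does not use HBase classes: `SketchMaster`, `SketchListener`, and `GuardedAssigner` are hypothetical stand-ins, and the counters exist only to make the behavior observable. The null check covers callbacks that arrive after the executor is gone but before the listener is removed; `stop()` prevents any delivery after that.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-ins for illustration; none of these are HBase classes.
interface SketchListener {
  void serverAdded(String serverName);
}

class SketchMaster {
  private volatile Object executor = new Object();
  private final List<SketchListener> listeners = new CopyOnWriteArrayList<>();

  Object getExecutor() {
    return executor;
  }

  void registerListener(SketchListener listener) {
    listeners.add(listener);
  }

  boolean unregisterListener(SketchListener listener) {
    return listeners.remove(listener);
  }

  void stopExecutor() {
    executor = null;
  }

  int listenerCount() {
    return listeners.size();
  }

  void reportNewServer(String serverName) {
    for (SketchListener listener : listeners) {
      listener.serverAdded(serverName);
    }
  }
}

class GuardedAssigner implements SketchListener {
  private final SketchMaster master;
  private int wakeCount;    // callbacks that found a live executor
  private int skippedCount; // callbacks ignored because the executor was null

  GuardedAssigner(SketchMaster master) {
    this.master = master;
  }

  void start() {
    master.registerListener(this);
  }

  // Lifecycle half of the fix: stop receiving events at shutdown.
  void stop() {
    master.unregisterListener(this);
  }

  @Override
  public void serverAdded(String serverName) {
    // Defensive half of the fix: read once into a local, then check it.
    Object executor = master.getExecutor();
    if (executor == null) {
      skippedCount++;
      return;
    }
    wakeCount++;
  }

  int wakeCount() {
    return wakeCount;
  }

  int skippedCount() {
    return skippedCount;
  }
}

class CombinedFixSketch {
  // Returns {wakeCount, skippedCount, listeners remaining after stop()}.
  static int[] run() {
    SketchMaster master = new SketchMaster();
    GuardedAssigner assigner = new GuardedAssigner(master);
    assigner.start();
    master.reportNewServer("rs1"); // executor alive: counted as a wake
    master.stopExecutor();
    master.reportNewServer("rs2"); // executor gone, still registered: skipped
    assigner.stop();
    master.reportNewServer("rs3"); // unregistered: never delivered
    return new int[] {assigner.wakeCount(), assigner.skippedCount(), master.listenerCount()};
  }

  public static void main(String[] args) {
    int[] r = run();
    System.out.println("wakes=" + r[0] + " skipped=" + r[1] + " listeners=" + r[2]);
  }
}
```

Reading the executor into a local before the check matters: checking the getter and then calling it again would leave a window in which shutdown could null the field between the two calls.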
 


> NullPointerException in WorkerAssigner.serverAdded during Master Shutdown and 
> Restart
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-29804
>                 URL: https://issues.apache.org/jira/browse/HBASE-29804
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 2.6.3, 2.6.4
>            Reporter: ConfX
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
