[
https://issues.apache.org/jira/browse/HADOOP-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ConfX updated HADOOP-18811:
---------------------------
Description:
h2. What happened:
In ZKFailoverController.java, the initRPC() method obtains the ZKFC RpcServer
bind address and creates a new ZKFCRpcServer object, rpcServer. However, the
ZKFCRpcServer constructor accepts a null policy provider without checking it;
construction then fails partway through, rpcServer is left null, and any later
use of rpcServer throws a NullPointerException.
h2. Buggy code:
In ZKFailoverController.java
{code:java}
protected void initRPC() throws IOException {
  InetSocketAddress bindAddr = getRpcAddressToBindTo();
  LOG.info("ZKFC RpcServer binding to {}", bindAddr);
  // <-- getPolicyProvider() may return null here
  rpcServer = new ZKFCRpcServer(conf, bindAddr, this, getPolicyProvider());
}
{code}
The ZKFCRpcServer constructor eventually calls the
refreshWithLoadedConfiguration() method shown below. This method dereferences
provider without a null check, so construction fails and the rpcServer field
above is left null.
In ServiceAuthorizationManager.java
{code:java}
@Private
public void refreshWithLoadedConfiguration(Configuration conf,
    PolicyProvider provider) {
  ...
  // Parse the config file
  Service[] services = provider.getServices(); // <--- provider may be null here
  ...
{code}
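One way to harden this path is to fail fast with a descriptive message instead of letting a bare NullPointerException surface deep inside the RPC server. The following is a minimal, self-contained sketch of that pattern; the PolicyProvider interface and refresh() method here are simplified stand-ins for illustration, not Hadoop's actual classes:
{code:java}
import java.util.Objects;

public class PolicyProviderNullCheck {
  // Stand-in for org.apache.hadoop.security.authorize.PolicyProvider.
  interface PolicyProvider {
    String[] getServices();
  }

  // Hypothetical analogue of refreshWithLoadedConfiguration(): reject a
  // null provider up front rather than crashing on provider.getServices().
  static String[] refresh(PolicyProvider provider) {
    Objects.requireNonNull(provider,
        "policy provider must not be null when "
        + "hadoop.security.authorization=true");
    return provider.getServices();
  }

  public static void main(String[] args) {
    String result;
    try {
      refresh(null);
      result = "accepted";
    } catch (NullPointerException e) {
      // The failure is now immediate and self-explanatory.
      result = "rejected: " + e.getMessage();
    }
    System.out.println(result);
  }
}
{code}
An equivalent check could instead live in initRPC(), rejecting a null result from getPolicyProvider() before ZKFCRpcServer is constructed.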
h2. How to trigger this bug:
(1) Set hadoop.security.authorization to true
(2) Run test
org.apache.hadoop.ha.TestZKFailoverControllerStress#testRandomExpirations
(3) You will see the following stack trace:
{code:java}
java.lang.NullPointerException
    at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:258)
    at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:63)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:181)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:177)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:503)
    at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:177)
    at org.apache.hadoop.ha.MiniZKFCCluster$DummyZKFCThread.doWork(MiniZKFCCluster.java:301)
    at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
{code}
(4) The null pointer exception here is due to the null {{rpcServer}} object
caused by the bug described above.
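For reference, step (1) above corresponds to a core-site.xml entry like the following (the file location depends on your deployment or test configuration):
{code:xml}
<configuration>
  <property>
    <!-- Enables service-level authorization, which is the code path that
         dereferences the policy provider and triggers the NPE above. -->
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
{code}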
You can use the reproduce.sh in the attachment to reproduce the bug easily.
We are happy to provide a patch if this issue is confirmed.
> Buggy ZKFCRpcServer constructor creates null object and crashes the rpcServer
> -----------------------------------------------------------------------------
>
> Key: HADOOP-18811
> URL: https://issues.apache.org/jira/browse/HADOOP-18811
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: ConfX
> Priority: Critical
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)