[
https://issues.apache.org/jira/browse/HADOOP-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386993#comment-17386993
]
Bobby Wang commented on HADOOP-17812:
-------------------------------------
Hi [[email protected]]
Thanks for your comments on the PR and the JIRA. I modified the unit test to
include the repro steps described in this JIRA, and all the unit tests passed.
I followed the steps to run the integration tests against our internal S3 storage,
and some tests failed. I was told that some failing tests were expected, so I
have uploaded the failsafe-report.html in the attachment named
s3a-test.tar.gz. Could you help check whether the failed test cases are expected?
BTW, I configured only the items below in auth-keys.xml:
{quote}<configuration>
<property>
<name>test.fs.s3a.name</name>
<value>s3a://testawss3a/</value>
</property>
<property>
<name>fs.contract.test.fs.s3a</name>
<value>${test.fs.s3a.name}</value>
</property>
<property>
<name>fs.s3a.access.key</name>
<value>XXX</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>XXXXXX</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>XXXXX</value>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
</property>
</configuration>
{quote}
> NPE in S3AInputStream read() after failure to reconnect to store
> ----------------------------------------------------------------
>
> Key: HADOOP-17812
> URL: https://issues.apache.org/jira/browse/HADOOP-17812
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.2.2, 3.3.1
> Reporter: Bobby Wang
> Priority: Major
> Labels: pull-request-available
> Attachments: s3a-test.tar.gz
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> When [reading from S3a
> storage|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L450],
> an SSLException (which extends IOException) may occur, which triggers
> [onReadFailure|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L458].
> onReadFailure calls "reopen", which first closes the original
> *wrappedStream*, sets *wrappedStream = null*, and then tries to
> [re-acquire
> *wrappedStream*|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L184].
> But if the preceding code [obtaining the
> S3Object|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L183]
> throws an exception, *wrappedStream* is left null.
> The
> [retry|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L446]
> mechanism may then re-execute
> [wrappedStream.read|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L450]
> and hit a NullPointerException.
>
> For more details, please refer to
> [https://github.com/NVIDIA/spark-rapids/issues/2915]
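The failure sequence described above can be sketched as a standalone toy class. This is not the actual S3AInputStream code; the class name, fields, and the storeUnreachable flag are illustrative assumptions that only mimic the reopen-fails-then-retry pattern:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Toy reproduction of the failure mode: reopen() nulls the wrapped stream,
// the re-open itself throws, and the retried read() then dereferences the
// null stream, raising NullPointerException instead of IOException.
public class NpeSketch {
    InputStream wrappedStream = new ByteArrayInputStream(new byte[]{1, 2, 3});
    boolean storeUnreachable = true;  // simulate the store staying down

    // Analogous to reopen(): close and null the old stream, then try to
    // fetch a new one. If the fetch throws, wrappedStream is left null.
    void reopen() throws IOException {
        if (wrappedStream != null) {
            wrappedStream.close();
        }
        wrappedStream = null;
        if (storeUnreachable) {
            throw new IOException("simulated failure to re-open the object");
        }
        wrappedStream = new ByteArrayInputStream(new byte[]{1, 2, 3});
    }

    // Analogous to the retried read(): dereferences wrappedStream without
    // a null check, so a retry after a failed reopen() throws NPE.
    int read() throws IOException {
        return wrappedStream.read();  // NPE here when wrappedStream == null
    }

    public static void main(String[] args) {
        NpeSketch s = new NpeSketch();
        try {
            s.reopen();               // the first read failure triggers reopen
        } catch (IOException e) {
            // expected: store still unreachable, wrappedStream is now null
        }
        try {
            s.read();                 // the retry
        } catch (NullPointerException e) {
            System.out.println("NPE on retry, as described in the issue");
        }
    }
}
```

A null check (or re-running reopen) before the retried read would turn this back into a retryable IOException rather than an NPE.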
--
This message was sent by Atlassian Jira
(v8.3.4#803005)