[
https://issues.apache.org/jira/browse/HADOOP-17338?focusedWorklogId=517877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517877
]
ASF GitHub Bot logged work on HADOOP-17338:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 30/Nov/20 12:02
Start Date: 30/Nov/20 12:02
Worklog Time Spent: 10m
Work Description: steveloughran commented on pull request #2455:
URL: https://github.com/apache/hadoop/pull/2455#issuecomment-735744062
> I wonder if the hadoop jenkins test can be set up to do the s3a test
automatically like other tests.
1. We can't give it credentials, for security reasons. Even if we only issued
short-lived session credentials, getting them would be as trivial as submitting
a PR which printed them. Same for abfs.
2. If someone isn't set up to run the tests, they aren't set up to deal with
regressions or to debug why their own patch doesn't work.
3. There's an extra benefit: because everyone's config is slightly different
(network, endpoints, encryption, etc.), we get better coverage of test
configurations by having different people run the tests. It's not unusual for a
patch to get merged in, only to need a followup a few days later when someone
else finds a regression in their test setup.
I would like more test runs, e.g. the daily Jenkins runs, to at least have
credentials, but I've yet to come up with a good design for secure execution.
It'd need something like:
* an isolated AWS account (billed to whom?)
* two IAM roles: #1 with limited access to a single S3 bucket; #2 with
permission to call assumeRole on role #1
* something in the build setup to call assumeRole at the start of the run and
issue role credentials valid for a few hours at most
* the Jenkins scripts would only ever see those role credentials
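As a rough sketch of the two-role setup (every ARN, account ID and name here is invented), role #2's permission policy would be little more than the right to assume role #1:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowAssumingTestBucketRole",
    "Effect": "Allow",
    "Action": "sts:AssumeRole",
    "Resource": "arn:aws:iam::123456789012:role/s3a-test-bucket-role"
  }]
}
```

Role #1 itself would carry only S3 permissions scoped to the single test bucket, and the build's assumeRole call would pass a small DurationSeconds to cap the session lifetime.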
That still leaves us with the "what to do at the end of the run" problem.
Maybe: revoke all sessions under a specific role through the relevant IAM API
call. This might work if role #2 has the permissions and you only ever have one
active session in role #1, because we'd have to revoke all sessions in that
role.
See: not easy.
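One documented AWS mechanism for that revoke-at-end step is to attach a deny-everything policy to role #1 conditioned on aws:TokenIssueTime, which invalidates every session whose credentials were issued before a chosen instant (a sketch; the timestamp would be set to "now" at revocation time):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "RevokeOlderSessions",
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {
      "DateLessThan": {"aws:TokenIssueTime": "2020-11-30T12:00:00Z"}
    }
  }]
}
```

This mirrors the policy the IAM console generates when you revoke a role's active sessions; it denies by issue time rather than truly cancelling individual sessions, which is why it sweeps up every session in the role at once.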
Put your error stack traces into the PR. A single test failure isn't enough
to block a patch if we can identify a cause and say "this is independent".
Given you are seeing things I'm not, that's something we need to understand.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 517877)
Time Spent: 2h 10m (was: 2h)
> Intermittent S3AInputStream failures: Premature end of Content-Length
> delimited message body etc
> ------------------------------------------------------------------------------------------------
>
> Key: HADOOP-17338
> URL: https://issues.apache.org/jira/browse/HADOOP-17338
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.3.0
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Priority: Major
> Labels: pull-request-available
> Attachments: HADOOP-17338.001.patch
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> We are seeing the following two kinds of intermittent exceptions when using
> S3AInputStream:
> 1.
> {code:java}
> Caused by: com.amazonaws.thirdparty.apache.http.ConnectionClosedException:
> Premature end of Content-Length delimited message body (expected: 156463674;
> received: 150001089)
> at
> com.amazonaws.thirdparty.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
> at
> com.amazonaws.thirdparty.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at
> com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:125)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at
> com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:107)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:181)
> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> at java.io.DataInputStream.readFully(DataInputStream.java:169)
> at
> org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:779)
> at
> org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:511)
> at
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130)
> at
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
> at
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
> at
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:208)
> at
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:63)
> at
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 15 more
> {code}
> 2.
> {code:java}
> Caused by: javax.net.ssl.SSLException: SSL peer shut down incorrectly
> at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:596)
> at sun.security.ssl.InputRecord.read(InputRecord.java:532)
> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:990)
> at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:948)
> at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
> at
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
> at
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:198)
> at
> com.amazonaws.thirdparty.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176)
> at
> com.amazonaws.thirdparty.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at
> com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:107)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at
> com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:125)
> at
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:181)
> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> at
> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:70)
> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:120)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2361)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2493)
> at
> org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
> at
> cascading.tap.hadoop.io.CombineFileRecordReaderWrapper.next(CombineFileRecordReaderWrapper.java:70)
> at
> org.apache.hadoop.mapred.lib.CombineFileRecordReader.next(CombineFileRecordReader.java:58)
> at
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
> ... 10 more
> {code}
> Inspired by
> [https://stackoverflow.com/questions/9952815/s3-java-client-fails-a-lot-with-premature-end-of-content-length-delimited-messa]
> and
> [https://forums.aws.amazon.com/thread.jspa?threadID=83326], we came up with a
> solution that has helped us, and would like to contribute the fix to the
> community version.
> The problem is that S3AInputStream holds only a short-lived S3Object, which is
> used to create the wrappedStream; that object can be garbage collected at a
> random time, which closes the stream and produces the symptoms reported.
> [https://github.com/aws/aws-sdk-java/blob/1.11.295/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/model/S3Object.java#L225]
> is the S3 SDK code that closes the stream when the S3Object is garbage
> collected. Here is the code in S3AInputStream that creates the temporary
> S3Object and uses it to create the wrappedStream:
> {code:java}
> S3Object object = Invoker.once(text, uri,
>     () -> client.getObject(request));
> changeTracker.processResponse(object, operation, targetPos);
> wrappedStream = object.getObjectContent();
> {code}
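The shape of the fix described above can be sketched as follows: keep a strong reference to the owning object alongside its content stream, so the owner cannot be garbage collected (and its cleanup logic cannot close the stream) while reads are still in progress. This is a minimal illustration of the idea, not the actual HADOOP-17338 patch; the class and variable names are invented.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Wraps a stream together with a strong reference to the object that owns it,
// so the owner stays reachable for the whole lifetime of the stream.
public class PinnedStream extends FilterInputStream {
    // Strong reference pins the owning object (e.g. an S3Object) so it
    // cannot be collected, and so cannot close the stream, mid-read.
    private final Object owner;

    public PinnedStream(Object owner, InputStream wrapped) {
        super(wrapped);
        this.owner = owner;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for object.getObjectContent(): a stream whose owner
        // would otherwise be eligible for collection as soon as this
        // local variable went out of scope.
        Object shortLivedOwner = new Object();
        try (InputStream in = new PinnedStream(shortLivedOwner,
                new ByteArrayInputStream("hello".getBytes()))) {
            System.out.println((char) in.read());
        }
    }
}
```

The same effect could be had by keeping the S3Object itself as a field next to wrappedStream; the wrapper just makes the lifetime coupling explicit.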
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]