Did you change the Java JDK version as well, as part of the upgrade?

Dheeren

> On Aug 16, 2016, at 11:59 AM, Chris Nauroth <[email protected]> wrote:
>
> Hello Sebastian,
>
> This is an interesting finding. Thank you for reporting it.
>
> Are you able to share a bit more about your deployment architecture? Are
> these EC2 VMs? If so, are they co-located in the same AWS region as the S3
> bucket? If the cluster is not running in EC2 (e.g. on-premises physical
> hardware), then are there any notable differences on the nodes that
> experienced this problem (e.g. smaller capacity on the outbound NIC)?
>
> This is just a theory, but if your bandwidth to the S3 service is
> intermittently saturated, throttled, or otherwise compromised, then I could
> see how longer timeouts and more retries might increase overall job time.
> With the shorter settings, individual task attempts would fail sooner.
> Then, if the next attempt gets scheduled to a different node with better
> bandwidth to S3, it would start making progress faster on the second
> attempt, and the effect on overall job execution might be faster.
>
> --Chris Nauroth
>
> On 8/7/16, 12:12 PM, "Sebastian Nagel" <[email protected]> wrote:
>
> Hi,
>
> recently, after upgrading to CDH 5.8.0, I've run into a performance
> issue when reading data from AWS S3 (via s3a).
>
> A job [1] reads tens of thousands of files ("objects") from S3 and writes
> extracted data back to S3. Every file/object is about 1 GB in size,
> processing is CPU-intensive and takes a couple of minutes per file/object.
> Each file/object is processed by one task using FilenameInputFormat.
>
> After the upgrade to CDH 5.8.0, the job showed slow progress, 5-6
> times slower overall than in previous runs. A significant number
> of tasks hung without progress for up to one hour. These tasks were
> dominating, and most nodes in the cluster showed little or no CPU
> utilization.
> Tasks are not killed/restarted because the task timeout
> is set to a very large value (because S3 is known to be slow
> sometimes). Attaching to a couple of the hung tasks with jstack
> showed that these tasks hang when reading from S3 [3].
>
> The problem was finally fixed by setting
>   fs.s3a.connection.timeout = 30000  (default: 200000 ms)
>   fs.s3a.attempts.maximum   = 5      (default: 20)
> Tasks now take 20 minutes in the worst case; the majority finishes within
> minutes.
>
> Is this the correct way to fix the problem?
> These settings were increased recently in HADOOP-12346 [2].
> What could be the drawbacks of a lower timeout?
>
> Thanks,
> Sebastian
>
> [1] https://github.com/commoncrawl/ia-hadoop-tools/blob/master/src/main/java/org/archive/hadoop/jobs/WEATGenerator.java
>
> [2] https://issues.apache.org/jira/browse/HADOOP-12346
>
> [3] "main" prio=10 tid=0x00007fad64013000 nid=0x4ab5 runnable [0x00007fad6b274000]
>    java.lang.Thread.State: RUNNABLE
>      at java.net.SocketInputStream.socketRead0(Native Method)
>      at java.net.SocketInputStream.read(SocketInputStream.java:152)
>      at java.net.SocketInputStream.read(SocketInputStream.java:122)
>      at com.cloudera.org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:204)
>      at com.cloudera.org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:182)
>      at com.cloudera.org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
>      at com.cloudera.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
>      at com.cloudera.com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
>      at com.cloudera.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
>      at com.cloudera.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
>      at com.cloudera.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
>      at com.cloudera.com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
>      at com.cloudera.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
>      at com.cloudera.com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:108)
>      at com.cloudera.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
>      at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:160)
>      - locked <0x00000007765604f8> (a org.apache.hadoop.fs.s3a.S3AInputStream)
>      at java.io.DataInputStream.read(DataInputStream.java:149)
>      ...
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
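For reference, the two settings Sebastian reports changing are standard Hadoop configuration properties and could be applied in `core-site.xml` roughly as below. This is a sketch based only on the values quoted in the thread; they are what worked for this particular workload, not a general recommendation:

```xml
<!-- Sketch: reduced s3a timeout/retry settings from the thread. -->
<!-- Tune per workload; defaults in this CDH release were 200000 ms and 20 attempts. -->
<property>
  <name>fs.s3a.connection.timeout</name>
  <!-- Socket timeout for S3 connections, in milliseconds. -->
  <value>30000</value>
</property>
<property>
  <name>fs.s3a.attempts.maximum</name>
  <!-- Maximum number of attempts the AWS client makes for a failed request. -->
  <value>5</value>
</property>
```

With a shorter timeout, a stalled read fails fast enough for the task attempt to be retried (possibly on a better-connected node) instead of sitting inside a blocked `SocketInputStream.read` for the duration of the long default timeout.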
