Arun, I just verified that I get the same error with 2.0.0-alpha (official tarball) and 2.0.1-alpha (built from svn).
Karthik, thanks for forwarding.

Thanks,
Trevor

On Tue, Jul 17, 2012 at 6:18 PM, Karthik Kambatla <[email protected]> wrote:
> Forwarding your email to the cdh-user group.
>
> Thanks,
> Karthik
>
> On Tue, Jul 17, 2012 at 2:24 PM, Trevor <[email protected]> wrote:
>
>> Hi all,
>>
>> I recently upgraded from CDH4b2 (0.23.1) to CDH4 (2.0.0). Now, for some
>> strange reason, my MRv2 jobs (TeraGen, specifically) fail if I run with
>> more than one slave. For every slave except the one running the
>> Application Master, I get the following failed tasks and warnings
>> repeatedly:
>>
>> 12/07/13 14:21:55 INFO mapreduce.Job: Running job: job_1342207265272_0001
>> 12/07/13 14:22:17 INFO mapreduce.Job: Job job_1342207265272_0001 running in uber mode : false
>> 12/07/13 14:22:17 INFO mapreduce.Job:  map 0% reduce 0%
>> 12/07/13 14:22:46 INFO mapreduce.Job:  map 1% reduce 0%
>> 12/07/13 14:22:52 INFO mapreduce.Job:  map 2% reduce 0%
>> 12/07/13 14:22:55 INFO mapreduce.Job:  map 3% reduce 0%
>> 12/07/13 14:22:58 INFO mapreduce.Job:  map 4% reduce 0%
>> 12/07/13 14:23:04 INFO mapreduce.Job:  map 5% reduce 0%
>> 12/07/13 14:23:07 INFO mapreduce.Job:  map 6% reduce 0%
>> 12/07/13 14:23:07 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000004_0, Status : FAILED
>> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stdout
>> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stderr
>> 12/07/13 14:23:08 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000003_0, Status : FAILED
>> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000003_0&filter=stdout
>> ...
>> 12/07/13 14:25:12 INFO mapreduce.Job:  map 25% reduce 0%
>> 12/07/13 14:25:12 INFO mapreduce.Job: Job job_1342207265272_0001 failed with state FAILED due to:
>> ...
>>     Failed map tasks=19
>>     Launched map tasks=31
>>
>> The HTTP 400 error appears to be generated by the ShuffleHandler, which
>> is configured to run on port 8080 of the slaves and doesn't understand
>> that URL. What I've been able to piece together so far is that /tasklog
>> is handled by the TaskLogServlet, which is part of the TaskTracker.
>> However, isn't this an MRv1 class that shouldn't even be running in my
>> configuration? Also, the TaskTracker appears to run on port 50060, so I
>> don't know where port 8080 is coming from.
>>
>> Though it could be a red herring, this warning seems to be related to
>> the job failing, despite the fact that the job makes progress on the
>> slave running the AM. The Node Manager logs on both AM and non-AM
>> slaves appear fairly similar, and I don't see any errors in the non-AM
>> logs.
>>
>> Another strange data point: these failures occur when running the
>> slaves on ARM systems. Running the slaves on x86 with the same
>> configuration works. I'm using the same tarball on both, which means
>> that the native-hadoop library isn't loaded on ARM. The master/client
>> is the same x86 system in both scenarios. All nodes are running Ubuntu
>> 12.04.
>>
>> Thanks for any guidance,
>> Trevor
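For readers wondering where port 8080 comes from: in Hadoop 2.0.x-era MRv2, the ShuffleHandler runs as a NodeManager auxiliary service, and its listen port defaulted to 8080. A minimal yarn-site.xml sketch of that wiring is below; the property names are from the 2.0.x documentation, and whether Trevor's cluster sets them explicitly (rather than relying on defaults) is an assumption:

```
<!-- yarn-site.xml (sketch): shuffle service wiring on each NodeManager -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <!-- ShuffleHandler listen port; 8080 matches the failing URLs above -->
  <name>mapreduce.shuffle.port</name>
  <value>8080</value>
</property>
```

The ShuffleHandler only serves map output to reducers, so any /tasklog request hitting it would plausibly get an HTTP 400, consistent with the warnings in the log.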

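To make the failure mode concrete, here is a small sketch (in Python, not Hadoop's Java) of the MRv1-style tasklog URL the client is evidently constructing and then failing to fetch. The helper name is mine, not Hadoop's; the host, port, and attempt id are taken from the log above:

```python
def tasklog_url(host: str, port: int, attempt_id: str, log_filter: str) -> str:
    # Mirrors the MRv1 TaskLogServlet URL pattern seen in the failing log lines.
    return (f"http://{host}:{port}/tasklog?plaintext=true"
            f"&attemptid={attempt_id}&filter={log_filter}")

url = tasklog_url("perfgb0n0", 8080,
                  "attempt_1342207265272_0001_m_000004_0", "stdout")
print(url)
```

This reproduces the exact URL from the warning, which is why the 400 points at the client asking an MRv2 shuffle port for an MRv1 servlet.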