Hi,

We just ran large-scale Apache Nutch jobs in our evaluation of 20.205.0, and they all failed. Some of these jobs ran concurrently with the fair scheduler enabled. These were simple jobs consuming little RAM; I double-checked, and there were certainly no RAM issues.
All jobs failed, and most tasks died with a less-than-descriptive message. A few reported I/O errors reading task output, yet the data they read is fine. When we reran the same jobs manually (some concurrently), some finished fine and others again died with I/O errors reading task output! The heap allocation for the reducers is not high, but no OOMs were reported. Aside from the occasional I/O error, which I find strange enough by itself, most tasks wrote nothing to the logs that I can link to this problem.

We do not see this happening on our 20.203.0 cluster, although its resources and settings differ. 205 is a new high-end cluster with similarly conservative settings, just more mappers/reducers per node; resource settings are otherwise almost identical. The 203 cluster has three times as many machines, so also more open file descriptors and threads in total. The per-node settings we have been comparing are sketched below.
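For reference, these are the knobs I mean. The property names are the standard ones from 0.20's mapred-default.xml; the values here are illustrative, not our actual settings:

  <!-- mapred-site.xml: task slots per node (illustrative values) -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>

  <!-- threads the TaskTracker uses to serve map output to reducers -->
  <property>
    <name>tasktracker.http.threads</name>
    <value>40</value>
  </property>

On top of that there is the open file descriptor limit (ulimit -n) for the user running the TaskTracker daemons.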
Any thoughts to share? Thanks,