Hey
I am running Nutch jobs on Hadoop with the fetcher.parse property set to
true. While a job is running, map spills are written to the directory
/home/hadoop/nodelogs/usercache/root/appcache.
The spills are created during the map phase of the fetch job. They grow
to around 17 GB and occupy over 90% of the datanode's disk space, after
which the datanode is marked UNHEALTHY.
Because of this I have to delete these files periodically to keep the
process running smoothly, but doing so sometimes interferes with the
running job and increases the job completion time.
I have set the log level to ERROR and above in mapred-site.xml, and I
have changed mapred.userlog.limit.kb to 10240.
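For reference, a sketch of the relevant mapred-site.xml entries. The
userlog limit uses the property name I set; the log-level property name
(mapreduce.map.log.level below) is an assumption and may differ between
Hadoop versions:

```xml
<!-- Cap each task's user log at 10 MB (value is in KB) -->
<property>
  <name>mapred.userlog.limit.kb</name>
  <value>10240</value>
</property>

<!-- Log only ERROR and above for map tasks
     (property name assumed; older releases use mapred.map.child.log.level) -->
<property>
  <name>mapreduce.map.log.level</name>
  <value>ERROR</value>
</property>
```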
Could you please suggest how this can be avoided so that Nutch keeps
running properly?
--
Shubham Gupta
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]