Hey

I am running Nutch processes on Hadoop, with the fetcher.parse property set to true. While the job is running, map spills are created in the directory /home/hadoop/nodelogs/usercache/root/appcache.
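For reference, this is the setting in question (a sketch of the relevant nutch-site.xml fragment; the comment about its side effect is my reading of the behaviour, not something confirmed here):

```xml
<!-- nutch-site.xml (fragment): run parsing inside the fetch job's map tasks.
     Note: with this enabled, parsed content becomes part of the map output,
     which can increase spill volume during the fetch phase. -->
<property>
  <name>fetcher.parse</name>
  <value>true</value>
</property>
```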


The spills are created during the map phase of the fetch job. They grow to about 17 GB and occupy over 90% of the datanode's disk space, after which the datanode's state changes to UNHEALTHY. I therefore have to delete these files periodically so that the process keeps running smoothly, but doing so sometimes interferes with the job and tends to increase its completion time. I have already configured logging of only ERROR messages and above in mapred-site.xml, and I have changed mapred.userlog.limit.kb to 10240.
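These are the two settings I mentioned, as a mapred-site.xml sketch. The property names below are the older (MRv1-style) ones; on newer Hadoop/YARN releases the equivalents are, I believe, mapreduce.map.log.level and mapreduce.task.userlog.limit.kb, so the exact names may need adjusting for your version:

```xml
<!-- mapred-site.xml (fragment): restrict task logs to ERROR and above,
     and cap each task's userlog size at 10 MB. These limit log output only;
     they do not affect map spill files, which are intermediate data. -->
<property>
  <name>mapred.map.child.log.level</name>
  <value>ERROR</value>
</property>
<property>
  <name>mapred.userlog.limit.kb</name>
  <value>10240</value>
</property>
```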

Please suggest how this can be avoided so that Nutch can run properly.

--

Shubham Gupta


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]