Re: Issue with Hadoop Streaming

Devi Kumarappan Thu, 02 Aug 2012 16:08:23 -0700


After specifying the NLineInputFormat option, the streaming job fails with:


Error from attempt_201205171448_0092_m_000000_0: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2

It spawns two mappers, but I am not sure whether each mapper runs with the file 
names listed in the input file. I was expecting one mapper to run with 
/user/devi/s_input/a.txt and one mapper to run with /user/devi/s_input/b.txt. I 
dug into the task files but could not find anything.

Here is the simple mapper Perl script. All it does is read the file and print 
it. (It needs to do much more, but I could not get even this basic job to run.)

$i = 0;
$userinput = <STDIN>;
open(INFILE,"$userinput") || die "could not open the file $userinput \n";
while (<INFILE>) {
  my $line = $_;
  print "$i".$line ;
  $i++;
}
close(INFILE);
exit;

My command is:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
  -input /user/devi/file.txt \
  -output /user/devi/s_output \
  -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl" \
  -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat
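The "code 2" exit status fits a Perl die after a failed open: when die ends the interpreter, Perl exits with the current value of $!, and ENOENT ("No such file or directory") is 2 on Linux. Three things in the script above could produce that: the line read from STDIN still carries its trailing newline; with NLineInputFormat, streaming is commonly reported to pass the byte-offset key along, so the mapper sees something like "0<TAB>/user/devi/s_input/a.txt"; and open() looks on the task's local disk, while the path lives in HDFS. Below is a minimal sketch of a mapper that addresses all three, not a tested fix. It assumes the hadoop command is on the PATH of every task node; the sub names path_from_record and process_file are hypothetical.

```perl
#!/usr/bin/perl
# Sketch of a streaming mapper for use with NLineInputFormat.
# Assumptions: each record arrives as "<byte offset>\t<HDFS path>\n",
# and the `hadoop` command is available on every task node.
use strict;
use warnings;

# Strip the trailing newline and the leading "<offset>\t" key, if present.
sub path_from_record {
    my ($record) = @_;
    chomp $record;
    $record =~ s/^\d+\t//;    # drop the offset key passed along by streaming
    return $record;
}

# Stream one HDFS file through `hadoop fs -cat` (plain open() only sees the
# local disk) and print each line prefixed with a counter, as the original did.
sub process_file {
    my ($path) = @_;
    open(my $in, '-|', 'hadoop', 'fs', '-cat', $path)
        or die "could not run hadoop fs -cat $path: $!\n";
    my $i = 0;
    while (my $line = <$in>) {
        print $i, $line;
        $i++;
    }
    close($in) or die "hadoop fs -cat $path failed (status $?)\n";
}

# Handle every record on stdin instead of only the first one.
while (my $record = <STDIN>) {
    my $path = path_from_record($record);
    next if $path eq '';
    process_file($path);
}
```

Using the list form of the piped open avoids passing the path through a shell, so unusual characters in file names are not reinterpreted.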


Really appreciate your help.

Devi

________________________________
From: Robert Evans <[email protected]>
To: "[email protected]" <[email protected]>; 
"[email protected]" <[email protected]>
Sent: Thu, August 2, 2012 1:16:54 PM
Subject: Re: Issue with Hadoop Streaming


http://www.mail-archive.com/[email protected]/msg07382.html



From: Devi Kumarappan <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, August 2, 2012 3:03 PM
To: "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>
Subject: Re: Issue with Hadoop Streaming


My mapper is a Perl script, not Java, so how do I specify NLineInputFormat?




________________________________
From: Robert Evans <[email protected]>
To: "[email protected]" <[email protected]>; 
"[email protected]" <[email protected]>
Sent: Thu, August 2, 2012 12:59:50 PM
Subject: Re: Issue with Hadoop Streaming

It depends on the input format you use. You probably want to look at using 
NLineInputFormat.
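To see why NLineInputFormat should yield one mapper per listed path, here is a rough Perl model (not Hadoop's actual code) of how it groups input lines into splits of N lines each, and of the record each mapper would read on stdin. N defaults to 1 and, in this generation of Hadoop, is read from the mapred.line.input.format.linespermap property. The "offset<TAB>line" record shape is an assumption based on how streaming is commonly reported to treat input formats other than TextInputFormat; the sub name nline_splits is hypothetical.

```perl
#!/usr/bin/perl
# Rough model of NLineInputFormat: every N input lines become one split,
# and each record is shown as "<byte offset>\t<line>".
use strict;
use warnings;

# Returns a list of splits; each split is an arrayref of records.
sub nline_splits {
    my ($n, @lines) = @_;
    my (@splits, @current);
    my $offset = 0;
    for my $line (@lines) {
        push @current, "$offset\t$line";
        $offset += length($line) + 1;    # +1 for the newline byte
        if (@current == $n) {
            push @splits, [@current];
            @current = ();
        }
    }
    push @splits, [@current] if @current;    # leftover lines form a last split
    return @splits;
}

my @lines  = ('/user/devi/s_input/a.txt', '/user/devi/s_input/b.txt');
my @splits = nline_splits(1, @lines);
for my $k (0 .. $#splits) {
    print "mapper $k: $_\n" for @{ $splits[$k] };
}
```

With N = 1 and the two-line /user/devi/file.txt from the original question, this prints one record per mapper: offset 0 for a.txt and offset 25 for b.txt. With the default TextInputFormat, a file this small fits in a single split, which matches the one-mapper behavior described in the original question.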

From: Devi Kumarappan <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, August 1, 2012 8:09 PM
To: "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>
Subject: Issue with Hadoop Streaming

I am trying to run Hadoop streaming with a Perl script as the mapper and no 
reducer. I need the mapper to process one file at a time, since I have to do 
pattern processing on the entire contents of each file, and the files are 
small.

The Hadoop streaming manual suggests the following solution:

*  Generate a file containing the full HDFS path of the input files. Each map 
task would get one file name as input.
*  Create a mapper script which, given a filename, will get the file to local 
disk, gzip the file and put it back in the desired output directory.
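The manual's recipe can be sketched in Perl roughly as follows. This is an illustration, not the manual's own code: it assumes hadoop and gzip are on every task node's PATH, strips the NLineInputFormat offset key if one is present, and writes to /user/devi/s_gzipped, a hypothetical output directory. The sub name commands_for is also hypothetical.

```perl
#!/usr/bin/perl
# Sketch of the "one file per map" recipe: for each HDFS path on stdin,
# copy it to local disk, gzip it, and put the .gz back into HDFS.
use strict;
use warnings;
use File::Basename qw(basename);

my $OUT_DIR = '/user/devi/s_gzipped';    # hypothetical output directory

# Build the three command lists (get, gzip, put) for one HDFS path.
sub commands_for {
    my ($hdfs_path, $out_dir) = @_;
    my $local = basename($hdfs_path);
    return (
        ['hadoop', 'fs', '-get', $hdfs_path, $local],
        ['gzip', '-f', $local],
        ['hadoop', 'fs', '-put', "$local.gz", "$out_dir/$local.gz"],
    );
}

while (my $record = <STDIN>) {
    chomp $record;
    $record =~ s/^\d+\t//;    # drop the offset key if streaming supplied one
    next if $record eq '';
    for my $cmd (commands_for($record, $OUT_DIR)) {
        system(@$cmd) == 0 or die "command failed: @$cmd\n";
    }
    # Emit one line per processed file so the job output records what was written.
    print "$record\t$OUT_DIR/", basename($record), ".gz\n";
}
```

Running each step through the list form of system keeps file names away from the shell, and dying on the first failed step makes the task fail visibly instead of silently skipping a file.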

I am running the following command:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
  -input /user/devi/file.txt \
  -output /user/devi/s_output \
  -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"



/user/devi/file.txt contains the following two lines.

/user/devi/s_input/a.txt
/user/devi/s_input/b.txt

When this runs, instead of spawning two mappers (one for a.txt and one for 
b.txt) as the document describes, only one mapper is spawned, and the Perl 
script receives both /user/devi/s_input/a.txt and /user/devi/s_input/b.txt as 
input.



How can I make the Perl mapper script process only one file at a time?



Appreciate your help, Thanks, Devi
