My mapper is a Perl script, not Java. So how do I specify the 
NLineInputFormat?




________________________________
From: Robert Evans <[email protected]>
To: "[email protected]" <[email protected]>; 
"[email protected]" <[email protected]>
Sent: Thu, August 2, 2012 12:59:50 PM
Subject: Re: Issue with Hadoop Streaming

It depends on the input format you use.  You probably want to look at using 
NLineInputFormat.
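
With streaming, the input format class can be named on the command line through the -inputformat option, so the mapper itself does not have to be Java. Below is a minimal sketch that reuses the paths from the original message quoted further down. It assumes the old-API class org.apache.hadoop.mapred.lib.NLineInputFormat is available in this Hadoop build and that its default of one line per map task is left unchanged; the function name is hypothetical and only wraps the command so it can be pasted without running immediately.

```shell
#!/bin/sh
# Sketch only: same job as in the original message, plus -inputformat.
# Assumption: NLineInputFormat (old "mapred" API) ships with this Hadoop build
# and hands each map task one line of /user/devi/file.txt by default.
run_streaming_one_line_per_map() {
    hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
        -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
        -input /user/devi/file.txt \
        -output /user/devi/s_output \
        -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"
}
```

Calling run_streaming_one_line_per_map submits the job. One thing to check: with an input format other than the default, streaming may pass the key (the byte offset) and a tab in front of each line, so the Perl script may need to strip everything up to the first tab before using the path.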

From: Devi Kumarappan <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, August 1, 2012 8:09 PM
To: "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>
Subject: Issue with Hadoop Streaming

I am trying to run Hadoop Streaming using a Perl script as the mapper, with no 
reducer. My requirement is for the mapper to run on one file at a time, since 
I have to do pattern processing on the entire contents of each file and the 
files are small.

The Hadoop Streaming manual suggests the following solution:

*  Generate a file containing the full HDFS path of the input files. Each map 
task would get one file name as input.
*  Create a mapper script which, given a filename, will get the file to local 
disk, gzip the file and put it back in the desired output directory.
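
The two steps above can be sketched as a small shell mapper. This is an illustration under assumptions, not the manual's own script: the output directory /user/devi/s_output_gz and the function names are hypothetical, and the key-stripping step is only there in case the input format passes a "key<TAB>path" record instead of the bare path.

```shell
#!/bin/sh
# Hypothetical HDFS output directory; not from the original thread.
OUT_DIR=/user/devi/s_output_gz

# Drop an optional leading "key<TAB>" so only the HDFS path remains.
# cut prints the line unchanged when it contains no tab.
hdfs_path_of() {
    printf '%s\n' "$1" | cut -f2-
}

# Read one HDFS path per line from stdin; fetch, gzip and upload each file.
gzip_listed_files() {
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        path=$(hdfs_path_of "$line")
        name=$(basename "$path")
        # </dev/null keeps the hadoop client from consuming the mapper's stdin.
        hadoop fs -get "$path" "$name" < /dev/null &&
            gzip -f "$name" &&
            hadoop fs -put "$name.gz" "$OUT_DIR/$name.gz" < /dev/null
    done
}

# gzip_listed_files   # uncomment when this file is used as the streaming mapper
```

For the pattern-processing case, the gzip and put steps would be replaced by whatever the mapper needs to do with the fetched local file.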

I am running the following command:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
    -input /user/devi/file.txt \
    -output /user/devi/s_output \
    -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"



/user/devi/file.txt contains the following two lines.

/user/devi/s_input/a.txt
/user/devi/s_input/b.txt

When this runs, instead of spawning two mappers, one for a.txt and one for 
b.txt as the documentation describes, only one mapper is spawned, and the Perl 
script receives both /user/devi/s_input/a.txt and /user/devi/s_input/b.txt as 
its input.



How can I make the Perl mapper script run on only one file at a time?



Appreciate your help, Thanks, Devi
