Can't we use getmerge here? If your requirement is to merge the files in a
particular directory into a single file:

hadoop fs -getmerge <dir_of_input_files> <mergedsinglefile>
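One caveat worth noting: getmerge concatenates the HDFS files into a file on the
*local* filesystem, so if the merged result needs to live in HDFS it has to be
copied back afterwards. A minimal sketch (the paths below are placeholders, not
from the original thread):

```shell
# Concatenate all files under the HDFS directory into one local file.
hadoop fs -getmerge /data/input_dir /tmp/merged.txt

# If the merged file is needed back in HDFS, upload it again.
hadoop fs -put /tmp/merged.txt /data/merged.txt
```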

--Senthil
-----Original Message-----
From: Giovanni Mascari [mailto:[email protected]] 
Sent: Thursday, November 03, 2016 7:24 PM
To: Piyush Mukati <[email protected]>; [email protected]
Subject: Re: merging small files in HDFS

Hi,
if I understand your request correctly, you only need to merge some data
resulting from an HDFS write operation.
In that case, I suppose your best option is to use Hadoop Streaming with the
'cat' command.

take a look here:
https://hadoop.apache.org/docs/r1.2.1/streaming.html
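The streaming approach suggested above might look something like the sketch
below (the jar path, directories, and reducer count are assumptions; the exact
jar location depends on your distribution). Note one trade-off: routing the
data through a single reducer yields a single output file, but the shuffle
phase sorts records by key, so the original line order is not preserved.

```shell
# Streaming job that merges the files in an input dir into one output file:
# 'cat' passes records through unchanged, and forcing a single reduce task
# produces a single part file in the output directory.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapred.reduce.tasks=1 \
  -input  /data/small_files_dir \
  -output /data/merged_out \
  -mapper cat \
  -reducer cat
```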

Regards

On 03/11/2016 13:53, Piyush Mukati wrote:
> Hi,
> I want to merge multiple files in one HDFS dir into one file. I am
> planning to write a map-only job using an input format that will create
> only one inputSplit per dir.
> This way my job doesn't need to do any shuffle/sort (only read and write
> back to disk). Is there any such input format already implemented?
> Or is there a better solution for the problem?
>
> thanks.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
