Can't we use getmerge here? If your requirement is to merge the files in a particular directory into a single file:

hadoop fs -getmerge <dir_of_input_files> <mergedsinglefile>

--Senthil

-----Original Message-----
From: Giovanni Mascari [mailto:[email protected]]
Sent: Thursday, November 03, 2016 7:24 PM
To: Piyush Mukati <[email protected]>; [email protected]
Subject: Re: merging small files in HDFS

Hi,
if I understand your request correctly, you only need to merge some data resulting from an HDFS write operation. In that case, I suppose your best option is to use Hadoop Streaming with the 'cat' command.

Take a look here: https://hadoop.apache.org/docs/r1.2.1/streaming.html

Regards

On 03/11/2016 13:53, Piyush Mukati wrote:
> Hi,
> I want to merge multiple files in one HDFS dir into one file. I am
> planning to write a map-only job using an input format that creates
> only one InputSplit per dir.
> This way my job doesn't need to do any shuffle/sort (it only reads and
> writes back to disk). Is there any such input format already implemented?
> Or is there a better solution to the problem?
>
> thanks.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
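To make the getmerge suggestion concrete, here is a minimal sketch. The paths /data/small-files and ./merged.txt are hypothetical placeholders, and the commands assume a working Hadoop client configured against the cluster:

```shell
# Hypothetical source dir in HDFS: /data/small-files
# getmerge concatenates every file in the source directory (in name
# order) into a single file on the LOCAL filesystem:
hadoop fs -getmerge /data/small-files ./merged.txt

# Optional -nl flag appends a newline after each source file, which
# helps when the inputs do not end with one:
hadoop fs -getmerge -nl /data/small-files ./merged.txt

# getmerge writes locally; if the merged result is needed back in
# HDFS, copy it up afterwards (destination path is illustrative):
hadoop fs -put ./merged.txt /data/merged/merged.txt
```

Note the round trip through the local filesystem: for very large directories the Streaming-with-cat approach mentioned above stays inside the cluster, whereas getmerge is the simpler option when the merged output fits comfortably on the client machine.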
