[ https://issues.apache.org/jira/browse/HADOOP-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108408#comment-13108408 ]

Harsh J commented on HADOOP-2120:
---------------------------------

I believe the sorting mentioned earlier referred to sorting the file list?

In that case: although FSNamesystem returns a consistently sorted listing for 
HDFS's listStatus and similar calls, note that Java's File APIs (e.g. 
File.list()) make no such ordering guarantee, so getmerge over any 
LocalFileSystem may concatenate the files in an arbitrary order. I've opened 
HADOOP-7659 for this, btw.
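
To illustrate the ordering point, here is a minimal sketch, assuming plain local files and using only java.nio rather than Hadoop's FileSystem API, of a concatenation that sorts the listing explicitly so the output order no longer depends on the directory-listing order. SortedConcat and concatSorted are hypothetical names; this is not part of Hadoop and not how FileUtil.copyMerge is implemented.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedConcat {
    // Hypothetical helper (not Hadoop's FileUtil.copyMerge): concatenates the
    // regular files in srcDir into dst in sorted name order, so the result
    // does not depend on the order in which the directory is listed.
    static void concatSorted(Path srcDir, Path dst) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(srcDir)) {
            for (Path p : ds) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        // Directory listings carry no ordering guarantee, so sort explicitly.
        Collections.sort(files);
        try (OutputStream out = Files.newOutputStream(dst)) {
            for (Path p : files) {
                Files.copy(p, out); // plain concatenation, no record-level sort
            }
        }
    }
}
```

Note that Path.compareTo is provider- and platform-specific, and dst should live outside srcDir so it is not picked up as an input.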

> dfs -getMerge does not do what it says it does
> ----------------------------------------------
>
>                 Key: HADOOP-2120
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2120
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: documentation, fs
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>              Labels: newbie
>
> dfs -getMerge calls FileUtil.copyMerge, whose javadoc reads:
> {code}
> Get all the files in the directories that match the source file pattern
>    * and merge and sort them to only one file on local fs 
>    * srcf is kept.
> {code}
> However, it only concatenates the set of input files, rather than merging 
> them in sorted order.
> Ideally, copyMerge should be equivalent to a map-reduce job with 
> IdentityMapper and IdentityReducer and numReducers = 1. However, not having 
> to run this as a map-reduce job has some advantages, since avoiding a 
> single-reducer job improves cluster utilization during the reduce phase.
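
For comparison, the "merge and sort" that the javadoc promises could look like the following for line-oriented text: a minimal sketch of a k-way merge, assuming each input file is already sorted and ordering lines by Java's String.compareTo (a real single-reducer MapReduce job would instead sort by its configured key comparator). SortedMerge and mergeSorted are hypothetical names, not part of Hadoop.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class SortedMerge {
    // One open input plus its smallest not-yet-written line.
    private static final class Cursor {
        final BufferedReader reader;
        String line;

        Cursor(BufferedReader reader, String line) {
            this.reader = reader;
            this.line = line;
        }
    }

    // Hypothetical helper: k-way merge of inputs that are EACH already sorted
    // line by line (an assumption), producing one globally sorted file.
    static void mergeSorted(List<Path> inputs, Path dst) throws IOException {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(dst, StandardCharsets.UTF_8)) {
            for (Path p : inputs) {
                BufferedReader r = Files.newBufferedReader(p, StandardCharsets.UTF_8);
                readers.add(r);
                String first = r.readLine();
                if (first != null) {
                    heap.add(new Cursor(r, first));
                }
            }
            // Repeatedly emit the smallest head line, then refill from its file.
            while (!heap.isEmpty()) {
                Cursor c = heap.poll();
                out.write(c.line);
                out.write('\n');
                c.line = c.reader.readLine();
                if (c.line != null) {
                    heap.add(c);
                }
            }
        } finally {
            for (BufferedReader r : readers) {
                r.close();
            }
        }
    }
}
```

Unsorted inputs would first need to be sorted individually (in memory or with an external sort) before this merge yields a globally sorted file.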

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
