Zheng Shao created HADOOP-14137:
-----------------------------------
Summary: Allow DistCp to take a file list within a src directory
Key: HADOOP-14137
URL: https://issues.apache.org/jira/browse/HADOOP-14137
Project: Hadoop Common
Issue Type: New Feature
Components: tools/distcp
Reporter: Zheng Shao
DistCp is very slow to start when the src directory has a huge number of
subdirectories. In our case, we already have the directory listing (via "hdfs
oiv -i fsimage" or via nightly "hdfs dfs -ls -R /" dumps), and we would like to
use that instead of doing a real-time listing against the NameNode.
The "-f" option doesn't help in this case because it would try to put
everything into a single flat target directory.
We'd like to introduce a new option "-list <file>" for distcp. The <file>
contains the result of listing the src directory.
In order to achieve this, we plan to:
1. Add a new CopyListing class, PregeneratedCopyListing, similar to
SimpleCopyListing, which doesn't recursively list ("-ls -R") the src directory
but instead takes the listing supplied via "-list".
2. Add an option "-list <file>" which will automatically make distcp use the
new PregeneratedCopyListing class.
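
To illustrate the intended difference from "-f", here is a minimal sketch of
how a pregenerated listing could be mapped to targets while preserving the
directory structure under src. It is plain Java and does not use the DistCp
CopyListing API. The class name PregeneratedListingSketch, the method
mapToTargets, and the assumed 8-column "ls -R"-style line format are all
hypothetical and are not part of the proposal itself.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; a real PregeneratedCopyListing would extend CopyListing. */
final class PregeneratedListingSketch {

    /** Drops a single trailing slash so that prefix checks behave consistently. */
    private static String stripTrailingSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }

    /**
     * Maps each listed path under srcRoot to a target path under dstRoot,
     * keeping the path relative to srcRoot (unlike a flat copy).
     *
     * Assumption: each line has 8 whitespace-separated columns
     * (permissions, replication, owner, group, size, date, time, path),
     * and the path is the 8th column, which may contain spaces.
     */
    static Map<String, String> mapToTargets(List<String> listingLines,
                                            String srcRoot,
                                            String dstRoot) {
        String src = stripTrailingSlash(srcRoot);
        String dst = stripTrailingSlash(dstRoot);
        Map<String, String> plan = new LinkedHashMap<>();
        for (String line : listingLines) {
            // Limit of 8 keeps any spaces inside the path in the last column.
            String[] cols = line.trim().split("\\s+", 8);
            if (cols.length < 8) {
                continue; // skips headers such as "Found N items" and blank lines
            }
            String path = cols[7];
            if (!path.startsWith(src + "/")) {
                continue; // entry is outside srcRoot, or is srcRoot itself
            }
            String relative = path.substring(src.length()); // keeps the leading "/"
            plan.put(path, dst + relative);
        }
        return plan;
    }
}
```

With this mapping, an entry such as /data/logs/a.txt listed under src /data
would be planned as <target>/logs/a.txt rather than <target>/a.txt, so no
NameNode listing is needed to recover the relative layout.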
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)