[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671870#comment-15671870 ]
ASF GitHub Bot commented on HADOOP-13655:
-----------------------------------------
Github user liuml07 commented on a diff in the pull request:
https://github.com/apache/hadoop/pull/131#discussion_r88332309
--- Diff: hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm ---
@@ -470,6 +470,105 @@ $H3 SSL Configurations for HSFTP sources
The SSL configuration file must be in the class-path of the DistCp program.
+$H3 DistCp and Object Stores
+
+DistCp works with Object Stores such as Amazon S3, Azure WASB and OpenStack Swift.
+
+Prequisites
+
+1. The JAR containing the object store implementation is on the classpath,
+along with all of its dependencies.
+1. Unless the JAR automatically registers its bundled filesystem clients,
+the configuration may need to be modified to state the class which
+implements the filesystem schema. All of the ASF's own object store clients
+are self-registering.
+1. The relevant object store access credentials must be available in the cluster
+configuration, or be otherwise available in all cluster hosts.
+
+DistCp can be used to upload data
+
+```bash
+hadoop distcp hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1
+```
+
+To download data
+
+```bash
+hadoop distcp s3a://bucket/generated/results hdfs://nn1:8020/results
+```
+
+To copy data between object stores
+
+```bash
+hadoop distcp s3a://bucket/generated/results \
+ wasb://[email protected]
+```
+
+And do copy data within an object store
+
+```bash
+hadoop distcp wasb://[email protected]/current \
+ wasb://[email protected]/old
+```
+
+And to use `-update` to only copy changed files.
+
+```bash
+hadoop distcp -update -numListstatusThreads 20 \
+ swift://history.cluster1/2016 \
+ hdfs://nn1:8020/history/2016
+```
+
+Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation
+on a large directory tree (the limit is 40 threads).
+
+When `DistCp -update` is used with objec stores,
--- End diff --
objec -> object
> document object store use with fs shell and distcp
> --------------------------------------------------
>
> Key: HADOOP-13655
> URL: https://issues.apache.org/jira/browse/HADOOP-13655
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: documentation, fs, fs/s3
> Affects Versions: 2.7.3
> Reporter: Steve Loughran
> Assignee: Steve Loughran
>
> There's no specific docs for working with object stores from the
> {{hadoop fs}} shell or in distcp; people either suffer from this
> (performance, billing), or learn through trial and error what to do.
> Add a section in both fs shell and distcp docs covering use with object
> stores.