[
https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575244#comment-15575244
]
ASF GitHub Bot commented on HADOOP-13560:
-----------------------------------------
Github user thodemoor commented on a diff in the pull request:
https://github.com/apache/hadoop/pull/130#discussion_r83415072
--- Diff: hadoop-common-project/hadoop-common/src/main/resources/core-default.xml ---
@@ -1095,10 +1102,50 @@
<property>
<name>fs.s3a.fast.upload</name>
<value>false</value>
- <description>Upload directly from memory instead of buffering to
- disk first. Memory usage and parallelism can be controlled as up to
- fs.s3a.multipart.size memory is consumed for each (part)upload actively
- uploading (fs.s3a.threads.max) or queueing
- (fs.s3a.max.total.tasks)</description>
+ <description>
+ Use the incremental block-based fast upload mechanism with
+ the buffering mechanism set in fs.s3a.fast.upload.buffer.
+ </description>
+</property>
+
+<property>
+ <name>fs.s3a.fast.upload.buffer</name>
+ <value>disk</value>
+ <description>
+ The buffering mechanism to use when using S3A fast upload
+ (fs.s3a.fast.upload=true). Values: disk, array, bytebuffer.
+ This configuration option has no effect if fs.s3a.fast.upload is false.
+
+ "disk" will use the directories listed in fs.s3a.buffer.dir as
+ the location(s) to save data prior to being uploaded.
+
+ "array" uses arrays in the JVM heap
+
+ "bytebuffer" uses off-heap memory within the JVM.
+
+ Both "array" and "bytebuffer" will consume memory in a single stream
+ up to the amount set by:
+
+ fs.s3a.multipart.size * fs.s3a.fast.upload.active.blocks.
+
+ If using either of these mechanisms, keep this value low.
+
+ The total number of threads performing work across all streams is set by
+ fs.s3a.threads.max, with fs.s3a.max.total.tasks setting the
+ number of queued work items.
--- End diff ---
Completely agree. A bit further down I propose adding a single explanation
in the javadoc and linking to it from the various other locations.
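For reference, a minimal configuration sketch combining the properties discussed in the diff above. The property names come from the diff; the values are illustrative examples, not recommendations:

```xml
<!-- Illustrative S3A fast-upload settings; values are examples only. -->
<property>
  <name>fs.s3a.fast.upload</name>
  <value>true</value>
</property>
<property>
  <name>fs.s3a.fast.upload.buffer</name>
  <!-- one of: disk, array, bytebuffer -->
  <value>bytebuffer</value>
</property>
<property>
  <name>fs.s3a.fast.upload.active.blocks</name>
  <!-- per-stream memory is capped at fs.s3a.multipart.size * this value -->
  <value>4</value>
</property>
```

As a worked example of the cap: with a 100 MB fs.s3a.multipart.size and 4 active blocks, a single output stream could buffer up to 400 MB in memory, which is why the description advises keeping the block count low for the in-memory buffers.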
> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>
> Key: HADOOP-13560
> URL: https://issues.apache.org/jira/browse/HADOOP-13560
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13560-branch-2-001.patch,
> HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch,
> HADOOP-13560-branch-2-004.patch
>
>
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights
> that metadata isn't copied on large copies.
> 1. Add a test to do that large copy/rename and verify that the copy really
> works
> 2. Verify that metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very
> large commit operations for committers using rename
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]