[
https://issues.apache.org/jira/browse/HADOOP-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703509#comment-17703509
]
ASF GitHub Bot commented on HADOOP-18458:
-----------------------------------------
wujinhu commented on code in PR #4912:
URL: https://github.com/apache/hadoop/pull/4912#discussion_r1144310663
##########
hadoop-tools/hadoop-aliyun/src/site/markdown/tools/hadoop-aliyun/index.md:
##########
@@ -251,9 +251,52 @@ please raise your issues with them.
</description>
</property>
+ <property>
+ <name>fs.oss.fast.upload.buffer</name>
+ <value>disk</value>
+ <description>
+ The buffering mechanism to use.
+ Values: disk, array, bytebuffer, array_disk, bytebuffer_disk.
+
+ "disk" will use the directories listed in fs.oss.buffer.dir as
+ the location(s) to save data prior to being uploaded.
+
+ "array" uses arrays in the JVM heap
+
+ "bytebuffer" uses off-heap memory within the JVM.
+
+ Both "array" and "bytebuffer" will consume memory in a single stream up to the number
+ of blocks set by:
+
+ fs.oss.multipart.upload.size * fs.oss.upload.active.blocks.
+
+ If using either of these mechanisms, keep this value low
+
+ The total number of threads performing work across all threads is set by
+ fs.oss.multipart.download.threads, with fs.oss.max.total.tasks values setting the number of queued
Review Comment:
Yes, the current implementation creates a single thread pool shared by reads and
uploads, but would it be better to separate them?
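
To make the question concrete, here is a minimal sketch of what separate pools could look like. It is not the hadoop-aliyun implementation: the class name `SeparatePools`, the thread-name prefixes, and the two constructor arguments are all hypothetical, and how the sizes would map onto existing keys such as fs.oss.multipart.download.threads is left open.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder for two independent pools; illustration only. */
final class SeparatePools implements AutoCloseable {
  private final ExecutorService readPool;
  private final ExecutorService uploadPool;

  SeparatePools(int readThreads, int uploadThreads) {
    // Each pool has its own fixed size, so a burst of uploads
    // cannot occupy the threads that serve reads, and vice versa.
    this.readPool = Executors.newFixedThreadPool(readThreads, named("oss-read-"));
    this.uploadPool = Executors.newFixedThreadPool(uploadThreads, named("oss-upload-"));
  }

  /** Builds daemon threads with a recognizable name prefix (assumed naming). */
  private static ThreadFactory named(String prefix) {
    AtomicInteger counter = new AtomicInteger();
    return runnable -> {
      Thread t = new Thread(runnable, prefix + counter.incrementAndGet());
      t.setDaemon(true);
      return t;
    };
  }

  ExecutorService readPool() {
    return readPool;
  }

  ExecutorService uploadPool() {
    return uploadPool;
  }

  @Override
  public void close() {
    // Stop accepting new work; tasks already submitted still run.
    readPool.shutdown();
    uploadPool.shutdown();
  }
}
```

The trade-off is one more sizing knob to configure in exchange for isolation between the read and upload paths.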
> AliyunOSS: AliyunOSSBlockOutputStream to support heap/off-heap buffer before
> uploading data to OSS
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-18458
> URL: https://issues.apache.org/jira/browse/HADOOP-18458
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.3, 3.1.4, 2.10.2, 3.2.4, 3.3.4
> Reporter: wujinhu
> Assignee: wujinhu
> Priority: Major
> Labels: pull-request-available
>
> Recently, our customers raised a requirement: AliyunOSSBlockOutputStream
> should support heap/off-heap buffers before uploading data to OSS.
> Currently, AliyunOSSBlockOutputStream buffers data in a local directory before
> uploading to OSS, which is less efficient than buffering in memory.
> Changes:
> # Adds heap/off-heap buffers
> # Adds a limit on the memory used, with fallback to disk
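
For reference, the per-stream memory bound quoted in the diff (fs.oss.multipart.upload.size * fs.oss.upload.active.blocks) can be worked through as below. This is illustrative arithmetic only: the class `BufferBudget` and its methods are hypothetical, and the numbers in the note that follows are example values, not the defaults.

```java
/** Illustrative arithmetic only; class and method names are hypothetical. */
final class BufferBudget {
  private BufferBudget() {
  }

  /**
   * Upper bound on buffered bytes for one output stream:
   * fs.oss.multipart.upload.size * fs.oss.upload.active.blocks.
   */
  static long perStreamBytes(long multipartUploadSize, int activeBlocks) {
    if (multipartUploadSize <= 0 || activeBlocks <= 0) {
      throw new IllegalArgumentException("sizes must be positive");
    }
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    return Math.multiplyExact(multipartUploadSize, (long) activeBlocks);
  }

  /** Worst case across several streams writing at the same time. */
  static long totalBytes(long perStreamBytes, int concurrentStreams) {
    if (perStreamBytes < 0 || concurrentStreams < 0) {
      throw new IllegalArgumentException("values must be non-negative");
    }
    return Math.multiplyExact(perStreamBytes, (long) concurrentStreams);
  }
}
```

With an 8 MiB part size and 4 active blocks, `perStreamBytes(8L * 1024 * 1024, 4)` gives 33554432 bytes (32 MiB) for one stream, and ten such streams could hold up to 320 MiB, which is why the description advises keeping the value low for "array" and "bytebuffer".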
--
This message was sent by Atlassian Jira
(v8.20.10#820010)