[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

Genmao Yu (JIRA) Mon, 13 Nov 2017 21:56:41 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250920#comment-16250920
 ]


Genmao Yu commented on HADOOP-15027:
------------------------------------

Optimizing random IO is a major piece of work.  IIUC, [[email protected]], 
you said you would focus on the performance of columnar file formats, i.e. 
random IO. Is there a JIRA tracking that work?
[~wujinhu] Let us focus here on improving sequential IO rather than random IO. 
IMHO, {{SemaphoredDelegatingExecutor}} is a good general-purpose class, and we 
could move it to hadoop-common. I will open a JIRA for that move if you do not 
mind, so this JIRA will be pending for a while. Also, could you please post 
more detailed performance test results?
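
To make the idea concrete: as I understand it, an executor of this kind wraps another executor and uses a semaphore to cap how many tasks one caller can have outstanding. The following is only a minimal sketch of that pattern, not the actual Hadoop class; the name {{BoundedSubmitter}} and its methods are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of a semaphore-bounded delegating submitter.
// It is NOT the Hadoop SemaphoredDelegatingExecutor implementation.
final class BoundedSubmitter {
    private final ExecutorService delegate;
    private final Semaphore permits;

    BoundedSubmitter(ExecutorService delegate, int maxOutstanding) {
        this.delegate = delegate;
        this.permits = new Semaphore(maxOutstanding, true); // fair ordering
    }

    // Blocks the caller until a permit is free, then hands the task to the
    // delegate. The permit is returned when the task finishes or fails.
    <T> Future<T> submit(Callable<T> task) throws InterruptedException {
        permits.acquire();
        Callable<T> wrapped = () -> {
            try {
                return task.call();
            } finally {
                permits.release();
            }
        };
        try {
            return delegate.submit(wrapped);
        } catch (RuntimeException e) {
            // The delegate refused the task (for example after shutdown),
            // so the wrapped task will never run and release the permit.
            permits.release();
            throw e;
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        BoundedSubmitter submitter = new BoundedSubmitter(pool, 2);
        Future<Integer> result = submitter.submit(() -> 21 * 2);
        System.out.println(result.get());
        pool.shutdown();
    }
}
```

The point of the pattern is that several streams can share one thread pool while each stream is limited to its own number of in-flight tasks, so one busy reader cannot starve the others.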


> Improvements for Hadoop read from AliyunOSS
> -------------------------------------------
>
>                 Key: HADOOP-15027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15027
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>         Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS: it 
> takes about 1 minute to read 1 GB from OSS.
> The AliyunOSSInputStream class uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to improve this.
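
The multi-threaded pre-read idea in the description can be sketched roughly as follows: split the object into fixed-size parts, fetch the parts concurrently, and consume them in order. This is only an illustration under assumptions, not the code in the attached patches; {{PrefetchSketch}}, {{RangeReader}} and {{readAll}} are hypothetical names, and {{RangeReader}} stands in for a ranged read against the object store.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of multi-threaded pre-read; not the patch itself.
final class PrefetchSketch {

    // Stand-in for a ranged GET against an object store (assumption).
    interface RangeReader {
        byte[] read(long offset, int length) throws Exception;
    }

    // Fetches [0, size) in parts of at most partSize bytes using the given
    // number of threads, and returns the bytes in their original order.
    static byte[] readAll(RangeReader reader, long size, int partSize, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> parts = new ArrayList<>();
            for (long off = 0; off < size; off += partSize) {
                final long start = off;
                final int len = (int) Math.min(partSize, size - off);
                Callable<byte[]> fetch = () -> reader.read(start, len);
                parts.add(pool.submit(fetch));
            }
            // Futures are consumed in submission order, so the output is
            // sequential even though the fetches complete in any order.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (Future<byte[]> part : parts) {
                byte[] chunk = part.get();
                out.write(chunk, 0, chunk.length);
            }
            return out.toByteArray();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

This sketch queues every part up front, which keeps it short but does not bound memory. A real input stream would likely limit how many parts are in flight ahead of the reader's current position.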



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

