[jira] [Commented] (HADOOP-13695) S3A to use a thread pool for async path operations

Thomas Demoor (JIRA) Tue, 25 Oct 2016 05:24:13 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605184#comment-15605184 ]


Thomas Demoor commented on HADOOP-13695:
----------------------------------------

We really like the idea of the separate threadpools per operation type.

I see multiple types of operations:
* small payload: HEAD, DELETE
* potentially big payload + O(objectsize) duration: GET, PUT
* small payload + O(objectsize) duration: PUT-COPY
* moderate payload + O(listsize) duration: LIST, MULTIDELETE
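
To make the grouping above concrete, here is a minimal sketch of one executor per operation class. The class name {{OperationPools}}, the enum names and the pool sizes are all hypothetical; nothing here is existing S3A code, and the verb-to-class mapping simply mirrors the list above.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one pool per class of S3 operation, so that a
// long-running COPY cannot starve a cheap HEAD or DELETE of threads.
class OperationPools {
  enum OpType { SMALL, BULK_DATA, COPY, LISTING }

  private final Map<OpType, ExecutorService> pools = new EnumMap<>(OpType.class);

  // threadsPerPool is an illustrative knob, not a real S3A option.
  OperationPools(int threadsPerPool) {
    for (OpType type : OpType.values()) {
      pools.put(type, Executors.newFixedThreadPool(threadsPerPool));
    }
  }

  // Maps a request verb to the operation class from the list above.
  static OpType classify(String verb) {
    switch (verb) {
      case "HEAD":
      case "DELETE":
        return OpType.SMALL;
      case "GET":
      case "PUT":
        return OpType.BULK_DATA;
      case "PUT-COPY":
        return OpType.COPY;
      case "LIST":
      case "MULTIDELETE":
        return OpType.LISTING;
      default:
        throw new IllegalArgumentException("Unknown operation: " + verb);
    }
  }

  // Submits the task to the pool that owns its operation class.
  <T> Future<T> submit(String verb, Callable<T> task) {
    return pools.get(classify(verb)).submit(task);
  }

  void shutdown() {
    for (ExecutorService pool : pools.values()) {
      pool.shutdown();
    }
  }
}
```

With this layout, something like {{pools.submit("HEAD", task)}} never queues behind a PUT-COPY, because the two verbs land in different executors.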

[[email protected]], I agree, finding a way to parallelize all renames 
(PUT-COPY) would achieve some of the goals we had for HADOOP-9565: close to 2x 
speedup on FileOutputCommitter.commitJob() and DistCp. Related to that, we've 
also been brainstorming about an "object storage friendly" FileOutputCommitter 
based on S3 versioning. What is your thinking here?
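
As a rough illustration of the parallel PUT-COPY idea, the sketch below fans out one copy per object and returns a single future for the whole batch. {{ParallelRename}}, {{copyAll}} and the {{copyObject}} callback are hypothetical stand-ins for whatever issues the real COPY request; deleting the source objects afterwards is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.BiConsumer;

// Hypothetical sketch: a "rename" of many objects as parallel copies.
class ParallelRename {

  // srcToDest maps each source key to its destination key.
  // copyObject is an assumed callback that performs one PUT-COPY.
  static CompletableFuture<Void> copyAll(Map<String, String> srcToDest,
                                         BiConsumer<String, String> copyObject,
                                         Executor copyPool) {
    List<CompletableFuture<Void>> copies = new ArrayList<>();
    for (Map.Entry<String, String> entry : srcToDest.entrySet()) {
      copies.add(CompletableFuture.runAsync(
          () -> copyObject.accept(entry.getKey(), entry.getValue()),
          copyPool));
    }
    // Completes when every copy has finished; completes exceptionally
    // if any single copy failed.
    return CompletableFuture.allOf(copies.toArray(new CompletableFuture[0]));
  }
}
```

The caller can block on the returned future with {{join()}} before deleting the sources, which keeps the failure handling in one place.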

Also, HADOOP-13600 and HADOOP-13407 should be linked tickets imho.

> S3A to use a thread pool for async path operations
> --------------------------------------------------
>
>                 Key: HADOOP-13695
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13695
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>
> S3A path operations are often slow due to directory scanning, mock directory 
> create/delete, etc. Many of these can be done asynchronously:
> * because deletion is eventually consistent, deleting parent dirs after an 
> operation has returned doesn't alter the behaviour, except in the special 
> case of operation failure.
> * scanning for paths/parents of a file in the create operation only needs to 
> complete before the close() operation instantiates the object, no need to 
> block create().
> * parallelized COPY calls would permit asynchronous rename.
> We could either use the thread pool used for block writes, or somehow isolate 
> low cost path ops (GET, DELETE) from the more expensive calls (COPY, PUT) so 
> that a thread doing basic IO doesn't block for the duration of the long op. 
> Maybe also use {{Semaphore.tryAcquire()}} and only start async work if there 
> actually is an idle thread, doing it synchronously if not. Maybe it depends 
> on the operation. Path query/cleanup before/after a write is something which 
> could be scheduled as just more futures in the block write.
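
The {{Semaphore.tryAcquire()}} suggestion in the description could look roughly like the sketch below: the permit count stands in for "idle threads", and when no permit is free the work runs in the caller's thread instead of being queued. {{OpportunisticAsync}} and its parameters are hypothetical names, and the sketch assumes the permit count is set no higher than the number of threads in the pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch: go async only when a worker slot is free,
// otherwise do the work synchronously in the calling thread.
class OpportunisticAsync {
  private final ExecutorService pool;
  private final Semaphore idleSlots;

  // slots should not exceed the pool's thread count (assumption).
  OpportunisticAsync(ExecutorService pool, int slots) {
    this.pool = pool;
    this.idleSlots = new Semaphore(slots);
  }

  <T> CompletableFuture<T> run(Supplier<T> op) {
    if (idleSlots.tryAcquire()) {
      try {
        // A slot is free: hand the work to the pool and release the
        // permit once the task finishes, whether it succeeds or fails.
        return CompletableFuture.supplyAsync(op, pool)
            .whenComplete((result, error) -> idleSlots.release());
      } catch (RuntimeException e) {
        // The pool refused the task (e.g. it was shut down).
        idleSlots.release();
        throw e;
      }
    }
    // No idle slot: run synchronously and return a completed future.
    try {
      return CompletableFuture.completedFuture(op.get());
    } catch (RuntimeException e) {
      CompletableFuture<T> failed = new CompletableFuture<>();
      failed.completeExceptionally(e);
      return failed;
    }
  }
}
```

Either way the caller gets a future back, so the calling code does not need to know which path was taken.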



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

