[ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-11262:
----------------------------------
    Attachment: HADOOP-11262-9.patch

Thank you for the +1's, [~mackrorysd], [~eddyxu] and [[email protected]].
[~cnauroth], thank you for looking at this patch.

I have added @Ignore annotations where appropriate in version 9. I also removed 
the copyright lines featuring the year.

Regarding the modification times of directories in S3A: as directories are 
"fakes" in S3A, there is no feasible way to get accurate directory timestamps 
without extensive locking (and coping with slow listings), which defeats the 
purpose of using an object store. Therefore, we chose a "dummy" implementation 
that doesn't break (too many) things.

Setting a fixed time (e.g. the epoch) breaks the history server: it looks at 
the modification time of the directory object before moving it, and decides 
the files don't need to be copied if they are "too old". Setting the 
modification time of directories in S3A to System.currentTimeMillis() ensures 
the history server never labels them as "too old".
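
To make the trade-off concrete, here is a minimal sketch in plain Java. It does 
not use any Hadoop classes: DirTimestampSketch, isTooOld and the one-week 
MAX_AGE_MS are hypothetical names and values picked for illustration, not the 
history server's actual code or configuration.

```java
public class DirTimestampSketch {
    // Hypothetical age limit (one week); the real setting may differ.
    static final long MAX_AGE_MS = 7L * 24 * 60 * 60 * 1000;

    // Hypothetical stand-in for an age check on a directory's modification time.
    static boolean isTooOld(long modificationTime, long now) {
        return now - modificationTime > MAX_AGE_MS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();

        // A fixed timestamp such as the epoch is always older than the limit.
        System.out.println("fixed epoch too old:  " + isTooOld(0L, now));

        // Reporting the current time means the directory is never older than the limit.
        System.out.println("current time too old: " + isTooOld(now, now));
    }
}
```

With a fixed epoch the check always fires; with the current time it never does, 
which is the behaviour described above.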

It is good that you took a deeper look into whether always labelling 
directories as "too young" can cause problems in YARN. A closer look at the 
LocalResource and LocalResourceType classes shows that YARN resource 
localization is always executed against regular files or .jar archives 
(these are the only possible values of LocalResourceType), for which S3A 
returns the correct timestamps.

However, a look at the AggregatedLogDeletionService of YARN shows that this 
service will skip log files that are due for removal, because their directory 
will always be labelled "too young". I did not find any other places in YARN 
where this patch can cause problems. I documented this behaviour in the 
index.md file in the patch. As this does not break anything, I still propose 
to go forward with this patch.
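
The log-deletion side effect can be sketched the same way. This is a 
simplified model under my own assumptions, not the AggregatedLogDeletionService 
code: LogRetentionSketch, selectForDeletion and the paths are hypothetical, and 
the only point is that a directory whose modification time equals "now" never 
falls before a retention cutoff.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LogRetentionSketch {
    // Returns the directories whose modification time lies before the cutoff.
    static List<String> selectForDeletion(Map<String, Long> dirModTimes,
                                          long now, long retentionMs) {
        long cutoff = now - retentionMs;
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : dirModTimes.entrySet()) {
            if (entry.getValue() < cutoff) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long retentionMs = 24L * 60 * 60 * 1000; // hypothetical one-day retention

        Map<String, Long> dirs = new LinkedHashMap<>();
        // A store with real directory timestamps reports the true, old time.
        dirs.put("/logs/app_with_real_timestamp", now - 2 * retentionMs);
        // A dummy implementation reports the current time for every directory.
        dirs.put("/logs/app_with_dummy_timestamp", now);

        // Only the directory with the real (old) timestamp is selected.
        System.out.println(selectForDeletion(dirs, now, retentionMs));
    }
}
```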

> Enable YARN to use S3A 
> -----------------------
>
>                 Key: HADOOP-11262
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11262
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Thomas Demoor
>            Assignee: Pieter Reuse
>              Labels: amazon, s3
>         Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
> HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, 
> HADOOP-11262-7.patch, HADOOP-11262-8.patch, HADOOP-11262-9.patch, 
> HADOOP-11262.patch
>
>
> Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
