[
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pieter Reuse updated HADOOP-11262:
----------------------------------
Attachment: HADOOP-11262-9.patch
Thank you for the +1's, [~mackrorysd], [~eddyxu] and [[email protected]].
[~cnauroth], thank you for looking at this patch.
I have added @Ignore annotations where appropriate in version 9. I also removed
the copyright lines featuring the year.
Regarding the modification times of directories in S3A: as directories are
"fakes" in s3a, there is no feasible way to get accurate directory timestamps
without extensive locking (and coping with slow listings), which runs counter
to the rationale of object stores. Therefore, we chose a "dummy" implementation
that doesn't break (too many) things.
Setting a fixed time (e.g. epoch) breaks the history server, as it looks at the
modification time of the directory object before moving it and decides the
files don't need to be copied if they are "too old". Setting the
modification time of directories in S3A to System.currentTimeMillis() ensures
the history server never labels them as being "too old".
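To illustrate the trade-off, here is a minimal sketch of an age check of this
kind. It is not the history server's actual code: the class name, the
isTooOld helper and the one-week cutoff are assumptions made up for the
example. It only shows why an epoch timestamp always fails such a check while
a current-time timestamp always passes it.

```java
// Hypothetical sketch; names and the cutoff value are illustrative only.
class DirectoryTimestampSketch {

    // Stand-in for an age check: an entry is "too old" when its
    // modification time lies further back than maxAgeMillis.
    static boolean isTooOld(long modificationTimeMillis, long nowMillis,
                            long maxAgeMillis) {
        return nowMillis - modificationTimeMillis > maxAgeMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long maxAge = 7L * 24 * 60 * 60 * 1000; // assumed one-week cutoff

        // Option 1: fake directories report a fixed time (epoch).
        long epochTimestamp = 0L;
        // Option 2: fake directories report the current time.
        long currentTimestamp = now;

        System.out.println("epoch timestamp too old:   "
                + isTooOld(epochTimestamp, now, maxAge));
        System.out.println("current timestamp too old: "
                + isTooOld(currentTimestamp, now, maxAge));
    }
}
```

With the epoch value the directory is always skipped; with the current time it
is never skipped, which is the behaviour chosen in this patch.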
It is good that you have taken a deeper look into whether always labelling
directories as "too young" can give rise to problems in YARN. A closer look
at the classes LocalResource and LocalResourceType shows that YARN resource
localization is always executed against regular files or .jar archives
(these are the only possible values of LocalResourceType), for which S3A
returns the correct timestamps.
However, a look at YARN's AggregatedLogDeletionService shows that this
service will not remove the appropriate log files, because the directory will
always be labelled "too young". I did not find any other places in YARN where
this patch can cause problems. I documented this behaviour in the index.md file
in the patch. As this does not break anything, I still propose to go forward
with this patch.
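As a rough illustration of the log-deletion side effect, the sketch below
models a retention sweep that selects directories by modification time. It is
a simplification and not the AggregatedLogDeletionService source: the Entry
class, the selectForDeletion method and the paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not taken from the YARN code base.
class LogRetentionSketch {

    // Simplified stand-in for a directory entry with a timestamp.
    static final class Entry {
        final String path;
        final long modificationTime;

        Entry(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    // Selects every directory whose modification time is older than the
    // retention cutoff.
    static List<String> selectForDeletion(List<Entry> dirs, long nowMillis,
                                          long retentionMillis) {
        long cutoff = nowMillis - retentionMillis;
        List<String> selected = new ArrayList<>();
        for (Entry dir : dirs) {
            if (dir.modificationTime < cutoff) {
                selected.add(dir.path);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long retention = 24L * 60 * 60 * 1000; // assumed one-day retention

        List<Entry> dirs = new ArrayList<>();
        // A store with real directory timestamps would report an old value.
        dirs.add(new Entry("/logs/user/app_old", now - 3 * retention));
        // A fake directory reporting the current time is never old enough.
        dirs.add(new Entry("/logs/user/app_fake", now));

        System.out.println(selectForDeletion(dirs, now, retention));
    }
}
```

Only the entry with a genuine old timestamp is selected; a directory that
always reports the current time stays in place regardless of its real age.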
> Enable YARN to use S3A
> -----------------------
>
> Key: HADOOP-11262
> URL: https://issues.apache.org/jira/browse/HADOOP-11262
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Thomas Demoor
> Assignee: Pieter Reuse
> Labels: amazon, s3
> Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch,
> HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch,
> HADOOP-11262-7.patch, HADOOP-11262-8.patch, HADOOP-11262-9.patch,
> HADOOP-11262.patch
>
>
> Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)