[
https://issues.apache.org/jira/browse/HADOOP-16746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018092#comment-17018092
]
Steve Loughran commented on HADOOP-16746:
-----------------------------------------
Confirmed: {{S3Guard.dirListingUnion()}} removes the auth dir flag from all
children.
The problem is that it builds a whole new list of entries to upload, and it
does not propagate the isAuthoritative bit of the existing ones. We are
slightly hampered here in that this method works with PathMetadata and not the
DynamoDB-specific subclass, so handling it there would involve some hoop jumping.
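A minimal sketch of the problem, assuming two hypothetical types: BaseMeta stands in for PathMetadata (no authoritative flag) and AuthMeta for the DynamoDB-specific subclass. Neither is the real S3Guard class. Rebuilding the upload list from the base type resets the flag. Recovering it needs an instanceof downcast, which is the kind of hoop jumping mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for PathMetadata: it has no notion of authoritativeness. */
class BaseMeta {
  final String path;

  BaseMeta(String path) {
    this.path = path;
  }
}

/** Hypothetical stand-in for the DynamoDB-specific subclass, which carries the flag. */
class AuthMeta extends BaseMeta {
  final boolean authoritativeDir;

  AuthMeta(String path, boolean authoritativeDir) {
    super(path);
    this.authoritativeDir = authoritativeDir;
  }
}

class UnionSketch {
  /** Builds a fresh upload list from the base type only, so every flag resets to false. */
  static List<AuthMeta> lossyRebuild(List<BaseMeta> merged) {
    List<AuthMeta> toUpload = new ArrayList<>();
    for (BaseMeta m : merged) {
      toUpload.add(new AuthMeta(m.path, false)); // isAuthoritative bit is dropped here
    }
    return toUpload;
  }

  /** The "hoop jumping": downcast to recover the flag wherever the subclass is present. */
  static List<AuthMeta> preservingRebuild(List<BaseMeta> merged) {
    List<AuthMeta> toUpload = new ArrayList<>();
    for (BaseMeta m : merged) {
      boolean auth = (m instanceof AuthMeta) && ((AuthMeta) m).authoritativeDir;
      toUpload.add(new AuthMeta(m.path, auth));
    }
    return toUpload;
  }
}
```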
I'm adding a different solution: passing down to the metastore a list of paths
whose entries have not been updated. The metastore can use this to filter out
those entries which don't need updating. This implicitly preserves the
authoritative flag on child entries. It also saves on uploading, as those
entries are no longer overwritten needlessly.
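The filtering idea can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: FilteringStore, StoreEntry and the put(merged, unchangedPaths) signature are hypothetical, and a HashMap stands in for the DynamoDB table. Entries whose paths are reported as unchanged are skipped. Their stored rows, including any authoritative flag, are therefore left untouched, and nothing is re-uploaded for them.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical metastore entry: a path plus its authoritative-directory flag. */
class StoreEntry {
  final String path;
  final boolean authoritativeDir;

  StoreEntry(String path, boolean authoritativeDir) {
    this.path = path;
    this.authoritativeDir = authoritativeDir;
  }
}

/** Hypothetical in-memory metastore used only to illustrate the filtering step. */
class FilteringStore {
  final Map<String, StoreEntry> table = new HashMap<>();

  /** Drops every entry whose path the caller reported as unchanged. */
  static List<StoreEntry> filterUnchanged(Collection<StoreEntry> merged,
      Collection<String> unchangedPaths) {
    Set<String> skip = new HashSet<>(unchangedPaths);
    List<StoreEntry> toPut = new ArrayList<>();
    for (StoreEntry e : merged) {
      if (!skip.contains(e.path)) {
        toPut.add(e);
      }
    }
    return toPut;
  }

  /** Writes only the changed entries; untouched rows keep whatever flag they had. */
  int put(Collection<StoreEntry> merged, Collection<String> unchangedPaths) {
    List<StoreEntry> toPut = filterUnchanged(merged, unchangedPaths);
    for (StoreEntry e : toPut) {
      table.put(e.path, e);
    }
    return toPut.size();
  }
}
```

Skipping unchanged paths in the metastore, rather than rebuilding the flag in the union, avoids the PathMetadata downcast entirely.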
Currently, I believe that every call into this class ends up writing everything
back to S3Guard, even if nothing has changed. I think we can all agree this is
suboptimal.
> s3a empty dir markers are not created in s3guard as authoritative
> -----------------------------------------------------------------
>
> Key: HADOOP-16746
> URL: https://issues.apache.org/jira/browse/HADOOP-16746
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.3.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
>
> Newly created empty dirs, or markers created after delete operations, are not
> marked in S3Guard as auth. This has adverse consequences in that subsequent
> changes (i.e. new files) don't get marked as auth either... it takes a
> listFiles call to scan the source and mark it as auth.
> I could stick a quick fix into HADOOP-16697, but don't want to, as I don't
> like what that would mean. Essentially, finishedWrite() needs to recognise
> when an empty directory marker is being created (it does this already) and
> then *always* declare it as auth.
> I'd prefer for the mkdirs operation to pass a flag all the way through to
> finishedWrite so that it doesn't need to infer this. The WriteOpContext of
> HADOOP-16134 would be the way to do this. Yes, it's a big change, but it would
> be extensible, and I already have some plans there.
> So it will be a follow-up instead.
> The tests for this problem are part of HADOOP-16697, just disabled for now.
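Here is a rough sketch of the preferred approach, in which mkdirs passes an explicit flag through to finishedWrite. WriteOpContextSketch and MarkerWriteSketch are hypothetical names, the real WriteOpContext from HADOOP-16134 may look quite different, and a Set stands in for the S3Guard table.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical write-operation context; not the actual HADOOP-16134 class. */
final class WriteOpContextSketch {
  final boolean emptyDirMarker;

  WriteOpContextSketch(boolean emptyDirMarker) {
    this.emptyDirMarker = emptyDirMarker;
  }
}

/** Hypothetical filesystem fragment showing the flag travelling from mkdirs to finishedWrite. */
class MarkerWriteSketch {
  /** Directories recorded as authoritative, standing in for the S3Guard table. */
  final Set<String> authoritativeDirs = new HashSet<>();

  void mkdirs(String dir) {
    // mkdirs knows it is creating an empty directory marker, so it says so explicitly.
    finishedWrite(dir, new WriteOpContextSketch(true));
  }

  void createFile(String file) {
    finishedWrite(file, new WriteOpContextSketch(false));
  }

  void finishedWrite(String path, WriteOpContextSketch ctx) {
    // No inference from the object key: the caller's flag decides.
    // An empty directory has no children, so its listing is trivially complete.
    if (ctx.emptyDirMarker) {
      authoritativeDirs.add(path);
    }
  }
}
```

Carrying the intent in a context object keeps finishedWrite from having to guess, and further fields can be added later without changing every signature.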