[ 
https://issues.apache.org/jira/browse/HADOOP-16746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018092#comment-17018092
 ] 

Steve Loughran commented on HADOOP-16746:
-----------------------------------------

Confirmed: {{S3Guard.dirListingUnion()}} removes the auth dir flag from all 
children.

The problem here is that it builds a whole new list of entries to upload, and 
it does not propagate the isAuthoritative bit of the existing ones. We are 
slightly hampered here in that this method works with PathMetadata and not the 
dynamo subclass; it would involve some hoop jumping to handle this.

I'm adding a different solution: passing down to the metastore a list of paths 
whose entries have not been updated. This can be used to filter out those 
entries which don't need updating. This implicitly preserves the authoritative 
flag on child entries. It also avoids needlessly uploading and overwriting 
those entries.
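
A minimal sketch of that filtering idea, in Java. Everything named here 
(the Entry class, entriesToWrite(), plain String paths) is a hypothetical 
stand-in for illustration; it is not the real PathMetadata or metastore API, 
and the actual patch may be structured differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UnchangedEntryFilter {

  /** Hypothetical stand-in for a metastore entry; not the real PathMetadata. */
  static final class Entry {
    final String path;
    final boolean authoritative;

    Entry(String path, boolean authoritative) {
      this.path = path;
      this.authoritative = authoritative;
    }
  }

  /**
   * Return only the entries which need to be written back.
   * Entries whose paths are in the unchanged set are skipped, so whatever
   * is already stored for them (including any authoritative flag) is left
   * untouched rather than overwritten.
   */
  static List<Entry> entriesToWrite(List<Entry> merged, Set<String> unchanged) {
    List<Entry> toWrite = new ArrayList<>();
    for (Entry e : merged) {
      if (!unchanged.contains(e.path)) {
        toWrite.add(e);
      }
    }
    return toWrite;
  }

  public static void main(String[] args) {
    // The merged listing: two entries already in the store, one new.
    List<Entry> merged = Arrays.asList(
        new Entry("/dir/a", true),
        new Entry("/dir/b", true),
        new Entry("/dir/c", false));
    // Paths the caller knows were not updated by the merge.
    Set<String> unchanged = new HashSet<>(Arrays.asList("/dir/a", "/dir/b"));

    for (Entry e : entriesToWrite(merged, unchanged)) {
      System.out.println("write " + e.path);
    }
  }
}
```

In this sketch only /dir/c would be uploaded; the stored entries for /dir/a 
and /dir/b are never rewritten, which is how the flag survives.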

Currently, I believe that every call of this method ends up writing everything 
back to S3Guard, even if there has been no change. I think we can all agree 
this is suboptimal.

> s3a empty dir markers are not created in s3guard as authoritative
> -----------------------------------------------------------------
>
>                 Key: HADOOP-16746
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16746
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> Newly created empty dirs, or markers created after delete operations, are not 
> marked in S3Guard as auth. This has adverse consequences in that following 
> changes (i.e. new files) don't get marked as auth either; it needs a 
> listFiles call to scan the source and mark as auth.
> I could stick a quick fix in to HADOOP-16697, but don't want to as I don't 
> like what that would mean. Essentially, finishedWrite() needs to recognise 
> when an empty directory marker is being created (it does this already) and 
> then *always* declare it as auth.
> I'd prefer for the mkdirs operation to pass a flag all the way through to 
> finishedWrite so that it doesn't need to infer this. The WriteOpContext of 
> HADOOP-16134 would be the way to do this. Yes it's a big change but it would 
> be extensible, and I already have some plans there.
> Instead it will be a follow-up.
> The tests for this problem are part of HADOOP-16697, just disabled for now.



