[
https://issues.apache.org/jira/browse/HADOOP-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aaron Fabbri updated HADOOP-13914:
----------------------------------
Attachment: HADOOP-13914-HADOOP-13345.002.patch
Attaching initial patch of my solution for this.
Also include new test which fails before this patch and succeeds after.
Basic idea:
- Identify codepaths that actually use S3AFileStatus#isEmptyDirectory.
- Those codepaths expilcitly tell getFileStatus() code that they need this
flag
to be computed.
- If the MetadataStore has a FileStatus, but does not know whether or not
it is
an empty directory, fall through to the existing s3a logic that computes
the
flag.
- A new enum, {{Tristate}} is used to represent TRUE, FALSE, and UNKNOWN.
UNKNOWN can be returned by a MetadataStore if it does not have enough state to
know whether or not the directory is empty.
Other notes:
- I changed S3AFileSystem#getFileStatus() API, as it is annotated as Private
and Evolving. It returns a FileStatus type to enforce that callers don't use
isEmptyDirectory()
What I don't love about the patch:
Valid values of {{S3AFileStatus#isEmptyDirectory()}} depend on runtime
codepath: All S3AFileStatus' returned by innerGetFileStatus(p,
wantEmptyDirectoryFlag=true) are boolean (TRUE or FALSE), but other callers may
also see UNKNOWN.
I feel like we should be using the type system to enforce this. I.e. create a
new subclass of S3AFileStatus, {{S3AFileStatusWithEmptyDirFlag}} which always
has true or false value. We'd move isEmptyDirectory() out of S3AFileStatus
into there. This would allow compiler to check that only the desired codepaths
can see isEmptyDirectory(). I suppose we could also just use S3AFileStatus for
this purpose, making all code paths that *don't* care if the status is an empty
directory, just get a vanilla FileStatus from getFileStatus().
> s3guard: improve S3AFileStatus#isEmptyDirectory handling
> --------------------------------------------------------
>
> Key: HADOOP-13914
> URL: https://issues.apache.org/jira/browse/HADOOP-13914
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Aaron Fabbri
> Assignee: Mingliang Liu
> Attachments: HADOOP-13914-HADOOP-13345.000.patch,
> HADOOP-13914-HADOOP-13345.002.patch, s3guard-empty-dirs.md,
> test-only-HADOOP-13914.patch
>
>
> As discussed in HADOOP-13449, proper support for the isEmptyDirectory() flag
> stored in S3AFileStatus is missing from DynamoDBMetadataStore.
> The approach taken by LocalMetadataStore is not suitable for the DynamoDB
> implementation, and also sacrifices good code separation to minimize
> S3AFileSystem changes pre-merge to trunk.
> I will attach a design doc that attempts to clearly explain the problem and
> preferred solution. I suggest we do this work after merging the HADOOP-13345
> branch to trunk, but am open to suggestions.
> I can also attach a patch of a integration test that exercises the missing
> case and demonstrates a failure with DynamoDBMetadataStore.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]