[ 
https://issues.apache.org/jira/browse/HADOOP-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-13914:
----------------------------------
    Attachment: HADOOP-13914-HADOOP-13345.002.patch

Attaching initial patch of my solution for this.
 
Also include new test which fails before this patch and succeeds after.
    
    Basic idea:
    - Identify codepaths that actually use S3AFileStatus#isEmptyDirectory.
    - Those codepaths expilcitly tell getFileStatus() code that they need this 
flag
      to be computed.
    - If the MetadataStore has a FileStatus, but does not know whether or not 
it is
      an empty directory, fall through to the existing s3a logic that computes 
the
      flag.

- A new enum, {{Tristate}} is used to represent TRUE, FALSE, and UNKNOWN.
  UNKNOWN can be returned by a MetadataStore if it does not have enough state to
  know whether or not the directory is empty.

Other notes:
- I changed S3AFileSystem#getFileStatus() API, as it is annotated as Private 
and  Evolving.  It returns a FileStatus type to enforce that callers don't use  
isEmptyDirectory()

What I don't love about the patch:

Valid values of {{S3AFileStatus#isEmptyDirectory()}} depend on runtime 
codepath: All S3AFileStatus' returned by innerGetFileStatus(p, 
wantEmptyDirectoryFlag=true) are boolean (TRUE or FALSE), but other callers may 
also see UNKNOWN.
I feel like we should be using the type system to enforce this.  I.e. create a 
new subclass of S3AFileStatus, {{S3AFileStatusWithEmptyDirFlag}} which always 
has true or false value.  We'd move isEmptyDirectory() out of S3AFileStatus 
into there.  This would allow compiler to check that only the desired codepaths 
can see isEmptyDirectory().  I suppose we could also just use S3AFileStatus for 
this purpose, making all code paths that *don't* care if the status is an empty 
directory, just get a vanilla FileStatus from getFileStatus().

> s3guard: improve S3AFileStatus#isEmptyDirectory handling
> --------------------------------------------------------
>
>                 Key: HADOOP-13914
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13914
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Aaron Fabbri
>            Assignee: Mingliang Liu
>         Attachments: HADOOP-13914-HADOOP-13345.000.patch, 
> HADOOP-13914-HADOOP-13345.002.patch, s3guard-empty-dirs.md, 
> test-only-HADOOP-13914.patch
>
>
> As discussed in HADOOP-13449, proper support for the isEmptyDirectory() flag 
> stored in S3AFileStatus is missing from DynamoDBMetadataStore.
> The approach taken by LocalMetadataStore is not suitable for the DynamoDB 
> implementation, and also sacrifices good code separation to minimize 
> S3AFileSystem changes pre-merge to trunk.
> I will attach a design doc that attempts to clearly explain the problem and 
> preferred solution.  I suggest we do this work after merging the HADOOP-13345 
> branch to trunk, but am open to suggestions.
> I can also attach a patch of a integration test that exercises the missing 
> case and demonstrates a failure with DynamoDBMetadataStore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to