[
https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056703#comment-16056703
]
Aaron Fabbri commented on HADOOP-14457:
---------------------------------------
[~mackrorysd] and I just chatted about this and I think we came up with a
better option. Instead of adding a "recursive" put to MetadataStore, I
suggested a batched put.
The main thing I disliked about adding a new method
{{MetadataStore#putRecursive(PathMetadata p)}} is that it is not explicit about
which ancestors get created. It is a sloppy sort of contract: "I'm creating
this thing, make sure the ancestors exist", instead of an explicit "here are the
paths that I'm creating". The former makes it hard to use the MetadataStore
interface for precise metadata logging.
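To make the contrast concrete, here is a toy sketch of the two shapes; the
interface names are mine, neither of these is the real MetadataStore:
{code}
import org.apache.hadoop.fs.s3a.s3guard.PathMetadata;

// Toy illustration only: neither interface below is the real MetadataStore.
interface ImplicitAncestors {
  // "I'm creating this thing, make sure the ancestors exist."
  void putRecursive(PathMetadata meta);
}

interface ExplicitAncestors {
  // "Here are the paths that I'm creating."  The caller enumerates every
  // entry, so the sequence of put() calls is itself a precise metadata log.
  void put(PathMetadata meta);
}
{code}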
One of the problems with requiring the FS client to put() each path it creates
is that DynamoDB already has to create ancestors internally for its schema, so
it ends up doing extra round trips (both the FS and DynamoDB making sure
ancestors are created). [~mackrorysd] rightfully pushed back on this. Even
with the capability check I suggested (which should be correct, but is not
trivial to reason about), DynamoDB would still pay for extra round trips once we
add authoritative listing support (HADOOP-14154).
We both seemed to agree that adding a batched
{{MetadataStore#put(Collection<PathMetadata> pathsToCreate)}} is a better option
than the capability check.
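Roughly, I'm picturing something like this (a sketch only; the javadoc wording
and the throws clause are illustrative, not a final signature):
{code}
import java.io.IOException;
import java.util.Collection;

import org.apache.hadoop.fs.s3a.s3guard.PathMetadata;

// Sketch of the batched variant next to the existing single-entry put().
interface BatchedMetadataStore {

  /** Existing single-entry put. */
  void put(PathMetadata meta) throws IOException;

  /**
   * Save a batch of entries in one call.  The caller lists every path it is
   * creating explicitly; implementations whose schema needs ancestor entries
   * (e.g. DynamoDB) may add them internally within the same batch.
   */
  void put(Collection<PathMetadata> pathsToCreate) throws IOException;
}
{code}
This keeps ancestor handling an implementation detail while the caller stays
explicit about what it created.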
The DynamoDB implementation can reuse existing logic from its move()
implementation to pre-process the batch and include the ancestors it needs in
the batched put list. DynamoDB gets optimal round trips, now and in the future,
and the MetadataStore interface can still express metadata changes explicitly.
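To illustrate the kind of pre-processing I mean, here is a rough sketch using
only {{org.apache.hadoop.fs.Path}}; the class and method names are mine, not the
actual move() code, and turning the added parents into directory
{{PathMetadata}} entries is left as a comment:
{code}
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

import org.apache.hadoop.fs.Path;

/** Illustrative only: how a store could pre-process a batch to add ancestors. */
final class AncestorBatch {

  /**
   * Return the given paths plus every missing ancestor directory, so a single
   * batched write can satisfy the DynamoDB schema without the client and the
   * store each doing their own round trips for parents.
   */
  static Set<Path> withAncestors(Collection<Path> pathsToCreate) {
    Set<Path> batch = new LinkedHashSet<>(pathsToCreate);
    for (Path path : pathsToCreate) {
      Path parent = path.getParent();
      // Climb until we hit the root or a parent that is already in the batch.
      while (parent != null && !parent.isRoot() && batch.add(parent)) {
        // In the real store each added parent would become a directory
        // PathMetadata entry in the same batched put; omitted here.
        parent = parent.getParent();
      }
    }
    return batch;
  }
}
{code}
Because the expansion happens inside the store, the FS client still only
reports the paths it actually created.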
Thanks for the great discussion on this, [~mackrorysd]; please add anything I missed.
> create() does not notify metadataStore of parent directories or ensure
> they're not existing files
> -------------------------------------------------------------------------------------------------
>
> Key: HADOOP-14457
> URL: https://issues.apache.org/jira/browse/HADOOP-14457
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
> Attachments: HADOOP-14457-HADOOP-13345.001.patch,
> HADOOP-14457-HADOOP-13345.002.patch, HADOOP-14457-HADOOP-13345.003.patch,
> HADOOP-14457-HADOOP-13345.004.patch, HADOOP-14457-HADOOP-13345.005.patch,
> HADOOP-14457-HADOOP-13345.006.patch, HADOOP-14457-HADOOP-13345.007.patch,
> HADOOP-14457-HADOOP-13345.008.patch, HADOOP-14457-HADOOP-13345.009.patch
>
>
> Not a great test yet, but it at least reliably demonstrates the issue.
> LocalMetadataStore will sometimes erroneously report that a directory is
> empty with isAuthoritative = true when it *definitely* has children the
> MetadataStore should know about. It doesn't appear to happen if the children
> are just directories. The fact that it's returning an empty listing is
> concerning, but the fact that it says it's authoritative *might* be a second
> bug.
> {code}
> diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> index 78b3970..1821d19 100644
> --- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> +++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> @@ -965,7 +965,7 @@ public boolean hasMetadataStore() {
> }
>
> @VisibleForTesting
> - MetadataStore getMetadataStore() {
> + public MetadataStore getMetadataStore() {
> return metadataStore;
> }
>
> diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> index 4339649..881bdc9 100644
> --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> @@ -23,6 +23,11 @@
> import org.apache.hadoop.fs.contract.AbstractFSContract;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> +import org.apache.hadoop.fs.s3a.S3AFileSystem;
> +import org.apache.hadoop.fs.s3a.Tristate;
> +import org.apache.hadoop.fs.s3a.s3guard.DirListingMetadata;
> +import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
> +import org.junit.Test;
>
> import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
> import static org.apache.hadoop.fs.contract.ContractTestUtils.writeDataset;
> @@ -72,4 +77,24 @@ public void testRenameDirIntoExistingDir() throws Throwable {
> boolean rename = fs.rename(srcDir, destDir);
> assertFalse("s3a doesn't support rename to non-empty directory", rename);
> }
> +
> + @Test
> + public void testMkdirPopulatesFileAncestors() throws Exception {
> + final FileSystem fs = getFileSystem();
> + final MetadataStore ms = ((S3AFileSystem) fs).getMetadataStore();
> + final Path parent = path("testMkdirPopulatesFileAncestors/source");
> + try {
> + fs.mkdirs(parent);
> + final Path nestedFile = new Path(parent, "dir1/dir2/dir3/file4");
> + byte[] srcDataset = dataset(256, 'a', 'z');
> + writeDataset(fs, nestedFile, srcDataset, srcDataset.length,
> + 1024, false);
> +
> + DirListingMetadata list = ms.listChildren(parent);
> + assertTrue("MetadataStore falsely reports authoritative empty list",
> + list.isEmpty() == Tristate.FALSE || !list.isAuthoritative());
> + } finally {
> + fs.delete(parent, true);
> + }
> + }
> }
> {code}