[ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851781#comment-15851781 ]
Steve Loughran commented on HADOOP-13786:
-----------------------------------------
About to add a new patch, which does more in terms of testing, though there's
some ambiguity about the semantics of commit and abort that I need to clarify
with the MR Team.
h2. This code works without s3guard, but fails (differently) with s3guard local and s3guard dynamo
s3guard DDB
{code}
Running org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter
Tests run: 9, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 94.915 sec <<<
FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter
testAbort(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter) Time
elapsed: 9.292 sec <<< FAILURE!
java.lang.AssertionError: Output directory not empty ls
s3a://hwdev-steve-ireland-new/test/testAbort [00]
S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testAbort/part-m-00000;
isDirectory=false; length=40; replication=1; blocksize=33554432;
modification_time=1486142401246; access_time=0; owner=stevel; group=stevel;
permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false
: array lengths differed, expected.length=0 actual.length=1
at org.junit.Assert.fail(Assert.java:88)
at
org.junit.internal.ComparisonCriteria.assertArraysAreSameLength(ComparisonCriteria.java:71)
at
org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:32)
at org.junit.Assert.internalArrayEquals(Assert.java:473)
at org.junit.Assert.assertArrayEquals(Assert.java:265)
at
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testAbort(ITestS3AOutputCommitter.java:561)
testFailAbort(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter) Time
elapsed: 9.453 sec <<< FAILURE!
java.lang.AssertionError: expected output file: unexpectedly found
s3a://hwdev-steve-ireland-new/test/testFailAbort/part-m-00000 as
S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testFailAbort/part-m-00000;
isDirectory=false; length=40; replication=1; blocksize=33554432;
modification_time=1486142441390; access_time=0; owner=stevel; group=stevel;
permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false
at org.junit.Assert.fail(Assert.java:88)
at
org.apache.hadoop.fs.contract.ContractTestUtils.assertPathDoesNotExist(ContractTestUtils.java:796)
at
org.apache.hadoop.fs.contract.AbstractFSContractTestBase.assertPathDoesNotExist(AbstractFSContractTestBase.java:305)
at
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testFailAbort(ITestS3AOutputCommitter.java:587)
Results :
Failed tests:
ITestS3AOutputCommitter.testAbort:561->Assert.assertArrayEquals:265->Assert.internalArrayEquals:473->Assert.fail:88
Output directory not empty ls s3a://hwdev-steve-ireland-new/test/testAbort
[00]
S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testAbort/part-m-00000;
isDirectory=false; length=40; replication=1; blocksize=33554432;
modification_time=1486142401246; access_time=0; owner=stevel; group=stevel;
permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false
: array lengths differed, expected.length=0 actual.length=1
ITestS3AOutputCommitter.testFailAbort:587->AbstractFSContractTestBase.assertPathDoesNotExist:305->Assert.fail:88
expected output file: unexpectedly found
s3a://hwdev-steve-ireland-new/test/testFailAbort/part-m-00000 as
S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testFailAbort/part-m-00000;
isDirectory=false; length=40; replication=1; blocksize=33554432;
modification_time=1486142441390; access_time=0; owner=stevel; group=stevel;
permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false
Tests run: 9, Failures: 2, Errors: 0, Skipped: 0
{code}
s3guard local DB
{code}
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter
Tests run: 9, Failures: 2, Errors: 2, Skipped: 0, Time elapsed: 53.226 sec <<<
FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter
testMapFileOutputCommitter(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter)
Time elapsed: 9.394 sec <<< FAILURE!
java.lang.AssertionError: Number of MapFile.Reader entries in
s3a://hwdev-steve-ireland-new/test/testMapFileOutputCommitter : ls
s3a://hwdev-steve-ireland-new/test/testMapFileOutputCommitter [00]
S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testMapFileOutputCommitter/_SUCCESS;
isDirectory=false; length=0; replication=1; blocksize=33554432;
modification_time=1486142538000; access_time=0; owner=stevel; group=stevel;
permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false
expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testMapFileOutputCommitter(ITestS3AOutputCommitter.java:500)
testAbort(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter) Time
elapsed: 4.307 sec <<< ERROR!
java.io.FileNotFoundException: No such file or directory:
s3a://hwdev-steve-ireland-new/test/testAbort
at
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1765)
at
org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1480)
at
org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1456)
at
org.apache.hadoop.fs.contract.ContractTestUtils.listChildren(ContractTestUtils.java:427)
at
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testAbort(ITestS3AOutputCommitter.java:560)
testCommitterWithFailure(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter)
Time elapsed: 5.375 sec <<< FAILURE!
java.lang.AssertionError: Expected an exception
at
org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:374)
at
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.expectFNFEonJobCommit(ITestS3AOutputCommitter.java:396)
at
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testCommitterWithFailure(ITestS3AOutputCommitter.java:390)
testFailAbort(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter) Time
elapsed: 5.016 sec <<< ERROR!
java.io.FileNotFoundException: expected output dir: not found
s3a://hwdev-steve-ireland-new/test/testFailAbort in
s3a://hwdev-steve-ireland-new/test
at
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1765)
at
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:125)
at
org.apache.hadoop.fs.contract.ContractTestUtils.verifyPathExists(ContractTestUtils.java:773)
at
org.apache.hadoop.fs.contract.ContractTestUtils.assertPathExists(ContractTestUtils.java:757)
at
org.apache.hadoop.fs.contract.AbstractFSContractTestBase.assertPathExists(AbstractFSContractTestBase.java:294)
at
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testFailAbort(ITestS3AOutputCommitter.java:586)
{code}
I don't know what's up here, but I do know that we're doing too many deletes
during the commit process; more specifically, we're creating too many mock
empty dirs. That can be optimised with a "delete, don't care about the parent
dir" method in writeOperationHelper.
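To make the cost concrete, here is a rough sketch against a toy in-memory blob store. It is not S3A code: the class {{BlobStoreSketch}}, the methods {{deleteAndMaintainParent}} and {{deleteIgnoringParent}}, the key names, and the one-request-per-call accounting are all assumptions made for illustration. It only shows why a delete that skips the parent check could save a LIST and a marker PUT per deleted file.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy in-memory blob store. Every name here is hypothetical; this is not S3A code. */
class BlobStoreSketch {
    private final Set<String> keys = new HashSet<>();
    private int requests = 0; // assumed cost model: one request per store call

    void put(String key) {
        keys.add(key);
        requests++;
    }

    /** Models a LIST call: is there any object under dir other than its own marker? */
    private boolean hasChildren(String dir) {
        requests++;
        String prefix = dir + "/";
        return keys.stream().anyMatch(k -> k.startsWith(prefix) && !k.equals(prefix));
    }

    /** Delete, then recreate a mock empty-dir marker if the parent is now empty. */
    void deleteAndMaintainParent(String key) {
        keys.remove(key);
        requests++;
        String parent = key.substring(0, key.lastIndexOf('/'));
        if (!hasChildren(parent)) {
            put(parent + "/");
        }
    }

    /** Delete only: the caller has declared it does not care about the parent dir. */
    void deleteIgnoringParent(String key) {
        keys.remove(key);
        requests++;
    }

    long markerCount() {
        return keys.stream().filter(k -> k.endsWith("/")).count();
    }

    /** Returns {requests issued while deleting, markers left behind}. */
    static long[] run(boolean maintainParent) {
        BlobStoreSketch store = new BlobStoreSketch();
        String[] pending = {"out/task0/part-0", "out/task1/part-0", "out/task2/part-0"};
        for (String key : pending) {
            store.put(key);
        }
        int before = store.requests;
        for (String key : pending) {
            if (maintainParent) {
                store.deleteAndMaintainParent(key);
            } else {
                store.deleteIgnoringParent(key);
            }
        }
        return new long[] {store.requests - before, store.markerCount()};
    }

    public static void main(String[] args) {
        long[] careful = run(true);
        long[] cheap = run(false);
        System.out.println("maintain parent: requests=" + careful[0] + " markers=" + careful[1]);
        System.out.println("ignore parent:   requests=" + cheap[0] + " markers=" + cheap[1]);
    }
}
```

In this model, each file sits alone in its directory, so the parent-maintaining delete issues three requests per file (DELETE, LIST, marker PUT) where the other issues one. The real request pattern in S3A may differ.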
In the first s3guard local test, {{testMapFileOutputCommitter}}, all is well in
the FS used by the job, but when a new FS instance is created in the same
process, that second instance doesn't see the listing.
I've not looked at the other failures in any detail.
> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13786-HADOOP-13345-001.patch,
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch,
> HADOOP-13786-HADOOP-13345-004.patch
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the
> presence of failures". Implement it, including whatever is needed to
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard
> provides a consistent view of the presence/absence of blobs, show that we can
> commit directly).
> I consider us free to expose the blobstore-ness of the s3 output
> streams (i.e. not visible until the close()), if we need to use that to allow
> us to abort commit operations.