[
https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763130#comment-16763130
]
Ben Roling commented on HADOOP-16085:
-------------------------------------
I've uploaded a new patch based on trunk and storing eTag in S3Guard instead of
versionId. The existing and my new tests pass, but there are a few things
worth mentioning.
I'll start with the simplest one. I changed S3AFileSystem.getFileStatus() to
return S3AFileStatus instead of vanilla FileStatus. I'm honestly not 100% sure
if that creates any sort of compatibility problem or is in any other way
objectionable. If so, I could cast the status where necessary instead.
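To make the two options concrete, here is a minimal sketch of narrowing the return type versus casting at the call site. The classes below (BaseStatus, ETagStatus, BaseFileSystem, ETagFileSystem) are simplified stand-ins invented for illustration, not the real Hadoop types. Java allows an overriding method to declare a more specific return type, and the compiler keeps a bridge method with the original signature, so existing callers that go through the base type keep working.

```java
// Simplified stand-ins for illustration only; not the real Hadoop classes.
class BaseStatus {
    private final long len;
    BaseStatus(long len) { this.len = len; }
    long getLen() { return len; }
}

class ETagStatus extends BaseStatus {
    private final String eTag;
    ETagStatus(long len, String eTag) { super(len); this.eTag = eTag; }
    String getETag() { return eTag; }
}

abstract class BaseFileSystem {
    abstract BaseStatus getFileStatus(String path);
}

class ETagFileSystem extends BaseFileSystem {
    // Covariant return: the override declares the narrower type.
    @Override
    ETagStatus getFileStatus(String path) {
        return new ETagStatus(42L, "etag-1");
    }
}

class CovariantReturnDemo {
    // Option 1: callers holding the concrete type get the eTag without a cast.
    static String viaCovariantReturn(ETagFileSystem fs, String path) {
        return fs.getFileStatus(path).getETag();
    }

    // Option 2: keep the base return type and cast where the eTag is needed.
    static String viaCast(BaseFileSystem fs, String path) {
        return ((ETagStatus) fs.getFileStatus(path)).getETag();
    }

    public static void main(String[] args) {
        ETagFileSystem fs = new ETagFileSystem();
        System.out.println(viaCovariantReturn(fs, "/data/file"));
        System.out.println(viaCast(fs, "/data/file"));
    }
}
```

One case the narrower return type can affect is a subclass that overrides the method and still declares the wider return type; that subclass would no longer compile.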
The next thing is that there is a slight behavior change with seek() if there
are concurrent readers and writers (a problematic thing anyway). With my
changes, a seek() backwards will always result in EOFException on the next read
within the S3AInputStream. This happens because my changes pin the
S3AInputStream to an eTag. A seek() backwards causes a re-open and since the
eTag on S3 will have changed with a new write, a read with the old eTag will
fail. I think this is actually desirable, but still worth mentioning. The
prior code would silently switch over to reading the new version of the file
within the context of the same S3AInputStream. An EOFException could happen
only if the new version of the file is shorter and a seek goes past the
length of the new version of the file.
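The seek() behavior described above can be sketched with an in-memory model. FakeObjectStore and PinnedStream are hypothetical names invented here, and the model assumes a conditional get that returns nothing when the eTag does not match; it is not the S3AInputStream code from the patch. Forward seeks are simplified to a position change.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for S3; every put assigns a new eTag.
class FakeObjectStore {
    private final Map<String, String> eTags = new HashMap<>();
    private final Map<String, byte[]> contents = new HashMap<>();
    private int writes = 0;

    String put(String key, String content) {
        String eTag = "etag-" + (++writes);
        eTags.put(key, eTag);
        contents.put(key, content.getBytes(StandardCharsets.US_ASCII));
        return eTag;
    }

    // Returns null when the object's current eTag differs from the required one.
    byte[] getIfMatch(String key, String requiredETag) {
        return requiredETag.equals(eTags.get(key)) ? contents.get(key) : null;
    }
}

// Hypothetical stream pinned to the eTag it was created with.
class PinnedStream {
    private final FakeObjectStore store;
    private final String key;
    private final String pinnedETag;
    private byte[] openBody; // null means "not open"; opened lazily on read
    private int pos;

    PinnedStream(FakeObjectStore store, String key, String pinnedETag) {
        this.store = store;
        this.key = key;
        this.pinnedETag = pinnedETag;
    }

    void seek(int target) {
        if (target < pos) {
            openBody = null; // a backwards seek forces a re-open on the next read
        }
        pos = target;
    }

    int read() throws IOException {
        if (openBody == null) {
            openBody = store.getIfMatch(key, pinnedETag);
            if (openBody == null) {
                throw new EOFException("eTag " + pinnedETag + " is no longer current for " + key);
            }
        }
        return pos < openBody.length ? (openBody[pos++] & 0xff) : -1;
    }
}

class PinnedSeekDemo {
    // Returns true if the read after a backwards seek fails with EOFException.
    static boolean backwardSeekFails(boolean overwriteInBetween) {
        FakeObjectStore store = new FakeObjectStore();
        PinnedStream in = new PinnedStream(store, "key", store.put("key", "hello"));
        try {
            in.read();
            in.read();
            if (overwriteInBetween) {
                store.put("key", "HELLO WORLD"); // concurrent writer replaces the object
            }
            in.read(); // still served from the body that is already open
        } catch (IOException e) {
            throw new IllegalStateException("unexpected failure before the seek", e);
        }
        in.seek(0);
        try {
            in.read();
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails after overwrite: " + backwardSeekFails(true));
        System.out.println("fails without overwrite: " + backwardSeekFails(false));
    }
}
```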
Finally, the worst of the issues: I realized that if an overwrite of a file
succeeds on S3 but fails during the S3Guard update (e.g. exception
communicating with Dynamo), from the client's perspective the update was
successful. S3AFileSystem.finishedWrite() simply logs an error for the S3Guard
issue and moves on. However, any subsequent read of the file will fail. The
read will fail because S3Guard still has the old eTag, and every read passes
the S3Guard eTag through to GetObject on S3. GetObject will not return
anything because that eTag no longer matches the object on S3.
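The failure mode can be reproduced with a small in-memory model. StaleETagDemo and its two maps are hypothetical stand-ins for S3 and the S3Guard metadata table, and the log-and-continue branch only imitates the behavior described above; none of this is the actual S3AFileSystem code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: one map plays S3, the other plays the S3Guard table.
class StaleETagDemo {
    private final Map<String, String> storeETags = new HashMap<>();
    private final Map<String, String> storeContents = new HashMap<>();
    private final Map<String, String> metadataETags = new HashMap<>();

    // The object write always succeeds; the metadata update can be made to fail.
    void write(String key, String eTag, String content, boolean metadataUpdateFails) {
        storeETags.put(key, eTag);
        storeContents.put(key, content);
        if (metadataUpdateFails) {
            // Imitates log-and-continue: the writer never learns about the failure.
            System.err.println("metadata update failed for " + key + "; continuing");
            return;
        }
        metadataETags.put(key, eTag);
    }

    // Reads with the eTag recorded in metadata; returns null on a mismatch.
    String read(String key) {
        String expected = metadataETags.get(key);
        if (expected == null || !expected.equals(storeETags.get(key))) {
            return null;
        }
        return storeContents.get(key);
    }

    public static void main(String[] args) {
        StaleETagDemo demo = new StaleETagDemo();
        demo.write("key", "etag-1", "v1", false);
        System.out.println("after clean write: " + demo.read("key"));
        demo.write("key", "etag-2", "v2", true);
        System.out.println("after swallowed failure: " + demo.read("key"));
    }
}
```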
This led me to think I should update the exception handling in
S3AFileSystem.finishedWrite() to allow the IOException on S3Guard update to
propagate rather than be caught and logged. This should at least trigger the
writer to realize something went wrong and take some action. It seems all the
writer can really do to resolve the situation is write the file again.
Assuming the new write goes through, S3Guard will get the correct new eTag and
all will be well again. I have not made this update yet though. Thoughts on
that?
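A minimal sketch of the proposed behavior, assuming the metadata failure surfaces as an IOException and the writer recovers by writing again. PropagatingWriteDemo, its injected failure counter, and writeWithRetry are all hypothetical names for illustration; this is not the real finishedWrite() and no retry policy is implied by the patch.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of a write path whose metadata update failure propagates.
class PropagatingWriteDemo {
    private final Map<String, String> storeETags = new HashMap<>();
    private final Map<String, String> metadataETags = new HashMap<>();
    private int metadataFailuresRemaining; // injected failures, for the demo only
    private int eTagCounter;

    PropagatingWriteDemo(int injectedMetadataFailures) {
        this.metadataFailuresRemaining = injectedMetadataFailures;
    }

    void write(String key) throws IOException {
        String eTag = "etag-" + (++eTagCounter);
        storeETags.put(key, eTag); // the object store write succeeds
        finishedWrite(key, eTag);  // may throw instead of logging and moving on
    }

    private void finishedWrite(String key, String eTag) throws IOException {
        if (metadataFailuresRemaining > 0) {
            metadataFailuresRemaining--;
            throw new IOException("metadata store update failed for " + key);
        }
        metadataETags.put(key, eTag);
    }

    // Writer-side recovery: write the file again until the metadata update succeeds.
    int writeWithRetry(String key, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                write(key);
                return attempt;
            } catch (IOException e) {
                last = e;
            }
        }
        throw last;
    }

    // True when the metadata eTag matches the eTag currently in the store.
    boolean isConsistent(String key) {
        return Objects.equals(storeETags.get(key), metadataETags.get(key));
    }

    public static void main(String[] args) throws IOException {
        PropagatingWriteDemo demo = new PropagatingWriteDemo(1);
        int attempts = demo.writeWithRetry("key", 3);
        System.out.println("attempts: " + attempts + ", consistent: " + demo.isConsistent("key"));
    }
}
```

If every attempt fails, the last exception reaches the caller and the metadata stays out of step with the store, so the writer at least knows the file is not safely readable.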
> S3Guard: use object version to protect against inconsistent read after
> replace/overwrite
> ----------------------------------------------------------------------------------------
>
> Key: HADOOP-16085
> URL: https://issues.apache.org/jira/browse/HADOOP-16085
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2.0
> Reporter: Ben Roling
> Priority: Major
> Attachments: HADOOP-16085_002.patch, HADOOP-16085_3.2.0_001.patch
>
>
> Currently S3Guard doesn't track S3 object versions. If a file is written in
> S3A with S3Guard and then subsequently overwritten, there is no protection
> against the next reader seeing the old version of the file instead of the new
> one.
> It seems like the S3Guard metadata could track the S3 object version. When a
> file is created or updated, the object version could be written to the
> S3Guard metadata. When a file is read, the read out of S3 could be performed
> by object version, ensuring the correct version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my
> impression from looking through the code. My organization is looking to
> shift some datasets stored in HDFS over to S3 and is concerned about this
> potential issue as there are some cases in our codebase that would do an
> overwrite.
> I imagine this idea may have been considered before but I couldn't quite
> track down any JIRAs discussing it. If there is one, feel free to close this
> with a reference to it.
> Am I understanding things correctly? Is this idea feasible? Any feedback
> that could be provided would be appreciated. We may consider crafting a
> patch.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)