steveloughran commented on issue #675: HADOOP-16085: use object version or etags to protect against inconsistent read after replace/overwrite
URL: https://github.com/apache/hadoop/pull/675#issuecomment-488614814

1. How about you create a new PR with the code squashed, so we get this discussion preserved as is?
1. I've realised we need to handle the situation of "overwritten file with old file still found" in the input stream and rename operations.

The `S3GuardExistsRetryPolicy` is going to have to treat errors about an unmet precondition or an inconsistent etag as retriable, alongside FNFE events.

I think we also need to look at tuning how that read invoker is used in the input stream, so that the first open is treated differently from the later ones:

*first*: FNFEs and precondition failures are considered recoverable.
*second and later*: revert to the old rules: non-recoverable (assumption: the file has now been deleted or overwritten).

For rename() we also need to handle the 412s/FNFEs with the same policy as the first open: possibly recoverable.

We might also want to take this opportunity to make the retry timeout settings different, as in something like: up to 60s of retry, even if the usual IO retry count is tightened.

Be fun to test all of this. I could imagine doing something in the huge file tests where we simulate failures by overwriting files during a rename.

Note also: given that in auth mode we know the length of a file, for a 0-byte file we may want to revisit [HADOOP-13293](https://issues.apache.org/jira/browse/HADOOP-13293)'s proposal to serve up a zero-byte file with a special input stream. When we know from the file size that the file is empty, there's no need to go near S3 or worry about version changes.

_if the DDB tables have been told that a file at a path p exists and is of zero bytes length, then an empty stream can be served up without worrying about the state of S3_
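
A minimal sketch of the first-open versus later-open rules proposed above, using plain Java with no Hadoop dependencies. Every name here is hypothetical: `OpenRetrySketch`, `PreconditionFailedException`, `Attempt`, `isRecoverable` and `retry` are illustrations, not the `S3GuardExistsRetryPolicy` or the S3A invoker API. It assumes that a 412 or etag mismatch surfaces as a dedicated `IOException` subclass, and that the retry budget is a wall-clock limit (for example 60s) that is independent of the usual IO retry count.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Illustrative only: none of these types exist in Hadoop under these names. */
final class OpenRetrySketch {

  /** Hypothetical stand-in for a 412 "precondition failed" or etag mismatch. */
  static class PreconditionFailedException extends IOException {
    PreconditionFailedException(String message) {
      super(message);
    }
  }

  /** An operation that may fail; it receives the zero-based attempt number. */
  @FunctionalInterface
  interface Attempt<T> {
    T call(int attempt) throws IOException;
  }

  /**
   * First open (and, by the same policy, rename): FNFEs and precondition
   * failures may be stale metadata, so they are treated as recoverable.
   * Later reopens: assume the file was deleted or overwritten and fail fast.
   */
  static boolean isRecoverable(IOException e, boolean firstOpen) {
    boolean maybeStale = e instanceof FileNotFoundException
        || e instanceof PreconditionFailedException;
    return firstOpen && maybeStale;
  }

  /** Retry recoverable failures until a time budget would be exceeded. */
  static <T> T retry(Attempt<T> op, boolean firstOpen,
      long budgetMillis, long sleepMillis)
      throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + budgetMillis;
    for (int attempt = 0; ; attempt++) {
      try {
        return op.call(attempt);
      } catch (IOException e) {
        boolean outOfTime =
            System.currentTimeMillis() + sleepMillis > deadline;
        if (!isRecoverable(e, firstOpen) || outOfTime) {
          throw e;
        }
        Thread.sleep(sleepMillis);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // First open: two simulated stale FNFEs, then the open succeeds.
    String result = retry(attempt -> {
      if (attempt < 2) {
        throw new FileNotFoundException("old listing still visible");
      }
      return "opened";
    }, true, 60_000L, 10L);
    System.out.println(result);

    // Later reopen: the same class of failure is rethrown immediately.
    try {
      retry(attempt -> {
        throw new PreconditionFailedException("etag changed");
      }, false, 60_000L, 10L);
    } catch (PreconditionFailedException e) {
      System.out.println("not retried: " + e.getMessage());
    }
  }
}
```

Passing `firstOpen` as a flag keeps one retry loop for both phases; a real implementation might instead use two separate policy objects, which is a design choice this sketch does not settle.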
