steveloughran commented on issue #675: HADOOP-16085: use object version or 
etags to protect against inconsistent read after replace/overwrite
URL: https://github.com/apache/hadoop/pull/675#issuecomment-488614814
 
 
   1. How about you create a new PR with the code squashed, so we get this 
discussion preserved as is?
   1. I've realised we need to handle the situation of "overwritten file with 
old file still found" in the input stream and rename operations.
   
   The `S3GuardExistsRetryPolicy` is going to have to treat precondition-not-met 
and inconsistent-etag errors as retriable, alongside FNFE events.
   
   I think we also need to look at tuning how that read invoker is used in the 
input stream, so that the first open is treated differently from the later ones:
   
   *first*: FNFEs and precondition failures are considered recoverable
   *second and later*: revert to the old rules: non-recoverable (assumption: file 
has now been deleted or overwritten)
   
   For rename() we also need to handle the 412s/FNFEs with the same policy as 
the first open: possibly recoverable.
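A rough sketch of how that first-open / later-open / rename split might be expressed. Everything here is hypothetical and for illustration only: `ReadPhase`, `PreconditionFailedException` and `ChangeAwareRetryClassifier` are made-up names, not the real `S3GuardExistsRetryPolicy` or anything in the S3A codebase, and the 412 is modelled as a plain `IOException` subclass.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical stand-in for a 412 "precondition failed" or etag-mismatch error. */
class PreconditionFailedException extends IOException {
    PreconditionFailedException(String message) {
        super(message);
    }
}

/** Hypothetical: which operation is asking whether a failure may be retried. */
enum ReadPhase {
    FIRST_OPEN, // initial open of the input stream
    REOPEN,     // second and later opens of the same stream
    RENAME      // rename(), which follows the first-open rules
}

/** Hypothetical classifier; an assumption, not the real retry policy. */
final class ChangeAwareRetryClassifier {
    private ChangeAwareRetryClassifier() {
    }

    /** Returns true if the failure could be store inconsistency worth retrying. */
    static boolean isRecoverable(ReadPhase phase, IOException failure) {
        boolean maybeInconsistency =
            failure instanceof FileNotFoundException
                || failure instanceof PreconditionFailedException;
        if (!maybeInconsistency) {
            // Anything else is left to the usual IO retry rules.
            return false;
        }
        switch (phase) {
            case FIRST_OPEN:
            case RENAME:
                // Possibly a stale view of an overwritten file: recoverable.
                return true;
            default:
                // Old rules: assume the file has been deleted or overwritten.
                return false;
        }
    }
}
```

With this, `isRecoverable(ReadPhase.FIRST_OPEN, new FileNotFoundException("p"))` is true, while the same exception in `ReadPhase.REOPEN` is treated as non-recoverable.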
   
   We might also want to take this opportunity to give these retries their own 
timeout settings, something like: up to 60s of retry, even if the usual IO 
retry count is tightened. 
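One way the "up to 60s of retry" idea could look: bound the retries by elapsed time rather than by attempt count. This is a minimal sketch under assumptions; `TimeBoundedRetry`, `IOCallable` and the injected clock are hypothetical names, and it does not use Hadoop's own retry classes.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

/** Hypothetical helper: retries until a time budget is used up. */
final class TimeBoundedRetry {
    /** An operation that may fail with an IOException. */
    interface IOCallable<T> {
        T call() throws IOException;
    }

    private final long budgetNanos;
    private final long intervalMillis;
    private final LongSupplier nanoClock;
    private final Predicate<IOException> recoverable;

    TimeBoundedRetry(Duration budget, Duration interval,
            LongSupplier nanoClock, Predicate<IOException> recoverable) {
        this.budgetNanos = budget.toNanos();
        this.intervalMillis = interval.toMillis();
        this.nanoClock = nanoClock;     // injectable so tests can fake time
        this.recoverable = recoverable; // which failures are worth retrying
    }

    /** Retries recoverable failures until the budget expires, then rethrows. */
    <T> T retry(IOCallable<T> operation)
            throws IOException, InterruptedException {
        final long deadline = nanoClock.getAsLong() + budgetNanos;
        while (true) {
            try {
                return operation.call();
            } catch (IOException e) {
                boolean expired = nanoClock.getAsLong() - deadline >= 0;
                if (expired || !recoverable.test(e)) {
                    throw e;
                }
                Thread.sleep(intervalMillis);
            }
        }
    }
}
```

A caller could build it as `new TimeBoundedRetry(Duration.ofSeconds(60), Duration.ofMillis(500), System::nanoTime, e -> e instanceof java.io.FileNotFoundException)`; the 500ms interval is an arbitrary example value.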
   
   Be fun to test all of this. I could imagine doing something in the huge file 
tests where we simulate failures by overwriting files during a rename. 
   
   Note also: given that in auth mode we know the length of a file, for a 0-byte 
file we may want to revisit 
[HADOOP-13293](https://issues.apache.org/jira/browse/HADOOP-13293)'s proposal 
to serve up a zero-byte file with a special input stream. When we know from the 
file size that the file is empty, there's no need to go near S3 or worry 
about version changes. 
   
   _if the DDB tables have been told that a file at a path p exists and is of 
zero bytes length, then an empty stream can be served up without worrying about 
the state of S3_
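The zero-byte shortcut could be sketched roughly as below. `ZeroByteShortCircuit` and `RemoteOpener` are hypothetical names, and a plain `java.io.InputStream` stands in for the seekable stream a real filesystem client would have to return; this only shows the decision, not an S3A implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: skip the store when metadata says the file is empty. */
final class ZeroByteShortCircuit {
    /** Hypothetical hook that opens the real object in S3. */
    interface RemoteOpener {
        InputStream open(String path) throws IOException;
    }

    private ZeroByteShortCircuit() {
    }

    /**
     * @param knownLength   file length recorded in the metadata store
     * @param authoritative true if the metadata store is trusted (auth mode)
     */
    static InputStream open(String path, long knownLength,
            boolean authoritative, RemoteOpener remote) throws IOException {
        if (authoritative && knownLength == 0) {
            // Nothing to read, so version/etag changes in S3 cannot matter.
            return new ByteArrayInputStream(new byte[0]);
        }
        // Otherwise fall through to the normal (version-checked) open.
        return remote.open(path);
    }
}
```

The empty stream returns -1 on the first `read()`, and the `RemoteOpener` is never invoked on that path.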
