steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes 
inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/654#issuecomment-487529747
 
 
   @harshavardhana I'm not making any promises about performance. On large 
files the AWS SDK transfer manager already breaks things up into 128MB units, 
so what we gain here is copying files in parallel on top of that, which 
probably helps most with smaller files.
   
   I'm looking at resilience to failures and S3Guard consistency; that's the 
key driver and it takes priority over the speedups.
   
   Note that time to rename on AWS will always be O(data), even if we do more 
files in parallel, and it won't be atomic. For high-performance commit 
algorithms you need something like the S3A committers, or S3-first data 
structures like Apache Iceberg.

