steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/654#issuecomment-487529747

@harshavardhana I'm not making any promises about performance. On large files the AWS SDK transfer manager already breaks things up into 128MB units, so what we gain here is copying files in parallel on top of that, which probably helps most with smaller files. I'm looking at resilience to failures and S3Guard consistency: that's the key driver, and it takes priority over the speedups.

Note that the time to rename on AWS will always be O(data), even if we copy more files in parallel, and it won't be atomic. For high-performance commit algorithms you need something like the S3A committers, or S3-first data structures like Apache Iceberg.
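
To make the cost argument concrete, here is a minimal Python sketch of a rename emulated as copy-then-delete. It is not the S3A implementation: the dict `store` stands in for a bucket, and `rename_prefix` and its parameters are hypothetical names chosen for illustration. It shows that running copies in parallel changes how many run at once, but the bytes copied still scale with the data, and the operation is a sequence of steps rather than one atomic action.

```python
from concurrent.futures import ThreadPoolExecutor


def rename_prefix(store, src, dst, max_workers=4):
    """Emulate a directory rename on an object store.

    `store` is a plain dict mapping key -> bytes (a stand-in for a bucket,
    not a real S3 API). Returns the total number of bytes copied.
    """
    # Snapshot the source keys before any copy starts.
    keys = sorted(k for k in store if k.startswith(src))

    def copy_one(key):
        new_key = dst + key[len(src):]
        # On a real store this copy costs time proportional to object size.
        store[new_key] = store[key]
        return key, len(store[key])

    copied = []
    bytes_copied = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Copies run in parallel; an exception here propagates before
        # any source is deleted, leaving both prefixes partly populated.
        for key, size in pool.map(copy_one, keys):
            copied.append(key)
            bytes_copied += size

    # Delete sources only after every copy has succeeded. Observers can
    # see both old and new keys in between, so the rename is not atomic.
    for key in copied:
        del store[key]
    return bytes_copied
```

For example, renaming prefix `a/` to `c/` in a store holding two objects of 2 and 3 bytes under `a/` copies 5 bytes in total, however many workers are used.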
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
