[
https://issues.apache.org/jira/browse/HADOOP-18012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704552#comment-17704552
]
ASF GitHub Bot commented on HADOOP-18012:
-----------------------------------------
saxenapranav commented on code in PR #5488:
URL: https://github.com/apache/hadoop/pull/5488#discussion_r1147353317
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##########
@@ -55,6 +55,10 @@
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants;
+import org.apache.hadoop.fs.azurebfs.constants.FileSystemUriSchemes;
+import org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations;
+import org.apache.hadoop.fs.azurebfs.constants.HttpHeaderConfigurations;
Review Comment:
nit: let imports be at same place.
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -627,28 +683,46 @@ public boolean renameIdempotencyCheckOp(
TracingContext tracingContext) {
Preconditions.checkArgument(op.hasResult(), "Operations has null HTTP
response");
- if ((op.isARetriedRequest())
- && (op.getResult().getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND)
- && isNotEmpty(sourceEtag)) {
-
- // Server has returned HTTP 404, which means rename source no longer
- // exists. Check on destination status and if its etag matches
- // that of the source, consider it to be a success.
- LOG.debug("rename {} to {} failed, checking etag of destination",
- source, destination);
+ // removing isDir from debug logs as it can be misleading
+ LOG.debug("rename({}, {}) failure {}; retry={} etag {}",
+ source, destination, op.getResult().getStatusCode(),
op.isARetriedRequest(), sourceEtag);
+ if (!(op.isARetriedRequest()
+ && (op.getResult().getStatusCode() ==
HttpURLConnection.HTTP_NOT_FOUND))) {
+ // only attempt recovery if the failure was a 404 on a retried rename
request.
+ return false;
+ }
+ if (isNotEmpty(sourceEtag)) {
+ // Server has returned HTTP 404, we have an etag, so see
+ // if the rename has actually taken place,
+ LOG.info("rename {} to {} failed, checking etag of destination",
+ source, destination);
try {
- final AbfsRestOperation destStatusOp = getPathStatus(destination,
- false, tracingContext);
+ final AbfsRestOperation destStatusOp = getPathStatus(destination,
false, tracingContext);
final AbfsHttpOperation result = destStatusOp.getResult();
- return result.getStatusCode() == HttpURLConnection.HTTP_OK
- && sourceEtag.equals(extractEtagHeader(result));
- } catch (AzureBlobFileSystemException ignored) {
+ final boolean recovered = result.getStatusCode() ==
HttpURLConnection.HTTP_OK
+ && sourceEtag.equals(extractEtagHeader(result));
+ LOG.info("File rename has taken place: recovery {}",
+ recovered ? "succeeded" : "failed");
+ return recovered;
+
+ } catch (AzureBlobFileSystemException ex) {
// GetFileStatus on the destination failed, the rename did not take
place
+ // or some other failure. log and swallow.
+ LOG.debug("Failed to get status of path {}", destination, ex);
}
+ } else {
+ LOG.debug("No source etag; unable to probe for the operation's success");
}
- return false;
+ return false;
+ }
+
+
Review Comment:
extra lines :).
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -578,26 +611,49 @@ public AbfsClientRenameResult renamePath(
// Extract the sourceEtag, using the status Op, and set it
// for future rename recovery.
AbfsHttpOperation sourceStatusResult = sourceStatusOp.getResult();
- String sourceEtagAfterFailure =
extractEtagHeader(sourceStatusResult);
- renamePath(source, destination, continuation, tracingContext,
- sourceEtagAfterFailure, isMetadataIncompleteState);
- }
- // if we get out of the condition without a successful rename, then
- // it isn't metadata incomplete state issue.
- isMetadataIncompleteState = false;
-
- boolean etagCheckSucceeded = renameIdempotencyCheckOp(
- source,
- sourceEtag, op, destination, tracingContext);
- if (!etagCheckSucceeded) {
- // idempotency did not return different result
- // throw back the exception
- throw e;
+ sourceEtagAfterFailure = extractEtagHeader(sourceStatusResult);
}
+ renamePath(source, destination, continuation, tracingContext,
+ sourceEtagAfterFailure, isMetadataIncompleteState);
+ }
+ // if we get out of the condition without a successful rename, then
+ // it isn't metadata incomplete state issue.
+ isMetadataIncompleteState = false;
+
+ // setting default rename recovery success to false
+ boolean etagCheckSucceeded = false;
+ if (shouldAttemptRecovery) {
Review Comment:
this depends on `renameResilience` well. This is a new field. For the case
of ` if (op.getResult().getStorageErrorCode()
.equals(RENAME_DESTINATION_PARENT_PATH_NOT_FOUND.getErrorCode())
&& !isMetadataIncompleteState`, i believe we still want to
call `renameIdempotencyCheckOp`. lets make it true when it get into the
mentioned condition.
> ABFS: Enable config controlled ETag check for Rename idempotency
> ----------------------------------------------------------------
>
> Key: HADOOP-18012
> URL: https://issues.apache.org/jira/browse/HADOOP-18012
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.3.2
> Reporter: Sneha Vijayarajan
> Assignee: Sree Bhattacharyya
> Priority: Major
> Labels: pull-request-available
>
> ABFS driver has a handling for rename idempotency which relies on LMT of the
> destination file to conclude if the rename was successful or not when source
> file is absent and if the rename request had entered retry loop.
> This handling is incorrect as LMT of the destination does not change on
> rename.
> This Jira will track the change to undo the current implementation and add a
> new one where for an incoming rename operation, source file eTag is fetched
> first and then rename is done only if eTag matches for the source file.
> As this is going to be a costly operation given an extra HEAD request is
> added to each rename, this implementation will be guarded over a config and
> can enabled by customers who have workloads that do multiple renames.
> Long term plan to handle rename idempotency without HEAD request is being
> discussed.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]