[
https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781614#comment-17781614
]
ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------
anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1378439863
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -875,10 +873,15 @@ private boolean checkUserError(int responseStatusCode) {
&& responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR);
}
- private boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {
+ /**
+ * To check if the failure exception returned by server is due to MD5 Mismatch
+ * @param e Exception returned by AbfsRestOperation
+ * @return boolean whether exception is due to MD5Mismatch or not
+ */
+ protected boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {
return ((AbfsRestOperationException) e).getStatusCode()
== HttpURLConnection.HTTP_BAD_REQUEST
- && ((AbfsRestOperationException) e).getErrorMessage().contains(MD5_ERROR);
+ && e.getMessage().contains(MD5_ERROR_SERVER_MESSAGE);
Review Comment:
This is how it works. The server returns the following:
1. Status Code: 400
2. Error Code: Md5Mismatch
3. Error Message: The MD5 value specified in the request did not match with
the MD5 value calculated by the server.
RequestId:605ff975-001f-0050-5d86-0c16aa000000 Time:2023-11-01T05:43:38.0231383Z
4. Status Description: The MD5 value specified in the request did not match
with the MD5 value calculated by the server.
From these we create an AbfsRestOperationException object, which has the
following fields:
1. Status Code: 400
2. errorCode: AzureServiceErrorCode.MD5_MISMATCH (a constant defined in the
latest commit)
3. errorMessage: The MD5 value specified in the request did not match with
the MD5 value calculated by the server.
RequestId:605ff975-001f-0050-5d86-0c16aa000000 Time:2023-11-01T05:43:38.0231383Z
The parent of this AbfsRestOperationException, AzureBlobFileSystemException,
is also created, with the following fields:
1. message: "Operation Failed" + statusDescription + statuscode + method +
url + errorCode + errorMessage.
2. innerException: null
So e.getMessage() resolves to AzureBlobFileSystemException's message, which
contains a lot of other things as well.
e.getErrorMessage() resolves to AbfsRestOperationException's errorMessage,
which does not contain the storage error code.
The correct way is to use e.getErrorCode(), which resolves to
AbfsRestOperationException's errorCode, and that is exactly Md5Mismatch.
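To make the difference between the three accessors concrete, here is a minimal sketch. The classes below (FsException, RestOperationException, ServiceErrorCode) are hypothetical stand-ins for AzureBlobFileSystemException, AbfsRestOperationException and AzureServiceErrorCode, not the real hadoop-azure types, and the exact format of the composed parent message is an assumption.

```java
import java.net.HttpURLConnection;

// Hypothetical stand-ins for the exception hierarchy described above;
// these are NOT the real hadoop-azure classes.
final class Md5MismatchCheckSketch {

    /** Stand-in for AzureServiceErrorCode. */
    public enum ServiceErrorCode { MD5_MISMATCH, UNKNOWN }

    /** Stand-in for AzureBlobFileSystemException: holds only the composed message. */
    public static class FsException extends Exception {
        public FsException(String message) {
            super(message);
        }
    }

    /** Stand-in for AbfsRestOperationException. */
    public static class RestOperationException extends FsException {
        private final int statusCode;
        private final ServiceErrorCode errorCode;
        private final String errorMessage;

        public RestOperationException(int statusCode, ServiceErrorCode errorCode,
                String errorMessage, String statusDescription, String method, String url) {
            // The parent message concatenates everything (format assumed here).
            super("Operation failed: " + statusDescription + ", " + statusCode + ", "
                    + method + ", " + url + ", " + errorCode + ", " + errorMessage);
            this.statusCode = statusCode;
            this.errorCode = errorCode;
            this.errorMessage = errorMessage;
        }

        public int getStatusCode() { return statusCode; }
        public ServiceErrorCode getErrorCode() { return errorCode; }
        public String getErrorMessage() { return errorMessage; }
    }

    /** Decide on status code plus the structured error code, not on message text. */
    public static boolean isMd5ChecksumError(FsException e) {
        if (!(e instanceof RestOperationException)) {
            return false; // avoids a ClassCastException on other exception types
        }
        RestOperationException roe = (RestOperationException) e;
        return roe.getStatusCode() == HttpURLConnection.HTTP_BAD_REQUEST
                && roe.getErrorCode() == ServiceErrorCode.MD5_MISMATCH;
    }

    public static void main(String[] args) {
        String text = "The MD5 value specified in the request did not match with "
                + "the MD5 value calculated by the server.";
        FsException e = new RestOperationException(HttpURLConnection.HTTP_BAD_REQUEST,
                ServiceErrorCode.MD5_MISMATCH, text, text, "PUT", "https://example.invalid/file");
        System.out.println(isMd5ChecksumError(e));
    }
}
```

Comparing the error code keeps the check independent of the wording of the server message and of whatever else gets concatenated into the parent message.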
> ABFS: Adding Support for MD5 Hash based integrity verification of the request
> content during transport
> -------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-18910
> URL: https://issues.apache.org/jira/browse/HADOOP-18910
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Reporter: Anuj Modi
> Assignee: Anuj Modi
> Priority: Major
> Labels: pull-request-available
>
> Azure Storage Supports Content-MD5 Request Headers in Both Read and Append
> APIs.
> Read: [Path - Read - REST API (Azure Storage Services) | Microsoft
> Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read]
> Append: [Path - Update - REST API (Azure Storage Services) | Microsoft
> Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update]
> This change is to make client-side changes to support them. In Read request,
> we will send the appropriate header in response to which server will return
> the MD5 Hash of the data it sends back. On Client we will tally this with the
> MD5 hash computed from the data received.
> In Append request, we will compute the MD5 Hash of the data that we are
> sending to the server and specify that in appropriate header. Server on
> finding that header will tally this with the MD5 hash it will compute on the
> data received.
> This whole checksum validation support is guarded behind a config. The config
> is disabled by default because with the use of "https" the integrity of data
> is preserved anyway. This is introduced as an additional data integrity check,
> which will have a performance impact as well.
> Users can decide if they want to enable this or not by setting the following
> config to *"true"* or *"false"* respectively. *Config:
> "fs.azure.enable.checksum.validation"*
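
For illustration, the hash computation and comparison described in the issue could look roughly like the sketch below. It uses only the JDK; the class and method names (ContentMd5Sketch, computeMd5Base64, matchesServerMd5) are hypothetical and not taken from the PR. It assumes the MD5 value travels as the Base64 encoding of the 16-byte digest, which is the usual Content-MD5 convention.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper, not the PR's implementation.
final class ContentMd5Sketch {

    /** Base64-encoded MD5 digest of data[offset .. offset + length). */
    public static String computeMd5Base64(byte[] data, int offset, int length) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            md5.update(data, offset, length);
            return Base64.getEncoder().encodeToString(md5.digest());
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide MD5.
            throw new IllegalStateException("MD5 algorithm not available", e);
        }
    }

    /** Read path: tally the hash sent by the server with the bytes received. */
    public static boolean matchesServerMd5(String serverMd5Base64,
            byte[] received, int offset, int length) {
        byte[] expected = Base64.getDecoder().decode(serverMd5Base64);
        byte[] actual = Base64.getDecoder().decode(computeMd5Base64(received, offset, length));
        return MessageDigest.isEqual(expected, actual);
    }

    public static void main(String[] args) {
        byte[] payload = "some appended bytes".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        // Append path: this value would be placed in the request header.
        String headerValue = computeMd5Base64(payload, 0, payload.length);
        System.out.println(matchesServerMd5(headerValue, payload, 0, payload.length));
    }
}
```

On the append path the client would send the computed value and the server performs the comparison; on the read path the client runs the comparison itself against the value the server returns.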