[
https://issues.apache.org/jira/browse/HADOOP-19604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012587#comment-18012587
]
ASF GitHub Bot commented on HADOOP-19604:
-----------------------------------------
Copilot commented on code in PR #7853:
URL: https://github.com/apache/hadoop/pull/7853#discussion_r2260273456
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -1103,8 +1107,21 @@ public AbfsRestOperation flush(byte[] buffer,
AbfsRestOperation op1 = getPathStatus(path, true, tracingContext,
contextEncryptionAdapter);
String metadataMd5 = op1.getResult().getResponseHeader(CONTENT_MD5);
- if (blobMd5 != null && !blobMd5.equals(metadataMd5)) {
- throw ex;
+ /*
+ * Validate the response by comparing the server's MD5 metadata
against either:
+ * 1. The full blob content MD5 (if full blob checksum validation is
enabled), or
+ * 2. The full block ID buffer MD5 (fallback if blob checksum
validation is disabled)
+ */
+ if (getAbfsConfiguration().isFullBlobChecksumValidationEnabled() &&
blobMd5 != null) {
Review Comment:
The configuration check
`getAbfsConfiguration().isFullBlobChecksumValidationEnabled()` is performed
inside the response validation logic, but this same check was already done
earlier in the method. Consider storing the result in a variable to avoid
redundant method calls.
```suggestion
if (fullBlobChecksumValidationEnabled && blobMd5 != null) {
```
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -1073,11 +1073,15 @@ public AbfsRestOperation flush(byte[] buffer,
requestHeaders.add(new AbfsHttpHeader(CONTENT_LENGTH,
String.valueOf(buffer.length)));
requestHeaders.add(new AbfsHttpHeader(CONTENT_TYPE, APPLICATION_XML));
requestHeaders.add(new AbfsHttpHeader(IF_MATCH, eTag));
+ String md5Hash = null;
if (leaseId != null) {
requestHeaders.add(new AbfsHttpHeader(X_MS_LEASE_ID, leaseId));
}
- if (blobMd5 != null) {
+ if (isFullBlobChecksumValidationEnabled() && blobMd5 != null) {
requestHeaders.add(new AbfsHttpHeader(X_MS_BLOB_CONTENT_MD5, blobMd5));
+ } else {
+ md5Hash = computeMD5Hash(buffer, 0, buffer.length);
+ requestHeaders.add(new AbfsHttpHeader(X_MS_BLOB_CONTENT_MD5, md5Hash));
Review Comment:
The variable `md5Hash` is declared but only used in the else branch.
Consider declaring it within the else block to improve code clarity and reduce
the scope of the variable.
##########
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsOutputStream.java:
##########
@@ -481,6 +481,8 @@ public void testResetCalledOnExceptionInRemoteFlush()
throws Exception {
//expected exception
}
// Verify that reset was called on the message digest
- Mockito.verify(mockMessageDigest, Mockito.times(1)).reset();
+ if (spiedClient.isChecksumValidationEnabled()) {
Review Comment:
The test is checking `isChecksumValidationEnabled()` but the code change
suggests this should be checking `isFullBlobChecksumValidationEnabled()` since
the MD5 reset behavior is now conditional on full blob checksum validation, not
general checksum validation.
```suggestion
if (spiedClient.isFullBlobChecksumValidationEnabled()) {
```
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java:
##########
@@ -1705,6 +1709,10 @@ public void setIsChecksumValidationEnabled(boolean
isChecksumValidationEnabled)
this.isChecksumValidationEnabled = isChecksumValidationEnabled;
}
+ public boolean isFullBlobChecksumValidationEnabled() {
Review Comment:
The getter method `isFullBlobChecksumValidationEnabled()` lacks a
corresponding setter method, which breaks the pattern established by other
configuration properties like `isChecksumValidationEnabled`. This could cause
issues for programmatic configuration changes.
> ABFS: Fix WASB ABFS compatibility issues
> ----------------------------------------
>
> Key: HADOOP-19604
> URL: https://issues.apache.org/jira/browse/HADOOP-19604
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: 3.4.1
> Reporter: Anmol Asrani
> Assignee: Anmol Asrani
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.1
>
>
> Fix WASB ABFS compatibility issues. Fix issues such as:-
> # BlockId computation to be consistent across clients for PutBlock and
> PutBlockList
> # Restrict url encoding of certain json metadata during setXAttr calls.
> # Maintain the md5 hash of whole block to validate data integrity during
> flush.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]