[
https://issues.apache.org/jira/browse/HADOOP-19604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004214#comment-18004214
]
ASF GitHub Bot commented on HADOOP-19604:
-----------------------------------------
bhattmanish98 commented on code in PR #7777:
URL: https://github.com/apache/hadoop/pull/7777#discussion_r2195482892
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobBlock.java:
##########
@@ -42,23 +48,40 @@ public class AbfsBlobBlock extends AbfsBlock {
* @param offset Used to generate blockId based on offset.
* @throws IOException exception is thrown.
*/
- AbfsBlobBlock(AbfsOutputStream outputStream, long offset) throws IOException
{
+ AbfsBlobBlock(AbfsOutputStream outputStream, long offset, int blockIdLength,
long blockIndex) throws IOException {
super(outputStream, offset);
- this.blockId = generateBlockId(offset);
+ this.blockIndex = blockIndex;
+ String streamId = outputStream.getStreamID();
+ UUID streamIdGuid =
UUID.nameUUIDFromBytes(streamId.getBytes(StandardCharsets.UTF_8));
Review Comment:
Can streamId be null? streamId.getBytes can raise null pointer exception.
Better to handle it,
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -982,6 +983,9 @@ public AbfsRestOperation appendBlock(final String path,
if (requestParameters.getLeaseId() != null) {
requestHeaders.add(new AbfsHttpHeader(X_MS_LEASE_ID,
requestParameters.getLeaseId()));
}
+ if (isChecksumValidationEnabled()) {
+ addCheckSumHeaderForWrite(requestHeaders, requestParameters);
Review Comment:
NIT: formatting required - there must be one tab in the start
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -1032,7 +1036,7 @@ public AbfsRestOperation flush(final String path,
final String cachedSasToken,
final String leaseId,
final ContextEncryptionAdapter contextEncryptionAdapter,
- final TracingContext tracingContext) throws AzureBlobFileSystemException
{
+ final TracingContext tracingContext, String blobMd5) throws
AzureBlobFileSystemException {
Review Comment:
Add new argument in the comments @param. Please make this change wherever
required.
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -878,27 +878,29 @@ public boolean appendSuccessCheckOp(AbfsRestOperation op,
final String path,
* @param leaseId if there is an active lease on the path.
* @param contextEncryptionAdapter to provide encryption context.
* @param tracingContext for tracing the server calls.
+ * @param blobMd5 The Base64-encoded MD5 hash of the blob for data
integrity validation.
* @return executed rest operation containing response from server.
* @throws AzureBlobFileSystemException if rest operation fails.
*/
public abstract AbfsRestOperation flush(String path, long position,
boolean retainUncommittedData, boolean isClose,
String cachedSasToken, String leaseId,
- ContextEncryptionAdapter contextEncryptionAdapter, TracingContext
tracingContext)
+ ContextEncryptionAdapter contextEncryptionAdapter, TracingContext
tracingContext, String blobMd5)
throws AzureBlobFileSystemException;
/**
- * Flush previously uploaded data to a file.
- * @param buffer containing blockIds to be flushed.
- * @param path on which data has to be flushed.
- * @param isClose specify if this is the last flush to the file.
- * @param cachedSasToken to be used for the authenticating operation.
- * @param leaseId if there is an active lease on the path.
- * @param eTag to specify conditional headers.
- * @param contextEncryptionAdapter to provide encryption context.
- * @param tracingContext for tracing the server calls.
- * @return executed rest operation containing response from server.
- * @throws AzureBlobFileSystemException if rest operation fails.
+ * Flushes previously uploaded data to the specified path.
Review Comment:
NIT - Format can be consistent across places.
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/services/AppendRequestParameters.java:
##########
@@ -77,6 +81,7 @@ public AppendRequestParameters(final long position,
* @param leaseId leaseId of the blob to be appended
* @param isExpectHeaderEnabled true if the expect header is enabled
* @param blobParams parameters specific to append operation on Blob
Endpoint.
+ * @param md5 The Base64-encoded MD5 hash of the block for data integrity
validation.
Review Comment:
NIT: Format can be connected - extra space before @param and after md5
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobBlock.java:
##########
@@ -42,23 +42,40 @@ public class AbfsBlobBlock extends AbfsBlock {
* @param offset Used to generate blockId based on offset.
* @throws IOException exception is thrown.
*/
- AbfsBlobBlock(AbfsOutputStream outputStream, long offset) throws IOException
{
+ AbfsBlobBlock(AbfsOutputStream outputStream, long offset, int blockIdLength,
long blockIndex) throws IOException {
super(outputStream, offset);
- this.blockId = generateBlockId(offset);
+ this.blockIndex = blockIndex;
+ String streamId = getOutputStream().getStreamID();
+ UUID streamIdGuid =
UUID.nameUUIDFromBytes(streamId.getBytes(StandardCharsets.UTF_8));
+ this.blockId = generateBlockId(streamIdGuid, blockIdLength);
}
/**
- * Helper method that generates blockId.
- * @param position The offset needed to generate blockId.
- * @return String representing the block ID generated.
+ * Generates a Base64-encoded block ID string based on the given position,
stream ID, and desired raw length.
Review Comment:
In the comment, you have mentioned that block id is generated based on
position, stream Id and raw length. Where exactly are we using position in this
method? Am I missing something here?
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -1060,7 +1064,7 @@ public AbfsRestOperation flush(byte[] buffer,
final String leaseId,
final String eTag,
ContextEncryptionAdapter contextEncryptionAdapter,
- final TracingContext tracingContext) throws AzureBlobFileSystemException
{
+ final TracingContext tracingContext, String blobMd5) throws
AzureBlobFileSystemException {
Review Comment:
Same as above.
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobBlock.java:
##########
@@ -42,23 +48,40 @@ public class AbfsBlobBlock extends AbfsBlock {
* @param offset Used to generate blockId based on offset.
Review Comment:
Add newly added parameter in method comment.
> ABFS: Fix WASB ABFS compatibility issues
> ----------------------------------------
>
> Key: HADOOP-19604
> URL: https://issues.apache.org/jira/browse/HADOOP-19604
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: 3.4.1
> Reporter: Anmol Asrani
> Assignee: Anmol Asrani
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.1
>
>
> Fix WASB ABFS compatibility issues. Fix issues such as:-
> # BlockId computation to be consistent across clients for PutBlock and
> PutBlockList
> # Restrict url encoding of certain json metadata during setXAttr calls.
> # Maintain the md5 hash of whole block to validate data integrity during
> flush.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]