[jira] [Commented] (HADOOP-19645) ABFS: [ReadAheadV2] Improve Metrics for Read Calls to identify type of read done.

ASF GitHub Bot (Jira) Sun, 03 Aug 2025 21:11:31 -0700


    [ 
https://issues.apache.org/jira/browse/HADOOP-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18011765#comment-18011765
 ]


ASF GitHub Bot commented on HADOOP-19645:
-----------------------------------------

anujmodi2021 commented on code in PR #7837:
URL: https://github.com/apache/hadoop/pull/7837#discussion_r2250353075


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/utils/TracingContext.java:
##########
@@ -193,32 +213,33 @@ public void setListener(Listener listener) {
   public void constructHeader(AbfsHttpOperation httpOperation, String 
previousFailure, String retryPolicyAbbreviation) {
     clientRequestId = UUID.randomUUID().toString();
     switch (format) {
-    case ALL_ID_FORMAT: // Optional IDs (e.g. streamId) may be empty
-      header =
-          clientCorrelationID + ":" + clientRequestId + ":" + fileSystemID + 
":"
-              + getPrimaryRequestIdForHeader(retryCount > 0) + ":" + streamID
-              + ":" + opType + ":" + retryCount;
-      header = addFailureReasons(header, previousFailure, 
retryPolicyAbbreviation);
-      if (!(ingressHandler.equals(EMPTY_STRING))) {
-        header += ":" + ingressHandler;
-      }
-      if (!(position.equals(EMPTY_STRING))) {
-        header += ":" + position;
-      }
-      if (operatedBlobCount != null) {
-        header += (":" + operatedBlobCount);
-      }
-      header += (":" + httpOperation.getTracingContextSuffix());
-      metricHeader += !(metricResults.trim().isEmpty()) ? metricResults  : "";
+    case ALL_ID_FORMAT:
+      header = TracingHeaderVersion.V1.getVersion() + COLON
+          + clientCorrelationID + COLON
+          + clientRequestId + COLON
+          + fileSystemID + COLON
+          + getPrimaryRequestIdForHeader(retryCount > 0) + COLON
+          + streamID + COLON
+          + opType + COLON
+          + getRetryHeader(previousFailure, retryPolicyAbbreviation) + COLON
+          + ingressHandler + COLON

Review Comment:
   With empty checks we cannot have a fixed schema. We need the proper defined 
schema where each position after split is fixed for all the headers and 
analysis can be done easily without worrying about the position of info we need 
to analyse.



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/utils/TracingContext.java:
##########
@@ -193,32 +213,33 @@ public void setListener(Listener listener) {
   public void constructHeader(AbfsHttpOperation httpOperation, String 
previousFailure, String retryPolicyAbbreviation) {
     clientRequestId = UUID.randomUUID().toString();
     switch (format) {
-    case ALL_ID_FORMAT: // Optional IDs (e.g. streamId) may be empty
-      header =
-          clientCorrelationID + ":" + clientRequestId + ":" + fileSystemID + 
":"
-              + getPrimaryRequestIdForHeader(retryCount > 0) + ":" + streamID
-              + ":" + opType + ":" + retryCount;
-      header = addFailureReasons(header, previousFailure, 
retryPolicyAbbreviation);
-      if (!(ingressHandler.equals(EMPTY_STRING))) {
-        header += ":" + ingressHandler;
-      }
-      if (!(position.equals(EMPTY_STRING))) {
-        header += ":" + position;
-      }
-      if (operatedBlobCount != null) {
-        header += (":" + operatedBlobCount);
-      }
-      header += (":" + httpOperation.getTracingContextSuffix());
-      metricHeader += !(metricResults.trim().isEmpty()) ? metricResults  : "";
+    case ALL_ID_FORMAT:
+      header = TracingHeaderVersion.V1.getVersion() + COLON
+          + clientCorrelationID + COLON
+          + clientRequestId + COLON
+          + fileSystemID + COLON
+          + getPrimaryRequestIdForHeader(retryCount > 0) + COLON
+          + streamID + COLON
+          + opType + COLON
+          + getRetryHeader(previousFailure, retryPolicyAbbreviation) + COLON
+          + ingressHandler + COLON
+          + position + COLON
+          + operatedBlobCount + COLON
+          + httpOperation.getTracingContextSuffix() + COLON
+          + getOperationSpecificHeader(opType);
+
+      metricHeader += !(metricResults.trim().isEmpty()) ? metricResults  : 
EMPTY_STRING;
       break;
     case TWO_ID_FORMAT:
-      header = clientCorrelationID + ":" + clientRequestId;
-      metricHeader += !(metricResults.trim().isEmpty()) ? metricResults  : "";
+      header = TracingHeaderVersion.V1.getVersion() + COLON

Review Comment:
   Fixed





> ABFS: [ReadAheadV2] Improve Metrics for Read Calls to identify type of read 
> done.
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-19645
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19645
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.3.6, 3.4.1
>            Reporter: Anuj Modi
>            Assignee: Anuj Modi
>            Priority: Major
>              Labels: pull-request-available
>
> There are a number of ways in which ABFS driver can trigger a network call to 
> read data. We need a way to identify what type of read call was made from 
> client. Plan is to add an indication for this in already present 
> ClientRequestId header.
> Following are types of read we want to identify:
>  # Direct Read: Read from a given position in remote file. This will be 
> synchronous read
>  # Normal Read: Read from current seeked position where read ahead was 
> bypassed. This will be synchronous read.
>  # Prefetch Read: Read triggered from background threads filling up in memory 
> cache. This will be asynchronous read.
>  # Missed Cache Read: Read triggered after nothing was received from read 
> ahead. This will be synchronous read.
>  # Footer Read: Read triggered as part of footer read optimization. This will 
> be synchronous.
>  # Small File Read: Read triggered as a part of small file read. This will be 
> synchronous read.
> We will add another field in the Tracing Header (Client Request Id) for each 
> request. We can call this field "Operation Specific Header" very similar to 
> how we have "Retry Header" today. As part of this we will only use it for 
> read operations keeping it empty for other operations. Moving ahead f we need 
> to publish any operation specific info, same header can be used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HADOOP-19645) ABFS: [ReadAheadV2] Improve Metrics for Read Calls to identify type of read done.

Reply via email to