[ 
https://issues.apache.org/jira/browse/HADOOP-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj Modi updated HADOOP-19645:
-------------------------------
    Description: 
The ABFS driver can trigger a network call to read data in several ways. We 
need a way to identify which type of read call the client made. The plan is to 
add an indicator for this to the existing ClientRequestId header.

The following are the types of read we want to identify:
 # Direct Read: Read from a given position in the remote file. This is a 
synchronous read.
 # Normal Read: Read from the current seek position where read ahead was 
bypassed. This is a synchronous read.
 # Prefetch Read: Read triggered from background threads filling the in-memory 
cache. This is an asynchronous read.
 # Missed Cache Read: Read triggered after nothing was received from read 
ahead. This is a synchronous read.
 # Footer Read: Read triggered as part of the footer read optimization. This 
is a synchronous read.
 # Small File Read: Read triggered as part of the small file read 
optimization. This is a synchronous read.

We will add another field to the Tracing Header (Client Request Id) for each 
request. We can call this field the "Operation Specific Header", similar to the 
existing "Retry Header". For now it will be populated only for read operations 
and left empty for all other operations. Going forward, if we need to publish 
any other operation-specific info, the same header can be used.
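
A minimal sketch of how the proposed field could be modelled is given below. It 
is illustrative only and is not the ABFS driver's implementation: the names 
ReadType, TracingHeaderSketch and withOperationField, the two-letter codes, and 
the ":" separator are all assumptions made for this example. It shows one code 
per read type from the list above, and the field being left empty for 
non-read operations.

```java
import java.util.Objects;

/**
 * Hypothetical read categories taken from the issue description.
 * The short codes are illustrative, not the driver's actual values.
 */
enum ReadType {
    DIRECT("DR", true),
    NORMAL("NR", true),
    PREFETCH("PR", false),      // the only asynchronous read in the list
    MISSED_CACHE("MR", true),
    FOOTER("FR", true),
    SMALL_FILE("SR", true);

    private final String code;
    private final boolean synchronous;

    ReadType(String code, boolean synchronous) {
        this.code = code;
        this.synchronous = synchronous;
    }

    String code() {
        return code;
    }

    boolean isSynchronous() {
        return synchronous;
    }
}

/** Hypothetical helper that appends the operation-specific field. */
final class TracingHeaderSketch {
    // Assumed field separator for this sketch.
    static final String SEPARATOR = ":";

    private TracingHeaderSketch() {
    }

    /**
     * Appends the operation-specific field to an existing header value.
     * Pass null as readType for non-read operations: the field is then
     * present but empty, as the description proposes.
     */
    static String withOperationField(String baseHeader, ReadType readType) {
        Objects.requireNonNull(baseHeader, "baseHeader");
        String field = (readType == null) ? "" : readType.code();
        return baseHeader + SEPARATOR + field;
    }
}
```

With these assumed names, withOperationField("abc:1", ReadType.PREFETCH) gives 
"abc:1:PR", and passing null for a non-read operation gives "abc:1:" with the 
trailing field left empty. Keeping the field present but empty means the number 
of fields in the header stays fixed for anyone parsing it.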

  was:
There are a number of ways in which ABFS driver can trigger a network call to 
read data. We need a way to identify what type of read call was made from 
client. Plan is to add an indication for this in already present 
ClientRequestId header.

Following are types of read we want to identify:
 # Direct Read: Read from a given position in remote file. This will be 
synchronous read
 # Normal Read: Read from current seeked position where read ahead was 
bypassed. This will be synchronous read.
 # Prefetch Read: Read triggered from background threads filling up in memory 
cache. This will be asynchronous read.
 # Optimised Read: Read triggered as part of some special optimizations like 
small file or footer reads. This will be synchronous.

 

We will add another field in the Tracing Header (Client Request Id) for each 
request. We can call this field "Operation Specific Header" very similar to how 
we have "Retry Header" today. As part of this we will only use it for read 
operations keeping it empty for other operations. Moving ahead, if we need to 
publish any operation specific info, same header can be used.


> ABFS: [ReadAheadV2] Improve Metrics for Read Calls to identify type of read 
> done.
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-19645
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19645
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.3.6, 3.4.1
>            Reporter: Anuj Modi
>            Assignee: Anuj Modi
>            Priority: Major


