[ 
https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269925#comment-15269925
 ] 

Mingliang Liu commented on HADOOP-13065:
----------------------------------------

Hi [~cmccabe], I was addressing the initial use case with the newly designed 
API. Two questions about the v8 patch puzzle me. 1) How should we maintain a 
shared op->count storage statistic across all file system objects and threads? 
I think our use case does not need per-FileSystem stats like S3A. My first 
idea was to register a single instance with the global storage statistics. 
2) How should we implement the single counter for each operation? Since we 
need atomic increments across threads, I'm wondering how the 
{{volatile long}} comes into play. I agree with your previous comment that the 
thread-local implementation is not ideal for this use case, as the RPC call 
will generally dominate the total overhead anyway. If that is true, an 
AtomicLong would work just fine.
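
To make the two questions concrete, here is a rough sketch of what I have in 
mind: one process-wide op->count table with an AtomicLong per operation. All 
names here (GlobalOpCounters, Op, increment, count) are hypothetical and are 
not taken from the v8 patch or from any existing Hadoop class. The reason for 
AtomicLong is that {{count++}} on a {{volatile long}} is a separate read and 
write, so concurrent increments can be lost unless there is a single writer.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: a single op->count table shared by all threads. */
final class GlobalOpCounters {
  /** Illustrative subset of operations; the names are assumptions. */
  enum Op { CREATE, APPEND, DELETE, MKDIRS, RENAME }

  // One instance for the whole process, not one per FileSystem object.
  private static final GlobalOpCounters INSTANCE = new GlobalOpCounters();

  // The map is fully populated in the constructor and never modified
  // structurally afterwards, so concurrent lookups are safe.
  private final Map<Op, AtomicLong> counters = new EnumMap<>(Op.class);

  private GlobalOpCounters() {
    for (Op op : Op.values()) {
      counters.put(op, new AtomicLong());
    }
  }

  static GlobalOpCounters get() {
    return INSTANCE;
  }

  /** Atomic read-modify-write; safe to call from any thread. */
  void increment(Op op) {
    counters.get(op).incrementAndGet();
  }

  /** Returns the current count for the given operation. */
  long count(Op op) {
    return counters.get(op).get();
  }
}
```

A caller would do something like 
GlobalOpCounters.get().increment(GlobalOpCounters.Op.MKDIRS) next to the RPC, 
so the cost of the atomic increment should be small relative to the call 
itself.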

Do you have any quick comments about this? Thanks.

> Add a new interface for retrieving FS and FC Statistics
> -------------------------------------------------------
>
>                 Key: HADOOP-13065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13065
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Ram Venkatesh
>            Assignee: Mingliang Liu
>         Attachments: HADOOP-13065-007.patch, HADOOP-13065.008.patch, 
> HDFS-10175.000.patch, HDFS-10175.001.patch, HDFS-10175.002.patch, 
> HDFS-10175.003.patch, HDFS-10175.004.patch, HDFS-10175.005.patch, 
> HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing; for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.


