[ 
https://issues.apache.org/jira/browse/HADOOP-16830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130954#comment-17130954
 ] 

Steve Loughran commented on HADOOP-16830:
-----------------------------------------

[~lucacanali] I've done an iteration on a variant designed to support and 
aggregate different types of statistic.

Having written the new extensible design, I've decided I don't like it. It is 
too complex, as I'm trying to support arbitrary-arity tuples of any kind of 
statistic. That makes iterating over and parsing this stuff way too complex.

Here's a better idea: we only support a limited set:

* counter: long
* min: long
* max: long
* mean: (double, long)
* gauge: long

# All but gauge have simple aggregation; for gauge I'll add the values up too, 
on the assumption that they will be positive values (e.g. 'number of active 
reads').
# Every set will have its own iterator.
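
A minimal sketch of how the aggregation rules above could look in Java. Every name here (StatAggregation, Mean, the aggregate* methods) is hypothetical and is not taken from the patch. It also assumes that "mean (double, long)" means a mean value paired with its sample count, so that two means can be combined as a weighted average.

```java
/**
 * Hypothetical sketch of the aggregation rules for the limited statistic set.
 * None of these names come from the Hadoop codebase.
 */
final class StatAggregation {

    /** Assumed reading of "mean (double, long)": mean value plus sample count. */
    static final class Mean {
        final double mean;
        final long samples;

        Mean(double mean, long samples) {
            this.mean = mean;
            this.samples = samples;
        }
    }

    private StatAggregation() {
    }

    /** Counters aggregate by addition. */
    static long aggregateCounter(long a, long b) {
        return a + b;
    }

    /** Minimums aggregate by taking the smaller value. */
    static long aggregateMin(long a, long b) {
        return Math.min(a, b);
    }

    /** Maximums aggregate by taking the larger value. */
    static long aggregateMax(long a, long b) {
        return Math.max(a, b);
    }

    /** Gauges are summed too, assuming positive values such as active reads. */
    static long aggregateGauge(long a, long b) {
        return a + b;
    }

    /** Means aggregate as a sample-weighted average; zero samples gives an empty mean. */
    static Mean aggregateMean(Mean a, Mean b) {
        long n = a.samples + b.samples;
        if (n == 0) {
            return new Mean(0.0, 0);
        }
        double combined = (a.mean * a.samples + b.mean * b.samples) / n;
        return new Mean(combined, n);
    }
}
```

For example, combining a mean of 2.0 over 2 samples with a mean of 5.0 over 1 sample gives a mean of 3.0 over 3 samples. Keeping the sample count alongside the mean is what makes this aggregation possible at all.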

What do people think? I can do an iteration fairly quickly.

> Add public IOStatistics API; S3A to support
> -------------------------------------------
>
>                 Key: HADOOP-16830
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16830
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs, fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> Applications like to collect statistics on specific operations, by 
> collecting exactly those operations done during the execution of FS API 
> calls by their individual worker threads, and returning these to their job 
> driver.
> * S3A has a statistics API for some streams, but it's a non-standard one; 
> Impala &c can't use it
> * FileSystem storage statistics are public, but as they aren't cross-thread, 
> they don't aggregate properly
> Proposed
> # A new IOStatistics interface to serve up statistics
> # S3A to implement
> # other stores to follow
> # Pass-through from the usual wrapper classes (FS data input/output streams)
> It's hard to think about how best to offer an API for operation context 
> stats, and how to actually implement it.
> ThreadLocal isn't enough because the helper threads need to update the 
> thread-local value of the instigator.
> My initial PoC doesn't address that issue, but it shows what I'm thinking of.
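
One way to picture the ThreadLocal problem described in the issue, as a sketch only: the instigating thread captures its context object and hands it to the helper task, which installs it for the duration of the work. ContextPropagation, StatsContext and withCallerContext are hypothetical names. This is not the PoC and not a proposed API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: helper threads update the instigator's statistics context. */
final class ContextPropagation {

    /** Hypothetical per-operation statistics holder; thread safe via AtomicLong. */
    static final class StatsContext {
        final AtomicLong bytesRead = new AtomicLong();
    }

    private static final ThreadLocal<StatsContext> CURRENT =
        ThreadLocal.withInitial(StatsContext::new);

    private ContextPropagation() {
    }

    static StatsContext current() {
        return CURRENT.get();
    }

    /** Wrap a task so it runs against the caller's context, not the helper thread's own. */
    static Runnable withCallerContext(Runnable task) {
        // Evaluated in the instigating thread, so this is the caller's context.
        final StatsContext captured = CURRENT.get();
        return () -> {
            StatsContext previous = CURRENT.get();
            CURRENT.set(captured);
            try {
                task.run();
            } finally {
                // Restore the helper thread's own context for later tasks.
                CURRENT.set(previous);
            }
        };
    }

    /** Runs one helper task and returns the bytes recorded in the caller's context. */
    static long demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Runnable work = () -> current().bytesRead.addAndGet(1024);
            pool.submit(withCallerContext(work)).get();
            return current().bytesRead.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the wrapper, the helper thread would update its own thread-local StatsContext and the instigator would see nothing, which is the gap the description points at.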



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
