[ 
https://issues.apache.org/jira/browse/HADOOP-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996530#comment-15996530
 ] 

Steve Loughran commented on HADOOP-13453:
-----------------------------------------

Why not take that list, create a new JIRA off HADOOP-13345 "add more s3guard 
metrics" and suggest those as the start.

One interesting one to see if we could detect would be mismatches between 
s3guard and the underlying object store: if we can observe inconsistencies 
(how?) then that should be measured. The S3mper blog posts looks at how netflix 
detected consistency issues in S3 that way

> S3Guard: Instrument new functionality with Hadoop metrics.
> ----------------------------------------------------------
>
>                 Key: HADOOP-13453
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13453
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Ai Deng
>             Fix For: HADOOP-13345
>
>         Attachments: HADOOP-13453-HADOOP-13345-001.patch, 
> HADOOP-13453-HADOOP-13345-002.patch, HADOOP-13453-HADOOP-13345-003.patch, 
> HADOOP-13453-HADOOP-13345-004.patch, HADOOP-13453-HADOOP-13345-005.patch
>
>
> Provide Hadoop metrics showing operational details of the S3Guard 
> implementation.
> The metrics will be implemented in this ticket:
> ● S3GuardRechecksNthPercentileLatency (MutableQuantiles) ­​ Percentile time 
> spent
> in rechecks attempting to achieve consistency. Repeated for multiple 
> percentile values
> of N.  This metric is an indicator of the additional latency cost of running 
> S3A with
> S3Guard.
> ● S3GuardRechecksNumOps (MutableQuantiles) ­​ Number of times a consistency
> recheck was required while attempting to achieve consistency.
> ● S3GuardStoreNthPercentileLatency (MutableQuantiles) ­​ Percentile time 
> spent in
> operations against the consistent store, including both write operations 
> during file system
> mutations and read operations during file system consistency checks. Repeated 
> for
> multiple percentile values of N. This metric is an indicator of latency to 
> the consistent
> store implementation.
> ● S3GuardConsistencyStoreNumOps (MutableQuantiles) ­​ Number of operations
> against the consistent store, including both write operations during file 
> system mutations
> and read operations during file system consistency checks.
> ● S3GuardConsistencyStoreFailures (MutableCounterLong) ­​ Number of failures
> during operations against the consistent store implementation.
> ● S3GuardConsistencyStoreTimeouts (MutableCounterLong) ­​ Number of timeouts
> during operations against the consistent store implementation.
> ● S3GuardInconsistencies (MutableCounterLong) ­ C​ ount of times S3Guard 
> failed to
> achieve consistency, even after exhausting all rechecks. A high count may 
> indicate
> unexpected out­of­band modification of the S3 bucket contents, such as by an 
> external
> tool that does not make corresponding updates to the consistent store.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to