Re[3]: Histogram error "Unable to compute ceiling for max when histogram overflowed"

Владимир Бухтояров Thu, 20 Oct 2016 08:12:17 -0700

I have investigated the problem and found that monitoring has changed 
significantly since 3.7 (the version in which I got the exception in 
com.codahale.metrics.servlets.MetricsServlet). Since version 3.9 it is enough 
to change the behavior of DecayingEstimatedHistogramReservoir; 
EstimatedHistogram can stay unchanged. Modifying 
DecayingEstimatedHistogramReservoir is safe because, unlike 
EstimatedHistogram, it is not used for Cassandra's internal needs.
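
A minimal sketch of the clamping behavior discussed in this thread, assuming a 
simple bucketed histogram with one extra overflow bucket. The class 
ClampingHistogram and all of its methods are hypothetical names for 
illustration; this is not the code of DecayingEstimatedHistogramReservoir or 
EstimatedHistogram, and it leaves out decay entirely.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical illustration only; not Cassandra's implementation. */
final class ClampingHistogram {
    private final long[] bucketOffsets;   // ascending upper bounds of the buckets
    private final AtomicLongArray counts; // one extra slot at the end for overflow

    ClampingHistogram(long[] bucketOffsets) {
        this.bucketOffsets = bucketOffsets.clone();
        this.counts = new AtomicLongArray(bucketOffsets.length + 1);
    }

    void update(long value) {
        int index = Arrays.binarySearch(bucketOffsets, value);
        if (index < 0) {
            index = -index - 1; // insertion point: first bucket whose bound >= value
        }
        // index == bucketOffsets.length means the value exceeds every bound (overflow)
        counts.incrementAndGet(index);
    }

    long maxTrackable() {
        return bucketOffsets[bucketOffsets.length - 1];
    }

    boolean isOverflowed() {
        return counts.get(bucketOffsets.length) > 0;
    }

    /** Reports the maximum trackable value on overflow instead of throwing. */
    long max() {
        if (isOverflowed()) {
            return maxTrackable();
        }
        for (int i = bucketOffsets.length - 1; i >= 0; i--) {
            if (counts.get(i) > 0) {
                return bucketOffsets[i];
            }
        }
        return 0;
    }

    /** Percentile in [0, 1]; overflowed samples count as the maximum trackable value. */
    long percentile(double p) {
        long total = 0;
        for (int i = 0; i < counts.length(); i++) {
            total += counts.get(i);
        }
        if (total == 0) {
            return 0;
        }
        long threshold = Math.max(1, (long) Math.ceil(total * p));
        long seen = 0;
        for (int i = 0; i < bucketOffsets.length; i++) {
            seen += counts.get(i);
            if (seen >= threshold) {
                return bucketOffsets[i];
            }
        }
        return maxTrackable(); // the requested rank falls in the overflow bucket
    }
}
```

With bucket bounds {1, 10, 100} and recorded values 5, 50 and 500, this sketch 
reports a max of 100 rather than failing, and the overflow remains visible 
through isOverflowed().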


I also found the resolution of issue CASSANDRA-12185 very strange: nothing 
was done to prevent the IllegalStateException, yet the issue is closed. Should 
I reopen #12185 or submit a pull request under a new issue?


Best regards,
Bukhtoyarov Vladimir
email [email protected]
skype live:fanat-tdd
Github: https://github.com/vladimir-bukhtoyarov
mobile +79618096798

>Wednesday, 19 October 2016, 21:12 +03:00 from Владимир Бухтояров 
><[email protected]>:
>
>Null (zero) values in a snapshot are useless for problem analysis, because 
>it is impossible to distinguish the case when there were no events from the 
>case when events were dispatched too slowly. I see nothing wrong with 
>returning the 99.9th percentile as 3h when the histogram is configured with 
>a 3h max and some latency is 4h.
>
>
>Best regards,
>Bukhtoyarov Vladimir
>email  [email protected]
>skype live:fanat-tdd
>Github:  https://github.com/vladimir-bukhtoyarov
>mobile  +79618096798
>
>>Wednesday, 19 October 2016, 20:17 +03:00 from Ken Hancock <[email protected]>:
>>
>>I would suggest metrics should return null values instead of false values.
>>
>>On Wed, Oct 19, 2016 at 12:21 PM, Владимир Бухтояров <
>> [email protected] > wrote:
>>
>>>
>>> Hi to all,
>>>
>>> I want to fix https://issues.apache.org/jira/browse/CASSANDRA-11063
>>> This issue is very painful for me, because when something works slowly it
>>> is impossible to capture metrics and save them to a monitoring database
>>> for future investigation. Moreover, when one histogram throws an exception,
>>> many metrics exporters (for example MetricsServlet) are unable to export
>>> metrics for the whole MetricRegistry, so when an overflow happens in one
>>> histogram I have no historical data at all.
>>>
>>> I propose to implement the following changes:
>>> 1. DecayingEstimatedHistogramReservoir and EstimatedHistogram will
>>> return the maximum trackable value instead of Long.MAX_VALUE.
>>> 2. DecayingEstimatedHistogramReservoir and EstimatedHistogram will
>>> never throw IllegalStateException; instead, they will use the maximum
>>> trackable value as a regular value in percentile and average calculations.
>>> 3. If anybody wants to keep the old behavior (preferring a crash to
>>> inaccurate reporting), I can add a configuration parameter that preserves
>>> it. I can even leave the old behavior as the default; for my
>>> needs it is enough to have some option to avoid crashes.
>>>
>>>
>>> Best regards,
>>> Bukhtoyarov Vladimir
>>> email  [email protected]
>>> skype live:fanat-tdd
>>> Github:  https://github.com/vladimir-bukhtoyarov
>>>
>
