gortiz commented on PR #11695:
URL: https://github.com/apache/pinot/pull/11695#issuecomment-1738615579

   > Thanks for raising this and providing the readings! Curious how you decide 
to configure the sliding window to be 15 minutes? What is the side effect of 
that?
   
   I chose 15 minutes because our histograms and timers expose several 
windowed rates (oneMinuteRate, fiveMinuteRate and fifteenMinuteRate), and 
fifteenMinuteRate covers the longest window. To return a correct value for that 
rate we need to keep at least the results from the last 15 minutes.
   
   The side effect is the memory the metric requires. 
`SlidingTimeWindowArrayReservoir` requires 128 bits (16 bytes) per stored 
value. Therefore each histogram will require:
   
   ```
   Size (bytes) = measures_per_second * time_window_seconds * 128 / 8
   ```
   
   Which means that (taking 1 MB = 10^6 bytes):
   
   | measures per second | time window (mins) | size (MB) |
   | -- | -- | -- |
   | 10 | 1 | 0.0096 |
   | 10 | 5 | 0.048 |
   | 10 | 15 | 0.144 |
   | 100 | 1 | 0.096 |
   | 100 | 5 | 0.48 |
   | 100 | 15 | 1.44 |
   | 1000 | 1 | 0.96 |
   | 1000 | 5 | 4.8 |
   | 1000 | 15 | 14.4 |
   | 10000 | 1 | 9.6 |
   | 10000 | 5 | 48 |
   | 100000 | 15 | 1440 |
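
The figures in the table can be reproduced with a few lines of arithmetic. This is only a sketch of the estimate above: the function name `reservoir_size_mb` is made up for illustration, it assumes 16 bytes per stored value and 1 MB = 10^6 bytes, and it ignores any other per-object overhead on the heap.

```python
# Assumption taken from the estimate above: 128 bits per stored value.
BYTES_PER_SAMPLE = 128 // 8


def reservoir_size_mb(measures_per_second: int, window_minutes: int) -> float:
    """Rough heap estimate (in MB, 1 MB = 10^6 bytes) for one histogram."""
    # Number of values kept while the sliding window is full.
    samples_in_window = measures_per_second * window_minutes * 60
    return samples_in_window * BYTES_PER_SAMPLE / 1_000_000


if __name__ == "__main__":
    for rate in (10, 100, 1_000, 10_000):
        for minutes in (1, 5, 15):
            size = reservoir_size_mb(rate, minutes)
            print(f"{rate:>6} measures/s | {minutes:>2} min | {size:g} MB")
```

The estimate grows linearly in both the rate and the window length, so cutting the window from 15 to 5 minutes only divides the footprint by three; it does not bound it.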
   
   This is one of the problems with the `SlidingTimeWindowArrayReservoir` 
implementation: its size on heap depends on the number of measures per second. 
If that rate is not controlled (as may happen in Pinot), the footprint has no 
upper bound.
   
   HdrHistogram doesn't have this problem. Instead, in HdrHistogram you define 
the min and max expected values (which may also be problematic in our case) and 
the precision you want. [From the HdrHistogram 
documentation](https://github.com/HdrHistogram/HdrHistogram/blob/master/README.md):
   
   > For example, a Histogram could be configured to track the counts of 
observed integer values between 0 and 3,600,000,000 while maintaining a value 
precision of 3 significant digits across that range. Value quantization within 
the range will thus be no larger than 1/1,000th (or 0.1%) of any value. This 
example Histogram could be used to track and analyze the counts of observed 
response times ranging between 1 microsecond and 1 hour in magnitude, while 
maintaining a value resolution of 1 microsecond up to 1 millisecond, a 
resolution of 1 millisecond (or better) up to one second, and a resolution of 1 
second (or better) up to 1,000 seconds. At its maximum tracked value (1 hour), 
it would still maintain a resolution of 3.6 seconds (or better).
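
To make the numbers in that quote concrete, here is a small sketch of the worst-case resolution it describes. It only encodes the upper bound stated in the README (value / 10^digits, and never finer than one unit); it is not HdrHistogram's actual bucket layout, and `worst_case_resolution_us` is a hypothetical helper, not part of any library.

```python
def worst_case_resolution_us(value_us: int, significant_digits: int = 3) -> float:
    """Upper bound on quantization (in microseconds) at a given recorded value.

    Assumes values are recorded as whole microseconds, so the resolution
    can never be finer than 1 microsecond.
    """
    return max(1.0, value_us / 10 ** significant_digits)


if __name__ == "__main__":
    examples = {
        "1 millisecond": 1_000,
        "1 second": 1_000_000,
        "1,000 seconds": 1_000_000_000,
        "1 hour": 3_600_000_000,
    }
    for label, value_us in examples.items():
        bound = worst_case_resolution_us(value_us)
        print(f"{label:>14}: resolution <= {bound:g} microseconds")
```

For the 1 hour case this gives 3,600,000 microseconds, i.e. the 3.6 seconds mentioned in the quote. The point for our discussion is that this bound depends only on the configured range and precision, not on how many measures per second are recorded.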
   
   What do you think? Should we decrease the `SlidingTimeWindowArrayReservoir` 
time window to something like 5 mins? Should we add HdrHistogram? In the latter 
case it would be nice to add new methods to our metric registry to let calling 
code configure the min and max values.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

