On 15/12/2015 06:55 PM, Ian Kelly wrote:
> On Tue, Dec 15, 2015 at 10:43 AM, Pavlos Parissis
> <[email protected]> wrote:
>>> If you want your metrics container to act like a dict, then my
>>> suggestion would be to just use a dict, with pseudo-collections for
>>> the values as above.
>>>
>>
>> If I understood you correctly, you are saying store all metrics in a
>> dict and have a counter key as well to store the times metrics are
>> pushed in, and then have a function to do the math. Am I right?
>
> That would work, although I was actually thinking of something like this:
>
> class SummedMetric:
>     def __init__(self):
>         self.total = 0
>         self.count = 0
>
>     @property
>     def average(self):
>         return self.total / self.count
>
>     def add(self, value):
>         self.total += value
>         self.count += 1
>
> metrics = {}
> for metric_name in all_metrics:
>     metrics[metric_name] = SummedMetric()
>
> For averaged metrics, look at metrics['f'].average, otherwise look at
> metrics['f'].total.
>
With this approach I will have one object per metric, which could cause
performance issues in my case. Let me give some context on what I am trying
to do here.

I want to provide fast retrieval and processing of statistics metrics for
HAProxy. HAProxy exposes stats over a UNIX socket (the stats socket).
HAProxy is a multi-process daemon, and each process can only be accessed
through its own stats socket; there isn't any memory shared between these
processes. That means that if a frontend or backend is managed by more than
one process, you have to collect the metrics from all processes and compute
the sum or the average, depending on the type of the metric.

Stats are provided in CSV format:
https://gist.github.com/unixsurfer/ba7e3bb3f3f79dcea686

There is one line per frontend and backend; for servers it is a bit more
complicated. When there are 100 lines per process, it is easy to do the work
even in setups with 24 processes (24 * 100 = 2.4K lines). But there are a
lot of cases where a stats socket will return 10K lines, due to the number
of backends and of servers in those backends. That is 240K lines to process
in order to provide stats every 5 or 10 seconds.

My plan is to split the processing from the collection. One program will
connect to all the UNIX sockets asynchronously and dump the CSV to files,
one per socket, grouped by epoch time: all files of one retrieval go under
a directory named after the time of that retrieval. Another program, running
in multi-process mode [1], will pick up those files and parse them
sequentially to perform the aggregation. It is for this program that I
needed the CounterExt.

I will try your approach as well, as it is very simple and it does the job
with fewer lines :-) I will compare both in terms of performance and select
the fastest.

Thank you very much for your assistance, very much appreciated.

[1] pseudo-code

from multiprocessing import Process, Queue

import pyinotify

wm = pyinotify.WatchManager()   # watch manager
mask = pyinotify.IN_CREATE      # watched events: newly created files


class EventHandler(pyinotify.ProcessEvent):
    def __init__(self, queue):
        # ProcessEvent.__init__ must run, otherwise event dispatching breaks.
        super().__init__()
        self.queue = queue

    def process_IN_CREATE(self, event):
        # Hand every newly created file over to the worker pool.
        self.queue.put(event.pathname)


def work(queue):
    # Worker: consume file names until a 'STOP' sentinel arrives.
    while True:
        job = queue.get()
        if job == 'STOP':
            break
        print(job)


def main():
    pnum = 10
    queue = Queue()
    plist = []
    for i in range(pnum):
        p = Process(target=work, args=(queue,))
        p.start()
        plist.append(p)

    handler = EventHandler(queue)
    notifier = pyinotify.Notifier(wm, handler)
    wm.add_watch('/tmp/test', mask, rec=True)
    notifier.loop()


if __name__ == '__main__':
    main()
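To make the collection side a bit more concrete, here is a minimal,
synchronous sketch of what I have in mind; the real collector would query
the sockets asynchronously. The socket paths and the dump directory are
placeholders, and it assumes the stats socket answers the standard
'show stat' command with the CSV:

import os
import socket
import time

# Placeholder values; the real collector would discover these from config.
STATS_SOCKETS = ['/run/haproxy/proc1.sock', '/run/haproxy/proc2.sock']
DUMP_DIR = '/tmp/test'   # the directory watched by the consumer in [1]


def read_stats(path):
    """Send 'show stat' to one stats socket and return the raw CSV bytes."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(b'show stat\n')
        chunks = []
        while True:
            data = sock.recv(8192)
            if not data:        # HAProxy closes the connection when done
                break
            chunks.append(data)
    return b''.join(chunks)


def dump_all():
    """Dump the CSV of every process under one directory named after the epoch."""
    epoch_dir = os.path.join(DUMP_DIR, str(int(time.time())))
    os.makedirs(epoch_dir, exist_ok=True)
    for index, path in enumerate(STATS_SOCKETS, start=1):
        with open(os.path.join(epoch_dir, 'proc{}.csv'.format(index)), 'wb') as out:
            out.write(read_stats(path))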
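And for the aggregation side, a rough sketch of what the consumer would do
with the per-process CSV files of one epoch. Which metrics get summed and
which get averaged is only guessed here (the real split would follow the
HAProxy CSV documentation), and SUM_METRICS/AVG_METRICS are names made up
for the example:

import csv

# Guessed split for the example: counters are summed across processes,
# rates and timing metrics are averaged across processes.
SUM_METRICS = {'scur', 'stot', 'bin', 'bout', 'ereq', 'econ'}
AVG_METRICS = {'rate', 'req_rate', 'qtime', 'rtime', 'ttime'}


def aggregate(csv_files):
    """Merge the per-process dumps of one epoch into
    {(pxname, svname): {metric: value}}."""
    totals = {}
    counts = {}
    for csv_file in csv_files:
        with open(csv_file) as handle:
            # 'show stat' prefixes the header line with '# ', strip that first.
            reader = csv.DictReader(handle.read().lstrip('# ').splitlines())
            for row in reader:
                key = (row['pxname'], row['svname'])
                entry = totals.setdefault(key, {})
                counts[key] = counts.get(key, 0) + 1
                for metric in SUM_METRICS | AVG_METRICS:
                    value = row.get(metric)
                    if not value:   # metric not reported for this row type
                        continue
                    entry[metric] = entry.get(metric, 0) + int(value)
    # Turn the accumulated sums into averages for the metrics that need it.
    for key, entry in totals.items():
        for metric in AVG_METRICS:
            if metric in entry:
                entry[metric] /= counts[key]
    return totals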
