On Sat, 14 Dec 2019 at 04:25, Lech Nieroda <lech.nier...@uni-koeln.de> wrote:
[OK, so I'm a bit lagged finding this]

> I’ve been tinkering with the acct_gather_profile/influxdb plugin a bit in
> order to visualize the cpu and memory usage of live jobs.
> Both the influxdb backend and Grafana dashboards seem like a perfect fit for
> our needs.

Ditto - I've been working on dashboards for jobcomp/elasticsearch too, (I'll
push it to grafana.com once it looks "shiny" and useful) as we use
collectd/influxdb/grafana for most of our node monitoring.

[snip]

"value = NNN" is a pain when you're trying to plot these.

> So a single „series" would be:
> Measurement: acct_gather_profile_task
> Tags: job, step, task, host
> Fields: CPUTime, CPUUtilization, CPUFrequency, RSS, VMSize, Pages, ReadMB, WriteMB
> Timestamp

YES! Make it so! much more efficient and you can add any qualifiers
(floats/ints) as needed as per
https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_reference/

sounds like a good plan - I've been testing this just now and agree about
crap schema design

  [root@alfred ~]# influx
  Connected to http://localhost:8086 version 1.7.8
  InfluxDB shell version: 1.7.8
  > use slurm
  Using database slurm
  > show measurements
  name: measurements
  name
  ----
  CPUFrequency
  CPUTime
  CPUUtilization
  Pages
  RSS
  ReadMB
  VMSize
  WriteMB
  > select * from CPUUtilization
  name: CPUUtilization
  time                host    job step task value
  ----                ----    --- ---- ---- -----
  1582364991000000000 client1 662 -2   0    0
  1582365021000000000 client1 662 -2   0    0
  1582365051000000000 client1 662 -2   0    0
  1582365081000000000 client1 662 -2   0    0
  1582365352000000000 client1 663 0    0    0
  1582365382000000000 client1 663 0    0    99.8
  1582365412000000000 client1 663 0    0    99.87
  1582365442000000000 client1 663 0    0    99.83
  1582365472000000000 client1 663 0    0    98.73

Out of interest, what retention policy are you using for profile data?

Andrew
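
PS: for what it's worth, here is a rough sketch of how one combined point in the proposed single-measurement layout might be written in line protocol. It is only an illustration, not anything taken from the plugin: the helper name to_line_protocol is made up, and so is the RSS value. The tags, timestamp and CPUUtilization value come from the job 663 sample in the query output. It assumes the InfluxDB 1.x line protocol rules (integers carry an "i" suffix, floats do not) and does no escaping of spaces or commas.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as an InfluxDB 1.x line protocol string.

    Hypothetical helper for illustration only: it does not escape
    spaces, commas or equals signs in keys or values.
    """
    # Tags are written sorted by key, as the line protocol docs recommend.
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))

    field_items = []
    for key, value in fields.items():
        if isinstance(value, bool):
            raise TypeError("boolean fields are not handled in this sketch")
        if isinstance(value, int):
            # Integer fields need a trailing "i", otherwise they are stored as floats.
            field_items.append(f"{key}={value}i")
        else:
            field_items.append(f"{key}={float(value)}")

    return f"{measurement},{tag_part} {','.join(field_items)} {timestamp_ns}"


tags = {"job": 663, "step": 0, "task": 0, "host": "client1"}
fields = {
    "CPUUtilization": 99.8,  # value from the sample query output
    "RSS": 1024,             # made-up placeholder, not real data
}
line = to_line_protocol("acct_gather_profile_task", tags, fields, 1582365382000000000)
print(line)
```

With every metric as a field of one measurement, a single "select * from acct_gather_profile_task" would return all the columns for a task at once instead of one "value" column per measurement.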