Wow, that's a tricky one. Quite honestly, colmux has been so solid for me
that I haven't looked at the code in ages, though that doesn't mean much.
It's also amusing to note I had totally forgotten it supported the hostname
address syntax you're using. ;) That allowed me to use essentially the
same command you are, with one note: I also added -test, and I see that
columns 10 and 20 are different from what you're describing. Maybe you have
a different kernel? I'm on 4.4.7-1-amd64-hpelinux, which is the Linux we use
for our Helion Cloud and is essentially Debian as well.
stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt -command "-sC -oT -P" -cols 10,20
             [CPU:0]Idle%           [CPU:1]Soft%
#Time    1-mgmt 2-mgmt 3-mgmt | 1-mgmt 2-mgmt 3-mgmt
12:08:27     -1     -1     -1 |     -1     -1     -1
12:08:28     -1     -1     -1 |     -1     -1     -1
12:08:29     95     -1    100 |      0     -1      0
12:08:30     95     97     98 |      0      0      0
12:08:31     97    100    100 |      0      0      0
12:08:32     87    100     89 |      0      0      0
12:08:33    100    100    100 |      0      0      0
12:08:34    100    100     99 |      0      0      0
12:08:35    100     97     97 |      0      0      0
12:08:36     99     98    100 |      0      0      0
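One way to double-check which numbers to pass to -cols on a particular machine is to number the header fields yourself and compare against what -test prints. Here is a minimal Python sketch of that idea. The sample header is made up and shortened purely for illustration (it is not real collectl output), the function names are hypothetical, and whether colmux counts columns from 0 or 1 is an assumption you should confirm on your own system.

```python
def number_columns(header_line, start=0):
    """Return (index, name) pairs for a whitespace-separated header line."""
    # Drop the leading '#' that marks the header, then split on whitespace.
    names = header_line.lstrip("#").split()
    return list(enumerate(names, start=start))


def find_columns(header_line, suffix, start=0):
    """Return the indices of all columns whose name ends with `suffix`."""
    return [i for i, name in number_columns(header_line, start)
            if name.endswith(suffix)]


# Hypothetical, shortened header used only for illustration.
sample = "#Date Time [CPU:0]User% [CPU:0]Idle% [CPU:1]User% [CPU:1]Idle%"

for index, name in number_columns(sample):
    print(index, name)

# Indices of every per-CPU idle column in the sample header.
print(find_columns(sample, "Idle%"))
```

Feeding it the real header line from each host makes it easy to spot when two kernels report a different number of fields per CPU.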
What you didn't say is whether this fails all the time or only
intermittently. If it's intermittent it will indeed be hard to track down,
but there is hope too ;)
Have you tried playing back a file with colmux yet? If not, you can simply
rerun the command but include -p and point it to the raw files. The one
thing I did discover is that I think I introduced a bug some time in the
past, and you need to have the hostname portion of the string start with a
wildcard rather than having one anywhere in the middle. And then, to make
matters worse, I found a second bug: colmux is using the wrong column during
playback. More digging into that is required too. ;(
BUT if I add 1 to each column, I think this looks right if you ignore what
the headers say:
stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt -command "-sC -oT -P -p '/var/cache/collectl/*-mgmt-20160616-110000.raw.gz'" -cols 11,21|more
             [CPU:0]Totl%           [CPU:1]Steal%
#Time    1-mgmt 2-mgmt 3-mgmt | 1-mgmt 2-mgmt 3-mgmt
             99     99    100 |      0      0      0
             98     99     97 |      0      0      0
             94     98     94 |      0      0      0
             94     93     92 |      0      0      0
             99     94     98 |      0      0      0
             99    100     99 |      0      0      0
             99    100    100 |      0      0      0
And since this is a playback command, you can use time ranges as well to
limit what is being displayed, which may help zero in on where in the data
the problem is. Then maybe you could even send me a subset of the problem
raw file [use collectl --extract to create a new raw file from a time slice
of an old one]. Then maybe I can track down why this is happening.
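To make the extract step concrete, here is a small Python sketch that only builds and prints a collectl command line for copying one time slice out of a raw file; it does not run anything. The option names (-p, --from, --thru, --extract) are as I understand them from the collectl man page, and the file names, the HH:MM time format and the helper name are assumptions for illustration, so please check them against collectl --help before relying on them.

```python
import shlex


def extract_command(raw_file, start, end, out_file):
    """Build (but do not run) a collectl command that copies one time slice.

    Assumes -p selects playback and --from/--thru/--extract select the slice
    and the output file; verify these against your collectl version.
    """
    return ["collectl", "-p", raw_file,
            "--from", start, "--thru", end,
            "--extract", out_file]


# Hypothetical file names and times, used only for illustration.
cmd = extract_command(
    "/var/cache/collectl/host-20160616-110000.raw.gz",
    "11:10",
    "11:20",
    "slice.raw.gz",
)

# Print a shell-quoted version that can be reviewed before running by hand.
print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to try on a machine where the raw files matter.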
-mark
On Wed, Jun 15, 2016 at 8:35 PM, Hernan Laffitte <[email protected]>
wrote:
> Hello,
>
> We are trying to gather detailed CPU usage from a number of machines in
> our cluster. In particular, we want to see usage of every individual CPU in
> a group of machines.
>
> With collectl, on a single machine, the command we can run is:
>
> collectl -sC -oT -P
>
> Which gives us 282 columns (the machines have 28 CPU's).
>
> Now we want to run a colmux command to see the idle time of CPU's 0 and 1
> on 3 machines. This is columns 10 and 20 ("[CPU:0]Idle%" and
> "[CPU:1]Idle%"). The command we use is:
>
> colmux -addr 'machine-[1-3]' -command "-sC -oT -P" -cols 10,20
>
> This generates the error:
>
> Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.
>
> The error occurs when parsing the field "lasttime" of a data structure
> $hostVars, which has the following content at the time of the error:
>
> {
>   'lasttime' => [
>     '',
>     '20160615'
>   ],
>   'maxinst' => [
>     -1,
>     0
>   ],
>   'lastinst' => [
>     -1,
>     0
>   ],
>   'bufptr' => 1
> };
>
> I am currently running version "collectl V3.6.9-1
> (zlib:2.06,HiRes:1.9725)" on Debian. Any idea of what may be the problem
> here?
>
>
> Thanks in advance,
>
> Hernan
> _______________________________________________
> Collectl-interest mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/collectl-interest