Wow, that's a tricky one.  Quite honestly, colmux has been so solid for me that
I haven't looked at the code in ages, though that doesn't mean there are no
bugs.  It's also amusing to note that I had totally forgotten it supported the
hostname address syntax you're using.  ;)  That allowed me to use essentially
the same command you are, with one note:  I also added -test and see that
columns 10 and 20 are different from what you're saying.  Maybe you have a
different kernel?  I'm on 4.4.7-1-amd64-hpelinux, which is the Linux we use for
our Helion Cloud and is essentially Debian as well.

stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt -command "-sC -oT -P" -cols 10,20

         [CPU:0]Idle%                  [CPU:1]Soft%
#Time    1-mgmt 2-mgmt 3-mgmt |  1-mgmt 2-mgmt 3-mgmt
12:08:27     -1     -1     -1 |      -1     -1     -1
12:08:28     -1     -1     -1 |      -1     -1     -1
12:08:29     95     -1    100 |       0     -1      0
12:08:30     95     97     98 |       0      0      0
12:08:31     97    100    100 |       0      0      0
12:08:32     87    100     89 |       0      0      0
12:08:33    100    100    100 |       0      0      0
12:08:34    100    100     99 |       0      0      0
12:08:35    100     97     97 |       0      0      0
12:08:36     99     98    100 |       0      0      0
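
If it helps to double-check which column number goes with which field, here
is a rough Python sketch (not part of collectl or colmux) that numbers the
fields of a plot-format header line starting at 1.  The header string below is
a shortened, made-up example; paste in the real header from your own
"collectl -sC -oT -P" output, since the field order may differ between
versions and kernels.  It also assumes colmux counts columns the same way,
which -test should confirm.

```python
def column_numbers(header_line):
    """Map each field name in a whitespace-separated header to its 1-based column."""
    fields = header_line.lstrip("#").split()
    return {name: index for index, name in enumerate(fields, start=1)}


# Hypothetical, shortened header for illustration only; use your real one.
sample_header = "#Date Time [CPU:0]User% [CPU:0]Idle% [CPU:1]User% [CPU:1]Idle%"

columns = column_numbers(sample_header)
wanted = ["[CPU:0]Idle%", "[CPU:1]Idle%"]

# Build the comma-separated list that would be passed to -cols.
cols_arg = ",".join(str(columns[name]) for name in wanted)
print(cols_arg)
```

Looking fields up by name avoids counting nearly 300 columns by hand.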

What you didn't say is whether this fails all the time or only intermittently.
If it's intermittent it will indeed be hard to track down, but there is hope
too ;)

Have you tried playing back a file with colmux yet?  If not, you can simply
rerun the command but include -p and point it to the raw files.  One thing I
did discover is that I think I introduced a bug some time in the past: the
wildcard has to come at the start of the hostname portion of the string rather
than somewhere in the middle.  And to make matters worse, I found a second
bug: colmux is using the wrong column during playback.  More digging into that
is required too.  ;(

BUT if I add 1 to each column I think this looks right if you ignore what
the headers say:

stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt -command "-sC -oT -P -p '/var/cache/collectl/*-mgmt-20160616-110000.raw.gz'" -cols 11,21|more

         [CPU:0]Totl%                  [CPU:1]Steal%
#Time    1-mgmt 2-mgmt 3-mgmt |  1-mgmt 2-mgmt 3-mgmt
     99     99    100 |       0      0      0
     98     99     97 |       0      0      0
     94     98     94 |       0      0      0
     94     93     92 |       0      0      0
     99     94     98 |       0      0      0
     99    100     99 |       0      0      0
     99    100    100 |       0      0      0

And since this is a playback command, you can use time ranges as well to limit
what is being displayed, which may help zero in on where in the data the
problem is.  Then maybe you can even send me a subset of the problem raw file
[use collectl --extract to create a new raw file from a time slice of an old
one], and I can try to track down why this is happening.
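
To narrow down where in the data things go wrong, one option is to scan
played-back plot output for time stamps that don't parse.  Below is a rough
Python sketch, not anything from collectl itself.  It assumes each data line
starts with a date like 20160616 followed by a time like 11:00:01, and that a
bad value (such as minute 60) is actually present in the output, which I
haven't confirmed.  The sample lines are made up.

```python
from datetime import datetime


def find_bad_timestamps(lines):
    """Return (line_number, text) for data lines whose date/time fail to parse."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        parts = line.split()
        stamp = " ".join(parts[:2])  # assumed layout: date, then time
        try:
            datetime.strptime(stamp, "%Y%m%d %H:%M:%S")
        except ValueError:
            bad.append((number, line.rstrip()))
    return bad


# Made-up sample lines for illustration only.
sample = [
    "#Date Time [CPU:0]Idle%",
    "20160616 11:59:59 97",
    "20160616 11:60:00 98",
    "20160616 12:00:01 99",
]

for number, text in find_bad_timestamps(sample):
    print(number, text)
```

The line numbers it reports give a starting point for picking the time range
to play back or extract.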

-mark






On Wed, Jun 15, 2016 at 8:35 PM, Hernan Laffitte <[email protected]>
wrote:

> Hello,
>
> We are trying to gather detailed CPU usage from a number of machines in
> our cluster. In particular, we want to see usage of every individual CPU in
> a group of machines.
>
> With collectl, on a single machine, the command we can run is:
>
>    collectl -sC -oT -P
>
> Which gives us 282 columns (the machines have 28 CPUs).
>
> Now we want to run a colmux command to see the idle time of CPUs 0 and 1
> on 3 machines. This is columns 10 and 20 ("[CPU:0]Idle%" and
> "[CPU:1]Idle%"). The command we use is:
>
>    colmux -addr 'machine-[1-3]' -command "-sC -oT -P" -cols 10,20
>
> This generates the error:
>
>    Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.
>
> The error occurs when parsing the field "lasttime" of a data structure
> $hostVars, which has the following content at the time of the error:
>
> {
>           'lasttime' => [
>                           '',
>                           '20160615'
>                         ],
>           'maxinst' => [
>                          -1,
>                          0
>                        ],
>           'lastinst' => [
>                           -1,
>                           0
>                         ],
>           'bufptr' => 1
> };
>
> I am currently running version "collectl V3.6.9-1
> (zlib:2.06,HiRes:1.9725)" on Debian. Any idea of what may be the problem
> here?
>
>
> Thanks in advance,
>
> Hernan
>
>
>
> ------------------------------------------------------------------------------
> What NetFlow Analyzer can do for you? Monitors network bandwidth and
> traffic
> patterns at an interface-level. Reveals which users, apps, and protocols
> are
> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
> J-Flow, sFlow and other flows. Make informed decisions using capacity
> planning
> reports.
> http://pubads.g.doubleclick.net/gampad/clk?id=1444514421&iu=/41014381
> _______________________________________________
> Collectl-interest mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/collectl-interest
>
>