I just checked in with some colleagues and was reminded of what I forgot: the
way IB deals with numbers is really poor!  They use narrow, non-wrapping
counters (they stop at their maximum value), so if you're running at high
rates and don't collect them frequently enough, you'll lose data.  The faster
the IB, the more often you have to read them.  At FDR speeds, you'll need to
read the counters every 2-3 seconds or you'll lose data!  What interval are
you collecting at?  The default of 10 seconds?  Try running interactively
with -i10 and you'll probably lose data there too.  ;(
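
To put rough numbers on this, here's a back-of-the-envelope sketch.  It
assumes (my assumption, not something read out of collectl's code) that the
IB data counters are 32-bit, count in 4-byte units, saturate rather than
wrap, and are cleared after each read, and that KB means 1024 bytes.  The
FDR rate of 6.8 GB/s is likewise an assumed round figure.

```python
# Back-of-the-envelope check of IB data-counter saturation.
# ASSUMPTIONS (not taken from collectl's source): PortXmitData/PortRcvData
# are 32-bit, count in 4-byte units, saturate at their maximum instead of
# wrapping, and are cleared after each read.  KB is taken as 1024 bytes.
COUNTER_MAX = 2**32 - 1   # largest value a 32-bit counter can hold
BYTES_PER_COUNT = 4       # each count represents 4 bytes of traffic


def seconds_to_saturate(bytes_per_sec: float) -> float:
    """Seconds until a freshly cleared counter hits its ceiling."""
    return COUNTER_MAX * BYTES_PER_COUNT / bytes_per_sec


def max_kb_per_sec(interval_sec: float) -> int:
    """Highest KB/s a saturated counter can report for one sample interval."""
    return int(COUNTER_MAX * BYTES_PER_COUNT / 1024 / interval_sec)


if __name__ == "__main__":
    qdr_observed = 3472553 * 1024   # KBIn from the interactive run, as bytes/s
    fdr_assumed = 6.8e9             # assumed FDR 4x data rate, bytes/s
    print(f"QDR at observed rate: {seconds_to_saturate(qdr_observed):.1f} s")
    print(f"FDR (assumed):        {seconds_to_saturate(fdr_assumed):.1f} s")
    print(f"ceiling at -i10:      {max_kb_per_sec(10)} KB/s")
```

That prints about 4.8 s, 2.5 s and 1677721 KB/s.  The last figure is exactly
the kbin/kbout value in your lexpr output, while at -i1 the ceiling is
around 16.7 million KB/s, far above your traffic, which would explain why the
interactive numbers looked right.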

The 'smoking gun' in your data seems to be the packet rates, which are
reported correctly because they're much smaller numbers.
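
The same kind of sketch for the packet counters, again assuming they are
32-bit saturating counters, but counting whole packets.  The 1717K figure is
the rounded interactive value, so treat the results as approximate.

```python
# Headroom check for the IB packet counters over one daemon sample.
# ASSUMPTION: PortXmitPkts/PortRcvPkts are 32-bit counters of whole packets.
COUNTER_MAX = 2**32 - 1

PKTS_PER_SEC = 1717_000   # "1717K" PktIn from the interactive run (rounded)
KB_PER_SEC = 3472553      # KBIn from the same run (KB taken as 1024 bytes)
INTERVAL_SEC = 10         # collectl's default daemon interval


def counter_fraction_used(per_sec: float, interval_sec: float) -> float:
    """Fraction of the counter's range consumed during one sample."""
    return per_sec * interval_sec / COUNTER_MAX


def bytes_per_packet(kb_per_sec: float, pkts_per_sec: float) -> float:
    """Average bytes per packet implied by the KB and packet columns."""
    return kb_per_sec * 1024 / pkts_per_sec


if __name__ == "__main__":
    used = counter_fraction_used(PKTS_PER_SEC, INTERVAL_SEC)
    print(f"packet counter used per sample: {used:.2%}")
    print(f"average bytes per packet: {bytes_per_packet(KB_PER_SEC, PKTS_PER_SEC):.0f}")
```

A 10 second sample uses only about 0.40% of the packet counter, so it never
gets near the ceiling, and the lexpr pktin value (1722455) is in line with
the interactive 1717K.  The implied average of roughly 2071 bytes per packet
also suggests the interactive KB and packet columns are consistent with each
other.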

-mark


On Tue, Nov 26, 2013 at 8:34 AM, Mark Seger <[email protected]> wrote:

> Re playback: can you try it out, even if only for a little bit?  I'm
> guessing you'll see the problem there as well, but I'd really like to
> understand what is happening.  All you need to do is add "--rawtoo -f/tmp"
> to the DaemonCommands in your /etc/collectl.conf and restart collectl; it
> will then write a raw file to /tmp.  Next, run your tests and save the
> interactive output with timestamps (include -oT).  You should then be able
> to play back the data from the raw file with 'collectl -p file -sx -oT' and
> see either almost identical numbers OR 1/2 the values.  Just remember to
> reset /etc/collectl.conf when you're done.
>
> It would provide a useful data point.  You can even play back the data
> with --export lexpr.
>
> Meanwhile we can try to reproduce what you're seeing.  Actually, have you
> seen this with earlier versions of collectl?  I haven't touched the IB code
> in years, at least not that I remember, but I have touched lexpr, and
> that's why it's important to understand where the actual problem
> lies.
>
> -mark
>
>
> On Tue, Nov 26, 2013 at 8:09 AM, Dragseth Roy Einar
> <[email protected]> wrote:
>
>>  Hi Mark.  Yes, it's been a while...
>>
>> I must admit I have never used playback mode, so I do not know.  We do not
>> have any .raw files produced by collectl.
>>
>> r.
>>
>>  ------------------------------
>> *From:* Mark Seger [[email protected]]
>> *Sent:* Tuesday, November 26, 2013 1:52 PM
>> *To:* Dragseth Roy Einar
>> *Cc:* [email protected]
>> *Subject:* Re: [Collectl-interest] collectl disagrees with itself
>> regarding infiniband bandwidth.
>>
>>   hi roy - long time no chat...
>>
>>  This is indeed an interesting one I haven't seen.  Just to be clear,
>> since you said it reports half when run as a daemon with lexpr: does it
>> also record half as a daemon, and play back as half, without lexpr?
>>
>>  -mark
>>
>>
>> On Tue, Nov 26, 2013 at 4:01 AM, Roy Dragseth <[email protected]> wrote:
>>
>>> Collectl seems to disagree with itself when reporting InfiniBand
>>> bandwidth usage.
>>>
>>> I'm running a bandwidth benchmark that reports approximately 7 GB/s
>>> bidirectional bandwidth on our QDR InfiniBand network:
>>>
>>> Benchmark exchange(MPI_Sendrecv)
>>> ================================
>>>         lenght     iterations   elapsed time  transfer rate   latency
>>>        (bytes)        (count)      (seconds)     (Mbytes/s)    (usec)
>>> --------------------------------------------------------------------------
>>>       12582912           8578         30.626         7048.6    1785.2
>>>
>>>
>>> Running collectl interactively shows approximately the same:
>>>
>>> [root@c10-13 etc]# collectl -s x
>>> Couldn't find 'ofed_info'.  Won't be able to determine OFED version
>>> waiting for 1 second sample...
>>> #<-----------InfiniBand----------->
>>> #   KBIn  PktIn   KBOut PktOut Errs
>>>  3472553  1717K 3472483  1717K    0
>>>  3472962  1717K 3472977  1717K    0
>>>  3472570  1717K 3472629  1717K    0
>>>  3470588  1716K 3470598  1716K    0
>>>  3472094  1717K 3472105  1717K    0
>>>  3471221  1716K 3471156  1716K    0
>>>  3472378  1717K 3472409  1717K    0
>>>
>>> But if I run it as a daemon, with "-P --export lexpr,f=/tmp/L" added to
>>> DaemonCommands in collectl.conf (*), it reports only half the
>>> bandwidth:
>>>
>>> [root@c10-13 etc]# grep iconnect /tmp/L
>>> iconnect.kbin 1677721
>>> iconnect.pktin 1722455
>>> iconnect.kbout 1677721
>>> iconnect.pktout 1722455
>>>
>>>
>>> Is this a bug?  Any workarounds?
>>> The test was done with collectl 3.6.9.
>>>
>>>
>>> * I use this to report InfiniBand traffic in Ganglia:
>>>
>>> https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia
>>>
>>>
>>>
>>>
>>> --
>>>
>>>   The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
>>>               phone:+47 77 64 41 07, fax:+47 77 64 41 00
>>>         Roy Dragseth, Team Leader, High Performance Computing
>>>          Direct call: +47 77 64 62 56. email: [email protected]
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Shape the Mobile Experience: Free Subscription
>>> Software experts and developers: Be at the forefront of tech innovation.
>>> Intel(R) Software Adrenaline delivers strategic insight and game-changing
>>> conversations that shape the rapidly evolving mobile landscape. Sign up
>>> now.
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=63431311&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Collectl-interest mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/collectl-interest
>>>
>>
>>
>>
>>
>>
>
