Re: [Beowulf] How to debug slow compute node?

2017-08-13 Thread Christopher Samuel
On 12/08/17 17:35, William Johnson wrote:
> This may be a long shot, especially in a server room where everything
> else is working as expected.

Oh agreed! But given people have covered a lot of other bases I thought I'd throw something in from my own experience. If all nodes boot the same OS im

Re: [Beowulf] How to debug slow compute node?

2017-08-13 Thread Christopher Samuel
On 14/08/17 08:17, Lachlan Musicman wrote:
> Can you point to some good documentation on this?

There is some on Mellanox's website:

http://www.mellanox.com/related-docs/prod_software/Mellanox_EN_for_Linux_User_Manual_v2_0-3_0_0.pdf

But it took weeks for $VENDOR to figure out what was going o

Re: [Beowulf] How to debug slow compute node?

2017-08-13 Thread Lachlan Musicman
On 12 August 2017 at 13:35, Chris Samuel wrote:
> Also remember that the kernel can enable C states that hurt performance even
> if they are disabled in the BIOS/UEFI. This was painfully apparent on our
> first SandyBridge cluster that almost failed the performance part of acceptance
> test
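
For anyone wanting to check which idle states the kernel itself exposes on a node, regardless of BIOS/UEFI settings, here is a minimal sketch that reads the Linux cpuidle sysfs tree. It assumes the node has a cpuidle driver loaded and exposes /sys/devices/system/cpu/cpu0/cpuidle; the function name list_cstates and its parameter are made up for illustration and are not from the thread. It only reports what the kernel exposes for one CPU and does not change any setting.

```python
from pathlib import Path


def list_cstates(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (state_dir, name, disabled) for each idle state the kernel exposes.

    Hypothetical helper: the default path assumes a Linux node with a
    cpuidle driver loaded. Returns an empty list if the directory is absent.
    """
    root = Path(cpuidle_dir)
    if not root.is_dir():
        return []

    states = []
    # Directories are named state0, state1, ...; sort numerically, not lexically.
    for state in sorted(root.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        disable_file = state / "disable"
        # A non-zero value in "disable" means the state is turned off for this CPU.
        disabled = disable_file.exists() and disable_file.read_text().strip() != "0"
        states.append((state.name, name, disabled))
    return states


if __name__ == "__main__":
    for state, name, disabled in list_cstates():
        print(f"{state}: {name} ({'disabled' if disabled else 'enabled'})")
```

Running this on a slow node and on a known-good node and comparing the output is one quick way to spot a difference in idle-state configuration between them.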