On 2/6/2014 9:30 AM, Aaron Knister wrote:
Bill Wichser <bill <at> princeton.edu> writes:
We have tested using C1 instead of C0 but saw no difference. We don't use
logical processors at all. When the problem happens, it doesn't matter
what you set the cores to for C1/C0; they never get up to speed again
without a power cycle/reseat. We believe this to be something related
to power, maybe current limiting.
As I stated yesterday, after a complete chassis power cycle on Tuesday,
Sept 10, all 37 chassis have been outperforming their 2.6GHz
ratings flawlessly. I don't know if this is going to be the solution we
have been searching for, but it has certainly been a week and a half
of some very happy researchers!
Thanks,
Bill
On 09/19/2013 11:32 AM, Christopher Samuel wrote:
On 18/09/13 10:49, Douglas O'Flaherty wrote:
"Run in C1. C0 over commits unpredictably, then throttles."
I've seen a recommendation in a public Mellanox document to use C1
rather than C0 when using hyperthreading/SMT; it could be related to this.
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel <at> unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
_______________________________________________
Beowulf mailing list, Beowulf <at> beowulf.org sponsored by Penguin
Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
Hi Bill,
I'm wondering if this issue has resurfaced for you after the firmware
updates and chassis power cycles?
I'm having what sounds like the same issue but with R320s. So far
BIOS/iDRAC/Lifecycle Controller updates haven't helped, but I haven't tried
physically removing power from the node. I have been using the "ipmitool
power cycle" command to reboot the nodes and get them out of their funk
(running at 0.2GHz), but that, of course, still leaves part of the chassis
energized.
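
For anyone wanting to spot nodes stuck in this state, here is a minimal
sketch in Python. It assumes an x86 Linux node whose /proc/cpuinfo reports
"cpu MHz" lines. The function names, the 2600 MHz nominal clock and the
25% threshold are my own illustrative assumptions, not anything from Dell
or from the diagnosis in this thread.

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return the per-core 'cpu MHz' values found in /proc/cpuinfo-style text."""
    pattern = r"^cpu MHz\s*:\s*([0-9.]+)"
    return [float(v) for v in re.findall(pattern, cpuinfo_text, flags=re.MULTILINE)]


def throttled_cores(mhz_values, nominal_mhz=2600.0, fraction=0.25):
    """Return indices of cores running below fraction * nominal_mhz.

    nominal_mhz and fraction are assumed values; adjust them for your CPUs.
    """
    limit = nominal_mhz * fraction
    return [i for i, mhz in enumerate(mhz_values) if mhz < limit]


def check_node(path="/proc/cpuinfo"):
    """Read the given cpuinfo file and return the indices of suspiciously slow cores."""
    with open(path) as f:
        return throttled_cores(parse_cpu_mhz(f.read()))
```

Calling check_node() on a node would return the indices of cores below
the threshold. The low threshold is deliberate, so that cores which have
merely been clocked down by normal frequency scaling while idle are less
likely to be flagged; running it under load is still the safer test.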
Thanks!
-Aaron
_______________________________________________
Aaron,
The problem has not resurfaced after the updates and power cycling of
the chassis themselves. Power cycling just the nodes never helped, as it
was the firmware in the chassis itself that needed the power cycle.
Bill