Chris Samuel wrote:
>
> In April I wrote:
>
> > Well we've been gradually replacing the Barcelona chips
> > with Shanghai (same clockspeed) and we are yet to see a
> > power off on a Shanghai node!
>
> Since I wrote that we have seen far fewer with 2.3GHz
> Shanghai (2376, a 75W part), *but* we have some

"Some" as in: some of the upgraded nodes do this, some do not?

> nodes
> upgraded to the ULP 2.4 GHz Shanghai (2379 HE, a 55W
> part) which do exhibit this issue very regularly! :-(

If some of your upgraded nodes do this, and some do not, then this
will most likely map to one of:

1. CPU
2. motherboard (all are identical, including BIOS, right?)
3. RAM
4. power supply

Start swapping parts between good and bad nodes and pray that the
problem correlates perfectly with the location of one component type.
Also keep bugging Supermicro; they should have some idea what is
going on.

Refresh our memory on this: are you seeing an orderly power off (as in
a shutdown), or are the nodes just powering down like "boom"? In the
latter case I would tend to suspect that the power supply has issues
and is triggering an emergency power off to prevent damage from
overheating or overload. Swapping the CPUs could make a difference if
the newer ones use a bit less power than the older ones. (We had a
bunch of PCs which, due to a monster graphics card, were so close to
the power supply limit that adding a single fan made the difference
between being able to run SpecViewPerf to completion or not - using a
lower power CPU would have made the same sort of difference.)

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf