Are you logging something that goes to the local disk in the local-install case, but that competes for network bandwidth when the root filesystem is NFS-mounted?
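One rough way to check this is to sample the interface byte counters in /proc/net/dev while LINPACK runs and see whether a node is pushing significant traffic that the local-disk install would have kept off the network. The following is a minimal sketch, assuming a Linux node. The interface name `eth0`, the 5-second interval, and the helper names `parse_net_dev` and `sample_rates` are hypothetical placeholders, not part of any existing tool.

```python
import time

def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} parsed from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Receive columns come first (8 of them); field 0 is RX bytes, field 8 is TX bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters

def sample_rates(iface="eth0", interval=5.0):
    """Print RX/TX throughput for one (placeholder) interface every `interval` seconds."""
    with open("/proc/net/dev") as f:
        prev = parse_net_dev(f.read())[iface]  # KeyError if the interface name is wrong
    while True:
        time.sleep(interval)
        with open("/proc/net/dev") as f:
            cur = parse_net_dev(f.read())[iface]
        rx = (cur[0] - prev[0]) / interval
        tx = (cur[1] - prev[1]) / interval
        print("%s: rx %.2f MB/s, tx %.2f MB/s" % (iface, rx / 1e6, tx / 1e6))
        prev = cur

if __name__ == "__main__":
    sample_rates()
```

Run it on a "bad" node alongside LINPACK; steady traffic that grows as GFLOPS fall would point toward I/O contention rather than the CPUs themselves.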
On Wed, Sep 13, 2017 at 2:15 PM, Scott Atchley <e.scott.atch...@gmail.com> wrote:

> Are you swapping?
>
> On Wed, Sep 13, 2017 at 2:14 PM, Andrew Latham <lath...@gmail.com> wrote:
>
>> Ack, so maybe validate that you can reproduce this with another NFS root,
>> perhaps a lab setup where a single server serves the NFS root to the node.
>> If you could reproduce it that way, it would give some direction. Beyond
>> that, it sounds like an interesting problem.
>>
>> On Wed, Sep 13, 2017 at 12:48 PM, Prentice Bisbal <pbis...@pppl.gov> wrote:
>>
>>> Okay, based on the various responses I've gotten here and on other
>>> lists, I feel I need to clarify things:
>>>
>>> This problem only occurs when I'm running our NFSroot-based version of
>>> the OS (CentOS 6). When I run the same OS installed on a local disk, I do
>>> not have this problem, on the exact same server(s). For testing
>>> purposes, I'm using LINPACK, running the same executable with the same
>>> HPL.dat file in both instances.
>>>
>>> Because I'm testing the same hardware with different OS installs, this
>>> should rule out the BIOS and faulty hardware as the cause. This leads me
>>> to believe it's most likely a software configuration issue, such as a
>>> kernel tuning parameter.
>>>
>>> These are Supermicro servers, and it seems they do not report CPU
>>> temperatures. I do see a chassis temperature, but not the temperatures of
>>> the individual CPUs. While I agree that should be the first thing to look
>>> at, it's not an option for me. Other tools like FLIR cameras and infrared
>>> thermometers aren't really an option for me, either.
>>>
>>> What software configuration, whether a kernel parameter, the
>>> configuration of numad or cpuspeed, or some other setting, could affect
>>> this?
>>>
>>> Prentice
>>>
>>> On 09/08/2017 02:41 PM, Prentice Bisbal wrote:
>>>
>>>> Beowulfers,
>>>>
>>>> I need your assistance debugging a problem:
>>>>
>>>> I have a dozen servers that are all identical hardware: SuperMicro
>>>> servers with AMD Opteron 6320 processors. Ever since we upgraded to CentOS
>>>> 6, the users have been complaining of wildly inconsistent performance
>>>> across these 12 nodes. I ran LINPACK on these nodes and was able to
>>>> duplicate the problem, with performance varying from ~14 GFLOPS to 64
>>>> GFLOPS.
>>>>
>>>> I've identified that performance on the slower nodes starts off fine
>>>> and then slowly degrades throughout the LINPACK run. For example, on a node
>>>> with this problem, during the first LINPACK test, I can see the performance
>>>> drop from 115 GFLOPS down to 11.3 GFLOPS. That constant, downward trend
>>>> continues throughout the remaining tests. At the start of subsequent tests,
>>>> performance will jump up to about 9-10 GFLOPS, but then drop to 5-6 GFLOPS
>>>> by the end of the test.
>>>>
>>>> Because of the nature of this problem, I suspect this might be a
>>>> thermal issue. My guess is that the processor speed is being throttled to
>>>> prevent overheating on the "bad" nodes.
>>>>
>>>> But here's the thing: this wasn't a problem until we upgraded to CentOS
>>>> 6. Where I work, we use a read-only NFSroot filesystem for our cluster
>>>> nodes, so all nodes are mounting and using the exact same read-only image
>>>> of the operating system. This only happens with these SuperMicro nodes, and
>>>> only with CentOS 6 on NFSroot. RHEL 5 on NFSroot worked fine, and when I
>>>> installed CentOS 6 on a local disk, the nodes worked fine.
>>>>
>>>> Any ideas where to look or what to tweak to fix this? Any idea why this
>>>> is only occurring with RHEL 6 with an NFSroot OS?
>>>>
>>>
>>
>> --
>> - Andrew "lathama" Latham lath...@gmail.com http://lathama.com -
>
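Since these nodes don't expose per-CPU temperatures, one indirect check for throttling is to log the per-core clock the kernel reports in /proc/cpuinfo during a LINPACK run and compare it against the GFLOPS trend. The following is a minimal sketch, assuming a cpufreq driver is loaded so that the "cpu MHz" field reflects the current frequency. Without one, that field stays at the boot value, and hardware-level thermal throttling may not be visible there at all, so treat a flat reading as inconclusive rather than as proof of no throttling. The helper names `parse_cpu_mhz`, `summarize`, and `watch` and the 10-second interval are hypothetical.

```python
import time

def parse_cpu_mhz(text):
    """Return the per-core 'cpu MHz' values found in /proc/cpuinfo contents."""
    mhz = []
    for line in text.splitlines():
        if line.startswith("cpu MHz"):
            mhz.append(float(line.split(":", 1)[1]))
    return mhz

def summarize(mhz):
    """Return (min, mean, max) of the per-core clocks."""
    return min(mhz), sum(mhz) / len(mhz), max(mhz)

def watch(interval=10.0):
    """Print a timestamped clock summary every `interval` seconds until interrupted."""
    while True:
        with open("/proc/cpuinfo") as f:
            lo, avg, hi = summarize(parse_cpu_mhz(f.read()))
        print("%s min %.0f avg %.0f max %.0f MHz"
              % (time.strftime("%H:%M:%S"), lo, avg, hi))
        time.sleep(interval)

if __name__ == "__main__":
    watch()
```

Running the same script under both the NFSroot image and the local-disk install gives a like-for-like comparison. If the clocks sag only under NFSroot, the frequency-scaling setup (for example, whatever cpuspeed configures) would be a reasonable next place to look.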
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf