I know this is an old topic. I'm catching up on months' worth of mailing
list mail right now.
On 09/17/2017 09:09 PM, Christopher Samuel wrote:
> On 15/09/17 04:45, Prentice Bisbal wrote:
>> I'm happy to announce that I finally found the cause of this problem: numad.
> Very interesting, it sounds like it was migrating processes onto a
> single core over time! Anything diagnostic in its log?
That's exactly what it was doing. No, I did not see any diagnostics in
the log files, but some of the documentation I read on numad at the
time stated that it is not a good idea to have numad enabled for large
multi-core jobs that use a lot of memory, such as DB servers and HPC jobs.
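
For anyone who wants to check for the same symptom on their own nodes,
here is a minimal sketch, not what was actually used to diagnose this.
It assumes Linux, where field 39 of /proc/<pid>/task/<tid>/stat is the
CPU a thread last ran on. The function names are made up for
illustration.

```python
from collections import Counter
from pathlib import Path


def last_cpu_from_stat(stat_line: str) -> int:
    """Return the CPU a task last ran on, given one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so only split what follows the last ')'.
    rest = stat_line[stat_line.rfind(")") + 1:].split()
    # rest[0] is field 3 (state), so field 39 (processor) is rest[36].
    return int(rest[36])


def cpu_histogram(pid: int) -> Counter:
    """Count how many threads of `pid` last ran on each CPU (Linux only)."""
    counts = Counter()
    for stat_file in Path(f"/proc/{pid}/task").glob("*/stat"):
        try:
            counts[last_cpu_from_stat(stat_file.read_text())] += 1
        except (OSError, IndexError, ValueError):
            # The thread exited mid-read, or the line was not as expected.
            continue
    return counts
```

Calling cpu_histogram(pid) a few times on a multithreaded job should
normally show its threads spread over many CPUs. If nearly all of them
keep landing on one CPU, that would be consistent with the single-core
pile-up described above.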
--
Prentice
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing