Normally I would suggest doing a diagnostic read with dd from each disk, but you may not be able to do that, since your RAID controller hides the individual disks.
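For reference, the kind of diagnostic read I mean is roughly what `dd if=/dev/sdX of=/dev/null bs=1M` does: read the whole device sequentially, throw the data away, and see whether it errors out or crawls. Here is a minimal Python sketch of the same idea. The function name `diagnostic_read` and the device name `/dev/sdX` are made up for illustration, and it assumes you can see a per-disk block device at all, which behind a RAID controller you may not.

```python
import os
import time


def diagnostic_read(path, chunk_size=1024 * 1024):
    """Read `path` sequentially to end of file, discarding the data.

    Returns (bytes_read, elapsed_seconds, error), where error is None
    if the whole read completed, or the OSError that stopped it.
    """
    total = 0
    error = None
    start = time.monotonic()
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            try:
                chunk = os.read(fd, chunk_size)
            except OSError as exc:
                # An I/O error here is the interesting result: stop and report it.
                error = exc
                break
            if not chunk:
                # Empty read means end of device/file.
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total, time.monotonic() - start, error
```

You would call it as something like `diagnostic_read("/dev/sdX")` as root, once per disk, and compare the throughput (bytes divided by seconds) across disks. A disk that is much slower than its siblings, or that returns an error partway through, is the suspect. Note that repeated runs may be served partly from the page cache, so the first run is the meaningful one.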
My next recommendation would be a full AC power cycle: can you power the host off for a few minutes? It's a bit cargo cult-y, but sometimes it works. It may also help (or not) for you to spin around 3 times while the machine is off.

On Mon, Mar 19, 2018 at 2:03 PM, David Mathog <mat...@caltech.edu> wrote:

> On 19-Mar-2018 13:58, David Mathog wrote:
>
>> The only oddness of late on "B" is that a few days ago it loaded too
>> many memory hungry processes so the OS killed some. I have had that
>> happen before on other systems without them doing anything odd
>> afterwards.
>>
>
> Sorry, hit return too soon.
>
> The /var/log/messages entries associated with that showed OOM only killed
> some user processes, no system processes were removed.
>
> Regards,
>
>
> David Mathog
> mat...@caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf