On 16/08/14 08:46, Jörg Saßmannshausen wrote:
My problem: I got some old PCI-X LSI SCSI cards which are connected to some
Infortrend storage boxes. We recently had a power-dip (lights went off and came
back within 2 sec) and now the 10 year old frontend is playing up. So I need a
new frontend, and it seems very difficult to get a PCIe to PCI-X riser card so I
can get a newer motherboard with more cores and more memory.

Good luck with that! Those technologies are pretty incompatible. There are one or two PCIe (x1) to PCI (maybe compatible with PCI-X; check voltages etc.) converters, but I wouldn't trust them with my storage.

The last server we bought that was still compatible with PCI-X was a Dell PowerEdge R200; you needed to specify the PCI-X riser when buying. Maybe eBay is your best bet at this point?

GREG


Hence the thread was good for me to read as I hopefully can configure the
frontend a bit better.

If somebody got any comments on my problem feel free to reply.

David: By the looks of it you will be compressing larger files on a regular
basis. Have you considered using the parallel version of gzip? By default it
uses all available cores, but you can change that on the command line. That
way you might avoid the problem with disk I/O and simply use the available
cores. You could also run it under 'nice' to make sure the machine does not
become unresponsive due to high CPU load. Just an idea to speed up your
decompressions.
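The parallel gzip Jörg refers to is presumably pigz. A minimal sketch of the suggestion (assuming pigz is installed, with plain gzip as a fallback; the 8-thread cap and file names are illustrative, not from the thread):

```shell
# A sketch of parallel compression with pigz, run under nice so the
# machine stays responsive. Falls back to plain gzip if pigz is not
# installed. The thread count (-p 8) and file names are illustrative.
printf 'example payload\n' > sample.txt

if command -v pigz >/dev/null 2>&1; then
    # -p 8 caps pigz at 8 threads instead of every available core
    nice -n 10 pigz -p 8 -c sample.txt > sample.txt.gz
else
    nice -n 10 gzip -c sample.txt > sample.txt.gz
fi

# round-trip check: decompress and compare against the original
gunzip -c sample.txt.gz > roundtrip.txt
cmp sample.txt roundtrip.txt && echo OK
```

Note that pigz parallelises compression well, but decompression is largely single-threaded (extra threads only handle read, write, and checksum), so the biggest wins are on the compression side.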

All the best from a sunny London

Jörg


On Friday 15 August 2014, Dimitris Zilaskos wrote:
Hi,

I hope your issue has been resolved in the meantime. I had a somewhat
similar mixed experience with Dell-branded LSI controllers. It would appear
that some models are just not fit for particular workloads. I have put
some information in our blog at
http://www.gridpp.rl.ac.uk/blog/2013/06/14/lsi-1068e-issues-understood-and-resolved/

Cheers,

Dimitris

On Thu, Jul 31, 2014 at 7:37 PM, mathog <[email protected]> wrote:
Any pointers on why a system might appear to "stall" on very high IO
through an LSI megaraid adapter?  (dm_raid45, on RHEL 5.10.)

I have been working on another group's big Dell server, which has 16
CPUs, 82 GB of memory, and five 1 TB disks which go through an LSI MegaRAID
(not sure of the exact configuration, and their system admin is out sick)
and show up as /dev/sd[abc], where the first two are just under 2 TB
and the third is /boot and is about 133 GB.  sda and sdb are then
combined through LVM into one big volume and that is what is mounted.

Yesterday on this system when I ran 14 copies of this simultaneously:
   # X is 0-13
   gunzip -c bigfile${X}.gz > resultfile${X}

the first time, part way through, all of my terminals locked up for
several minutes, and then recovered.  Another similar command had the
same issue about half an hour later, but others between and since did
not stall.  The size of the files unpacked is only about 0.5 GB, so even
if the entire file was stored in memory in the pipes, all 14 should have
fit in main memory. Nothing else was running (at least nothing that I noticed
before or after; something might have started up during the run and
ended before I could look for it). During this period the system would
still answer pings.  Nothing showed up in /var/log/messages or dmesg,
"last" showed nobody else had logged in, and overnight runs of "smartctl
-t long" on the 5 disks were clean - nothing pending, no reallocation
events.

Today I ran the first set of commands again under "nice -n 10" with "top"
running, and nothing untoward was observed; there were no stalls. On
that run iostat showed:

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda            6034.00         0.00    529504.00          0     529504
sda5           6034.00         0.00    529504.00          0     529504
dm-0          68260.00      2056.00    546008.00       2056     546008


So why the apparent stalls yesterday?  It felt as if my interactive
processes had either been swapped out or been given so much lower a priority
than other processes that they were not getting any CPU time. Is there some
sort of housekeeping that the MegaRAID, LVM, or anything normally
installed with RHEL 5.10, might need to do from time to time that
would account for these stalls?
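One way to investigate stalls like this is to watch dirty-page writeback while the jobs run. This is a diagnostic sketch, not a confirmed cause: the /proc paths are standard Linux, and on a box with 82 GB of RAM the default vm.dirty_ratio can allow tens of GB of dirty pages to accumulate before writeback forces writers to block.

```shell
# Diagnostic sketch: look at dirty-page writeback state while the
# jobs run. A large "Dirty:" figure in /proc/meminfo, followed by
# long stretches of high iowait, would point at writeback stalls
# rather than CPU starvation.
grep -E '^(Dirty|Writeback):' /proc/meminfo

# writeback thresholds, as a percentage of RAM; with 82 GB the
# default dirty_ratio allows a very large backlog before writers block
cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio

# sample iowait ("wa" column) every 2 seconds, if vmstat is available
command -v vmstat >/dev/null 2>&1 && vmstat 2 3 || true
```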

Thanks,

David Mathog
[email protected]
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
Greg Matthews        01235 778658
Scientific Computing Group Leader
Diamond Light Source Ltd. OXON UK

