Re: [Beowulf] RHEL5 network throughput/scalability

2008-06-13 Thread Perry E. Metzger
A number of these seem rather odd, or unrelated to performance. Walid <[EMAIL PROTECTED]> writes: > It is lame, however I managed to get the following kernel parameters to scale well in terms of both performance per node, and scalability over a high bandwidth, low latency network > net.ipv4.t

Re: [Beowulf] User resource limits

2008-06-13 Thread Lombard, David N
On Fri, Jun 13, 2008 at 04:03:49PM -0400, Prentice Bisbal wrote: > Lombard, David N wrote: > > On Mon, Jun 09, 2008 at 11:41:29AM -0400, Prentice Bisbal wrote: > >> I would like to impose some CPU and memory limits on users that are hard limits that can't be changed/overridden by the users. Wh

Re: [Beowulf] User resource limits

2008-06-13 Thread Prentice Bisbal
Lombard, David N wrote: > On Mon, Jun 09, 2008 at 11:41:29AM -0400, Prentice Bisbal wrote: >> I would like to impose some CPU and memory limits on users that are hard limits that can't be changed/overridden by the users. What is the best way to do this? All I know is environment variables or
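
A common way to impose the kind of hard, user-unchangeable limits asked about here is pam_limits via /etc/security/limits.conf; the sketch below is illustrative only (the group name and values are assumptions, not from the thread), and note the caveat later in this thread that the kernel enforces only the address-space limit for mmap():

    # /etc/security/limits.conf -- hard limits cannot be raised by the user
    # group name and values are examples only
    @clusterusers   hard    cpu     600        # CPU time, minutes
    @clusterusers   hard    as      4194304    # address space, KB (the limit mmap() actually honours)
    @clusterusers   hard    rss     4194304    # resident set, KB (not enforced for mmap(); see the setrlimit discussion later in this thread)

These take effect at login for any service whose PAM stack includes pam_limits.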

Re: [Beowulf] RHEL5 network throughput/scalability

2008-06-13 Thread Walid
Dear All, It is lame, however I managed to get the following kernel parameters to scale well in terms of both performance per node, and scalability over a high bandwidth, low latency network: net.ipv4.tcp_workaround_signed_windows = 1 net.ipv4.tcp_congestion_control = vegas net.ipv4.tcp_tso_win_di
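
For anyone who wants to try the same settings, a minimal sketch of applying them at runtime (only the two parameters that survive the truncation above are shown; the vegas module may need loading first on a stock RHEL5 kernel):

    # load the Vegas congestion-control module if it is not built in
    modprobe tcp_vegas
    # apply the reported settings at runtime
    sysctl -w net.ipv4.tcp_workaround_signed_windows=1
    sysctl -w net.ipv4.tcp_congestion_control=vegas
    # add the same keys to /etc/sysctl.conf and run 'sysctl -p' to persist them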

Re: [Beowulf] Infiniband modular switches

2008-06-13 Thread Patrick Geoffray
Hi Don, Don Holmgren wrote: latency difference here matters to many codes). Perhaps of more significance, though, is that you can use oversubscription to lower the cost of your fabric. Instead of connecting 12 ports of a leaf switch to nodes and using the other 12 ports as uplinks, you might
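
The saving comes from the leaf's oversubscription ratio, i.e. node-facing ports divided by uplink ports; a hedged illustration with a 24-port leaf (the exact split Patrick quotes is truncated above, so these numbers are just examples):

    12 down / 12 up = 1:1 (full bisection)
    16 down /  8 up = 2:1 oversubscribed: a third more nodes per leaf, a third fewer uplinks to pay for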

Re: [Beowulf] Infiniband modular switches

2008-06-13 Thread Don Holmgren
On Fri, 13 Jun 2008, Ramiro Alba Queipo wrote: On Thu, 2008-06-12 at 10:08 -0500, Don Holmgren wrote: Ramiro - You might want to also consider buying just a single 24-port switch for your 22 nodes, and then when you expand either replace with a larger switch, or build a distributed switch fab

Re[4]: [Beowulf] Infiniband modular switches

2008-06-13 Thread Jan Heichler
Hello Ramiro, on Friday, 13 June 2008, you wrote: RAQ> On Fri, 2008-06-13 at 17:55 +0200, Jan Heichler wrote: >> You can use the 24-port switches to create a full bisection bandwidth network if you want that. Since all the big switches are based on the 24-port silicon this is no problem
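
For readers wondering how full bisection falls out of 24-port silicon, the textbook two-level Clos arithmetic runs as follows (a general sketch, not figures from this message):

    each 24-port leaf: 12 ports down to nodes + 12 ports up, one to each spine
    12 spines x 24 ports = 288 leaf-facing ports = 24 leaves
    24 leaves x 12 node ports = 288 nodes at full bisection bandwidth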

Re: [Beowulf] RHEL5 network throughput/scalability

2008-06-13 Thread Walid
2008/6/13 Jason Clinton <[EMAIL PROTECTED]>: > We've seen fairly erratic behavior induced by newer drivers for NVidia NForce-based NICs with forcedeth. If that's your source NIC in the above scenario, that could be the source of the issue, as congestion timing has probably changed. Have yo
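
A quick way to check whether forcedeth is driving the interface, and which version shipped with the running kernel (the interface name eth0 is an assumption):

    ethtool -i eth0                                  # prints driver name and version
    readlink /sys/class/net/eth0/device/driver       # alternative if ethtool is not installed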

Re: Re[2]: [Beowulf] Infiniband modular switches

2008-06-13 Thread Ramiro Alba Queipo
On Fri, 2008-06-13 at 17:55 +0200, Jan Heichler wrote: > Hello Ramiro, > RAQ> The alternatives are: > RAQ> a) Start with a good 24-port switch and grow, losing latency and bandwidth > You can use the 24-port switches to create a full bisection bandwidth network

Re[2]: [Beowulf] Infiniband modular switches

2008-06-13 Thread Jan Heichler
Hello Ramiro, on Friday, 13 June 2008, you wrote: RAQ> By the way: RAQ> a) How many hops does a Flextronics 10U 144-port modular switch take? 3 RAQ> b) And the others? 3 too. RAQ> c) How much latency am I losing in each hop? (In the case of Voltaire switches: ISR 9024 - 24 ports: 140 ns; IS
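
Taking the quoted per-hop figure at face value, a rough back-of-the-envelope (hedged, since the 140 ns is quoted for a complete ISR 9024 rather than a bare switch chip):

    3 hops x ~140 ns/hop = ~420 ns through a 144-port modular chassis
    1 hop  x ~140 ns     = ~140 ns through a single 24-port edge switch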

RE: [Beowulf] Roadrunner picture

2008-06-13 Thread Egan Ford
Perhaps this will help: http://www.lanl.gov/roadrunner/ And: http://www.lanl.gov/orgs/hpc/roadrunner/pdfs/Koch%20-%20Roadrunner%20Overview/RR%20Seminar%20-%20System%20Overview.pdf Pages 20 - 29 IANS, the triblade is really a quadblade, blade 1 is the Opteron Blade, blade 2 is a bridge, blades

Re: [Beowulf] Infiniband modular switches

2008-06-13 Thread Ramiro Alba Queipo
On Thu, 2008-06-12 at 10:08 -0500, Don Holmgren wrote: > Ramiro - > You might want to also consider buying just a single 24-port switch for your 22 nodes, and then when you expand either replace with a larger switch, or build a distributed switch fabric with a number of leaf switches

Re: [Beowulf] Infiniband modular switches

2008-06-13 Thread Ramiro Alba Queipo
On Thu, 2008-06-12 at 10:36 -0400, Joe Landman wrote: > Ramiro Alba Queipo wrote: >> Hello everybody: >> We are about to build an HPC cluster with an InfiniBand network starting from 22 dual-socket nodes with AMD quad-core processors, and in a year or so we will be having about 120 nodes

Re: [Beowulf] User resource limits

2008-06-13 Thread Joe Landman
Prentice Bisbal wrote: vm.overcommit? never heard of that before. I'm going to google that now. vm tuning knobs/dials /proc/sys/vm/overcommit_memory /proc/sys/vm/overcommit_ratio or via sysctl [EMAIL PROTECTED]:~$ sysctl -a | grep -i overcommit ... vm.overcommit_memory = 0 vm.overcommit_ra
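
A minimal sketch of inspecting and tightening these knobs (the mode-2 example is illustrative; mode 2 caps committed memory at swap plus overcommit_ratio percent of RAM):

    # inspect the current settings
    sysctl vm.overcommit_memory vm.overcommit_ratio
    # example only: switch to strict accounting so oversized allocations fail at malloc() time
    sysctl -w vm.overcommit_memory=2
    sysctl -w vm.overcommit_ratio=80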

Re: [Beowulf] User resource limits

2008-06-13 Thread Prentice Bisbal
Mark Hahn wrote: >>> Unfortunately the kernel implementation of mmap() doesn't check the maximum memory size (RLIMIT_RSS) or maximum data size (RLIMIT_DATA) limits which were being set, but only the maximum virtual RAM size (RLIMIT_AS) - this is documented in the setrlimit(2) man page.
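
Because only RLIMIT_AS is checked, a hard cap has to target virtual address space; a minimal per-shell sketch (the 4 GB figure is just an example):

    ulimit -H -v 4194304   # hard-limit virtual memory to ~4 GB (KB units); cannot be raised by an unprivileged user
    ulimit -H -m 4194304   # RLIMIT_RSS: accepted by the shell but, as noted above, not enforced for mmap()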

Re: Re[2]: [Beowulf] MVAPICH2 and osu_latency

2008-06-13 Thread Ashley Pittman
On Fri, 2008-06-13 at 05:11 +0200, Jan Heichler wrote: >> So you're concerned with the gap between the 2.63 us that OSU measured and your 3.07 us you measured. I wouldn't be too concerned. > 1st: I get a value of 2.96 with MVAPICH 1.0.0 - this is exactly the value that I fin
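
For anyone reproducing these numbers, the OSU latency test is normally run as two ranks on two different nodes, roughly as below (the hostnames are placeholders, and the launcher syntax differs between MVAPICH releases):

    mpirun_rsh -np 2 node01 node02 ./osu_latency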