We may have to swap whole jobs out of memory on a complete node.

that was our intent as well.  among other things, this scheme enables
running the cluster "split-personality" - mostly shorter/smaller, even
interactive, jobs during the day, with big/long jobs running at night.
unfortunately, you need a smart scheduler to do this, and ours is dumb.

believe, it is 2 or more GB per core; we have 16 GB per dual-socket quad-core Opteron node). What is a typical swap size today?

are you willing to use a node which is actually occupying 16 GB of swap?

it is possible to tune how the kernel responds to memory crunches - for
instance, you can avoid OOM kills with the vm.overcommit_memory=2
sysctl (you'll need to tune vm.overcommit_ratio and the amount of swap
to get the desired limits.)  in this mode, the kernel tracks how much VM
it has committed to (worst-case, reflected in Committed_AS in /proc/meminfo)
and refuses allocations that would push that past a commit limit derived
from RAM and swap.
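
to make the accounting concrete, here is a rough sketch of the commit-limit arithmetic: swap plus overcommit_ratio percent of RAM (less any hugetlb pages).  the function names and the SAMPLE numbers are made up for illustration; the kernel does this in pages, and newer kernels have extra knobs (vm.overcommit_kbytes, reserves) that this ignores.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio, hugetlb_kb=0):
    """Approximate commit limit: swap + ratio% of (RAM - hugetlb)."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


def commit_headroom_kb(meminfo, overcommit_ratio):
    """How much more VM could be committed before allocations get refused."""
    limit = commit_limit_kb(meminfo["MemTotal"], meminfo["SwapTotal"],
                            overcommit_ratio)
    return limit - meminfo["Committed_AS"]


# Invented sample: 16 GB RAM, 8 GB swap, 4 GB already committed.
SAMPLE = """\
MemTotal:       16777216 kB
SwapTotal:       8388608 kB
Committed_AS:    4194304 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print("commit limit (kB):", commit_limit_kb(info["MemTotal"],
                                                info["SwapTotal"], 50))
    print("headroom (kB):   ", commit_headroom_kb(info, 50))
```

on a real node you would feed it open("/proc/meminfo").read() and the value of vm.overcommit_ratio instead of the sample; the kernel also reports its own figure as CommitLimit in /proc/meminfo, which is the number to trust.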

if you don't use overcommit_memory=2, you are basically borrowing VM
space in hopes of not needing it.  that can still be reasonable, considering
how often processes have a lot of shared VM, and how many processes
allocate but never touch lots of pages.  but you have to ask yourself:
would I like a system that was actually _using_ 16 GB of swap?  if you
have 16 disks, perhaps, but 16 GB will suck if you only have 1 disk.
at least for overcommit_memory != 2, I don't see the point of configuring
a lot of swap, since the only time you'd use it is if you were thrashing.
sort of a "quality of life" argument.

But what are the recommendations of modern practice?

it depends a lot on the size variance of your jobs, as well as their
real/virtual ratio.  the kernel only enforces RLIMIT_AS (vsz in ps),
assuming a 2.6 kernel - I forget whether 2.4 did RLIMIT_RSS or not.
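
as an illustration of what an RLIMIT_AS cap looks like from the job's side, here is a minimal sketch using Python's resource module on Linux.  run_with_vm_cap and the CHILD snippet are hypothetical helpers, not part of any scheduler; a real batch system would set the limit in its own job launcher.

```python
import resource
import subprocess
import sys


def run_with_vm_cap(argv, max_vm_bytes):
    """Run argv in a child whose address space (RLIMIT_AS) is capped."""
    def cap():
        # Runs in the child after fork and before exec, so only the
        # job itself is limited, not the launcher.
        resource.setrlimit(resource.RLIMIT_AS, (max_vm_bytes, max_vm_bytes))
    return subprocess.run(argv, preexec_fn=cap,
                          capture_output=True, text=True)


# Child program: try to allocate 2 GiB and report what happened.
CHILD = (
    "try:\n"
    "    buf = bytearray(2 << 30)\n"
    "    print('allocated')\n"
    "except MemoryError:\n"
    "    print('refused')\n"
)

if __name__ == "__main__":
    # Cap the child at 1 GiB of virtual address space.
    result = run_with_vm_cap([sys.executable, "-c", CHILD], 1 << 30)
    print(result.stdout.strip())
```

note that the cap is on virtual size, not resident size, so a job that maps a lot but touches little will hit it long before it pressures RAM.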

if you use overcommit_memory=2, your desired max VM size determines the
amount of swap.  otherwise, go with something modest - memory size
or so.  but given that the smallest reasonable single disk these days
is probably about 320GB, it's hard to justify being _too_ tight.
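
for the overcommit_memory=2 case, the sizing can be sketched by turning the commit-limit arithmetic around: swap is whatever the desired max VM needs beyond overcommit_ratio percent of RAM.  swap_needed_bytes is a hypothetical helper, the 32 GB target is only an example figure, and hugetlb pages and kernel reserves are ignored.

```python
GIB = 1024 ** 3


def swap_needed_bytes(desired_max_vm, ram, overcommit_ratio=50):
    """Swap required so that swap + ratio% of RAM >= desired max VM.

    overcommit_ratio defaults to 50, the kernel's default for
    vm.overcommit_ratio.
    """
    from_ram = ram * overcommit_ratio // 100
    return max(0, desired_max_vm - from_ram)


if __name__ == "__main__":
    ram = 16 * GIB  # the 16 GB node from the question
    for ratio in (50, 100):
        need = swap_needed_bytes(32 * GIB, ram, ratio)
        print(f"ratio={ratio}: {need // GIB} GiB of swap for 32 GiB of VM")
```

raising overcommit_ratio trades swap for RAM in the limit, so the same VM target can be met with a smaller swap partition.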
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
