Hi folks,

We've got an IBM Power5 cluster running SLES9 and using the GM drivers.
We occasionally get users who manage to use up all the DMA memory that is addressable by the Myrinet card through the Power5 hypervisor.

Through various firmware and driver tweaks (thanks to both IBM and Myrinet) we've gotten that limit up to almost 1GB, and we then use an undocumented environment variable (GMPI_MAX_LOCKED_MBYTE) to tell each process to use only 248MB of it (as we've got 4 cores in each box), which we enforce through Torque.

The problems went away. Or at least they did until just now. :-(

The characteristic error we get is:

    [13]: alloc_failed, not enough memory (Fatal Error)
    Context: <(gmpi_init) gmpi_dma_alloc: dma_recv buffers>

Now Myrinet can handle running out of DMA memory once a process is running, but when a process starts it must be able to allocate a (fairly trivial) amount of DMA memory, otherwise you get that fatal error.

Looking at the node I can confirm that there are only 3 user processes running, so what I am after is a way of determining how much of that DMA memory a process has allocated.

I looked at /proc/${PID}/maps and saw this:

    40028000-40029000 r--s 00002000 00:0c 8483 /dev/gm0

which looks like a memory mapping to me, but to my eyes it covers just 0x1000 bytes (4KB).

Does anyone have any ideas at all?

Oh - switching to the Myrinet MX drivers (which don't have this problem) is not an option. We have an awful lot of users, mostly (non-computer) scientists, who have their own codes, and trying to persuade them to recompile would be very hard. Recompiling would be necessary because we've not been able to convince MPICH-GM to build shared libraries on Linux on Power with the IBM compilers. :-(

cheers,
Chris
-- 
Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
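
On the /proc/${PID}/maps question above, here is a minimal sketch of how the size of each /dev/gm0 mapping could be added up for a process. The function names (mapping_sizes, gm_mapped_bytes) are made up for illustration, and the big assumption is that whatever shows up mapped from /dev/gm0 is relevant at all: this only measures address space mapped from the device node, which may well not be the same thing as the DMA memory the driver has locked on the process's behalf.

```python
def mapping_sizes(maps_text, path="/dev/gm0"):
    """Return the size in bytes of each mapping of `path` found in
    the text of a /proc/<pid>/maps file."""
    sizes = []
    for line in maps_text.splitlines():
        # Fields: address-range perms offset dev inode [pathname]
        fields = line.split()
        if len(fields) < 6 or fields[5] != path:
            continue
        start, end = (int(addr, 16) for addr in fields[0].split("-"))
        sizes.append(end - start)
    return sizes


def gm_mapped_bytes(pid, path="/dev/gm0"):
    """Total bytes mapped from `path` by process `pid` (hypothetical helper)."""
    with open("/proc/%d/maps" % pid) as maps_file:
        return sum(mapping_sizes(maps_file.read(), path))


# The line quoted in the post: 0x40029000 - 0x40028000 = 0x1000 bytes.
sample = "40028000-40029000 r--s 00002000 00:0c 8483 /dev/gm0"
print(mapping_sizes(sample))  # [4096]
```

Running gm_mapped_bytes() over the PIDs of the 3 user processes on the node would at least show whether the /dev/gm0 mappings differ between them, even if it does not give the locked DMA total directly.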
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf