Just to add a few more details to Chris' post with some references which helped us...
We were seeing severe performance issues on our diskless systems with an application doing mmap reads of large files on GPFS. The I/O pattern was sequential reads of a large file, 5-10 times the size of RAM on the nodes. We tracked this down to 'pgscand/s' in the 'sar -B' output going outrageous (13M pages scanned per second trying to find pages to free).

Some googling led us to:
<http://engineering.linkedin.com/performance/optimizing-linux-memory-management-low-latency-high-throughput-databases>
Although it describes a fairly different problem, this was just the information we needed. We found that /proc/sys/vm/zone_reclaim_mode was being set to 1 on our systems, despite various documentation indicating that the default value should be 0.

As Chris noted, the Linux kernel has recently accepted a patch claiming to set zone_reclaim_mode to 0 (although the diff does not appear to do it very directly). It looks like setting zone_reclaim_mode to 0 was proposed at least as early as 2009. I'm unclear what happened with this patch:
<http://osdir.com/ml/linux-kernel/2009-05/msg05670.html>

There is something from 2010 called "zone_reclaim_mode is the essence of all evil":
<http://www.poempelfox.de/blog/2010/03/19/>
This was very useful in pointing out the Nehalem processor as being particularly susceptible, and in suggesting 'numactl --hardware' to check the node distance, with a distance greater than 20 being the magic number.

Stuart
--
I've never been lost; I was once bewildered for three days, but never lost!
    -- Daniel Boone
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
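For anyone who wants to run the same two checks on their own nodes, here is a rough sketch in Python. It is not what we actually ran, just an illustration: it reads /proc/sys/vm/zone_reclaim_mode and pulls the largest value out of the "node distances" table printed by 'numactl --hardware'. The function names and the parsing approach are my own assumptions, and it assumes numactl prints its usual "node distances:" table with one "N: d d ..." row per node.

```python
import subprocess
from pathlib import Path


def read_zone_reclaim_mode(path="/proc/sys/vm/zone_reclaim_mode"):
    """Return the current zone_reclaim_mode as an int, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def max_node_distance(numactl_output):
    """Return the largest entry in the 'node distances' table, or None.

    Assumes the table layout printed by 'numactl --hardware': a
    'node distances:' line, a header row, then rows like '  0:  10  21'.
    """
    distances = []
    in_table = False
    for line in numactl_output.splitlines():
        if line.startswith("node distances:"):
            in_table = True
            continue
        if not in_table:
            continue
        label, sep, values = line.partition(":")
        # Data rows look like "<node number>: <d> <d> ..."; the header row
        # has no colon, so it is skipped here.
        if sep and label.strip().isdigit():
            distances.extend(int(v) for v in values.split())
    return max(distances) if distances else None


if __name__ == "__main__":
    print("zone_reclaim_mode:", read_zone_reclaim_mode())
    try:
        out = subprocess.run(
            ["numactl", "--hardware"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        out = ""  # numactl missing or failed; distance will be None
    distance = max_node_distance(out)
    print("max node distance:", distance)
    # 20 is the "magic number" mentioned in the blog post above.
    if distance is not None and distance > 20:
        print("node distance > 20: worth checking zone_reclaim_mode")
```

If the mode comes back as 1 and you want to try 0, 'sysctl -w vm.zone_reclaim_mode=0' as root changes it on the running system; whether that helps will depend on your workload.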