Thanks for the suggestion, Skylar, but I didn't see any change. I read through the nfs(5) man page and found the acregmin, acregmax, acdirmin, and acdirmax options. That's the attribute caching you're talking about, right? I edited the options in /etc/beowulf/fstab and set actimeo=90 on /usr/local/lib, since actimeo sets all four of those options to the same value. I restarted the beowulf service, but I didn't see any change in speed. Here's a snippet from /proc/mounts showing that the settings did take effect:
$ bpsh 5 cat /proc/mounts
[...]
192.168.1.1:/usr/local/include /usr/local/include nfs rw,vers=3,rsize=32768,wsize=32768,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,addr=192.168.1.1 0 0
192.168.1.1:/usr/local/lib /usr/local/lib nfs rw,vers=3,rsize=32768,wsize=32768,acregmin=90,acregmax=90,acdirmin=90,acdirmax=90,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,addr=192.168.1.1 0 0

/usr/local/include uses the original settings, and /usr/local/lib uses actimeo=90. I couldn't find any mention of the nocto option, so perhaps it's not supported in CentOS 5.6.

Don

----- Original Message -----
From: "Skylar Thompson" <skylar.thomp...@gmail.com>
To: "Don Kirkby" <dkir...@cfenet.ubc.ca>, beowulf@beowulf.org
Sent: Saturday, January 17, 2015 10:59:05 AM
Subject: Re: [Beowulf] Python libraries slow to load across Scyld cluster

On 01/16/2015 04:38 PM, Don Kirkby wrote:
> Thanks for the suggestions, everyone. I've used them to find more
> information, but I haven't found a solution yet.
>
> It looks like the time is spent opening the Python libraries, but my attempts
> to change the Beowulf configuration files have not made it run any faster.
>
> Skylar asked:
>
>> Do any of your search paths (PATH, PYTHONPATH, LD_LIBRARY_PATH, etc.)
>> include a remote filesystem (i.e. NFS)? This sounds a lot like you're
>> blocked on metadata lookups on NFS. Using "strace -c" will give you a
>> histogram of system calls by count and latency, which can be helpful in
>> tracking down the problem.
>
> Yes, the compute nodes mount from a network file system to a local RAM disk.
> When I look at mounted file systems, I can see that the Python libraries are
> on a network mount. The Python libraries are at /usr/local/lib/python2.7.
>
> $ bpsh 5 df
> Filesystem           1K-blocks      Used Available Use% Mounted on
> [...others deleted...]
> 192.168.1.1:/usr/local/lib
>                      926067424 797367296  80899808  91% /usr/local/lib
>
>
> I used strace as suggested and found that most of the time is spent in
> open().
>
> $ bpsh 5 strace -c python2.7 cached_imports_decimal.py
> started at 2015-01-16 14:29:45.543066
> imported decimal at 0:00:21.719083
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  97.95    0.040600          44       932       822 open
> [...others deleted...]
>
>
> I also looked at the timing of the individual system calls to see which files
> were slow to open:
>
> bpsh 5 strace -r -o strace.txt python2.7 cached_imports_decimal.py
> more strace.txt
> [...]
>      0.000063 open("/usr/local/lib/python2.7/lib-dynload/usercustomize.so", O_RDONLY) = -1 ENOENT (No such file or directory)
>      0.000701 open("/usr/local/lib/python2.7/lib-dynload/usercustomizemodule.so", O_RDONLY) = -1 ENOENT (No such file or directory)
>      0.127012 open("/usr/local/lib/python2.7/lib-dynload/usercustomize.py", O_RDONLY) = -1 ENOENT (No such file or directory)
>      0.126985 open("/usr/local/lib/python2.7/lib-dynload/usercustomize.pyc", O_RDONLY) = -1 ENOENT (No such file or directory)
>      0.127037 stat("/usr/local/lib/python2.7/site-packages/usercustomize", 0x7fff28a973f0) = -1 ENOENT (No such file or directory)
>      0.000086 open("/usr/local/lib/python2.7/site-packages/usercustomize.so", O_RDONLY) = -1 ENOENT (No such file or directory)
>      0.126963 open("/usr/local/lib/python2.7/site-packages/usercustomizemodule.so", O_RDONLY) = -1 ENOENT (No such file or directory)
> [...]

Do you have attribute caching (ac) set up for the NFS mount? Assuming this is a mostly read-only NFS mount point, you might also consider disabling close-to-open cache coherence (nocto), which will significantly increase NFS performance at the expense of breaking POSIX compliance. There's a good discussion of the implications in the "DATA AND METADATA COHERENCE" section of the nfs(5) man page.
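Rather than reading /proc/mounts by eye, one way to confirm which attribute-cache options a node actually mounted with, and whether nocto took effect, is to parse it. Below is a minimal sketch, written to run under Python 2.7 or 3. The script name check_mounts.py and the function names are hypothetical, and it assumes the usual /proc/mounts layout of whitespace-separated fields with comma-separated options, as in the snippet Don posted.

```python
from __future__ import print_function

import sys

# Attribute-cache options documented in nfs(5). In the /proc/mounts snippet
# above, actimeo=90 shows up as these four options each set to 90; options
# left at their defaults are simply not listed.
AC_OPTIONS = ("acregmin", "acregmax", "acdirmin", "acdirmax")


def mount_options(mounts_text, mount_point):
    """Return {option: value} for mount_point from /proc/mounts-style text.

    Flag options such as "nolock" map to True and key=value options map to
    the value string. If the mount point is listed more than once, the last
    entry wins. Returns None if the mount point is not listed at all.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            options = {}
            for item in fields[3].split(","):
                key, sep, value = item.partition("=")
                options[key] = value if sep else True
    return options


def summarize(mount_point, options):
    """One-line report of attribute-cache settings and nocto for a mount."""
    if options is None:
        return "%s: not mounted" % mount_point
    ac = ", ".join("%s=%s" % (k, options[k]) for k in AC_OPTIONS if k in options)
    return "%s: attribute cache [%s], nocto %s" % (
        mount_point,
        ac or "defaults",
        "set" if "nocto" in options else "not set",
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python2.7 check_mounts.py /usr/local/lib ...
    with open("/proc/mounts") as f:
        text = f.read()
    for mp in sys.argv[1:]:
        print(summarize(mp, mount_options(text, mp)))
```

Run on a compute node (for instance through bpsh, as in the commands above) with /usr/local/lib and /usr/local/include as arguments, it should list the four ac values for /usr/local/lib and report defaults for /usr/local/include, given the mount options shown in the snippet.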
Skylar

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
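The strace -r listing quoted above is easier to interpret once each delta is attributed to the call it measures. strace -r prints the time between the starts of successive system calls, so a 0.127 s value on one line mostly reflects how long the call on the line before it took. Below is a minimal Python sketch of that bookkeeping. The function names and the script name are hypothetical, and it assumes the single-process output format shown above (no -f PID prefixes).

```python
from __future__ import print_function

import re
import sys
from collections import defaultdict

# Matches single-process "strace -r" lines such as
#      0.000701 open("/usr/local/lib/python2.7/x.so", O_RDONLY) = -1 ENOENT (...)
# Group 1: relative timestamp, group 2: syscall name,
# group 3: first argument if it is a quoted string (usually a path).
LINE_RE = re.compile(r'^\s*(\d+\.\d+)\s+(\w+)\((?:"([^"]*)")?')


def parse(lines):
    """Return [(delta, syscall, path)] for every line that looks like a call."""
    calls = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # signal and exit lines do not match and are skipped
            calls.append((float(m.group(1)), m.group(2), m.group(3) or ""))
    return calls


def durations(calls):
    """Attribute each delta to the call on the line before it.

    strace -r prints the time between the starts of successive system calls,
    so the next line's delta approximates this call's duration plus any
    user-space work in between. The last call gets no estimate.
    """
    return [(nxt[0], name, path)
            for (_, name, path), nxt in zip(calls, calls[1:])]


def report(lines, top=5):
    """Return the slowest calls and the total estimated time per directory."""
    timed = durations(parse(lines))
    per_dir = defaultdict(float)
    for delta, name, path in timed:
        if path.startswith("/"):
            per_dir[path.rsplit("/", 1)[0]] += delta
    return sorted(timed, reverse=True)[:top], dict(per_dir)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python2.7 strace_slow.py strace.txt
    with open(sys.argv[1]) as f:
        slowest, per_dir = report(f)
    print("Slowest calls:")
    for delta, name, path in slowest:
        print("%9.6f  %s  %s" % (delta, name, path))
    print("Estimated time per directory:")
    for directory, total in sorted(per_dir.items(), key=lambda kv: kv[1], reverse=True):
        print("%9.6f  %s" % (total, directory))
```

Run against the strace.txt file produced above, the per-directory totals give a rough view of how much of the import time goes to failed lookups in directories such as lib-dynload and site-packages; the attribution is approximate because user-space time between calls is folded in.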