I'm new to the list, so please let me know if I'm asking in the wrong place.
We're running a Scyld Beowulf cluster on CentOS 5.9, and I'm trying to run some
Django admin commands on a compute node. The problem is that it can take three
to five minutes to launch ten MPI processes across the four compute nodes and
the head node. (We're using OpenMPI.)
I narrowed the delay down to smaller and smaller pieces of code until I was left
with two example scripts that just import a Python library and print timing
information. Here's the first one, which imports the collections module:
from datetime import datetime
t0 = datetime.now()
print 'started at {}'.format(t0)
import collections
print 'imported at {}'.format(datetime.now() - t0)
When I run that with mpirun -host n0 python cached_imports_collections.py, the
import initially takes about 10 seconds. Repeated runs get faster, though, and
eventually the import takes less than 0.01 seconds.
Running an equivalent script that imports the decimal module takes about 30
seconds and never speeds up like that; perhaps decimal is too large to be cached
completely.
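That script is identical apart from the import:
from datetime import datetime
t0 = datetime.now()
print 'started at {}'.format(t0)
import decimal
print 'imported at {}'.format(datetime.now() - t0)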
completely. I ran beostatus while the script was running, and I didn't see the
network traffic go over 100 kBps. For comparison, running wc on a 100MB file on
a compute node causes the network traffic to go over 3000 kBps.
I looked in the Scyld admin guide (PDF) and the reference guide (PDF) and found
the bplib command, which manages the list of libraries that are cached and
therefore not transmitted with the job processes. However, its list of library
directories already includes the Python installation.
Is there some way to increase the size of the bplib cache, or am I doing
something inefficient in the way I launch my Python processes?
Thanks,
Don Kirkby
British Columbia Centre for Excellence in HIV/AIDS
cfenet.ubc.ca