On Mon, Mar 5, 2012 at 10:17 AM, Asher Langton <lang...@gmail.com> wrote:
> This is a followup to my post from January
> (http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html)
> and the panel discussion at PyData this weekend. As a few people have
> suggested, a better approach than the MPI-broadcasted lookups is to
> cache the locations of all the modules found in sys.path.
> [...]
> I'll put an initial implementation of this importer on github sometime
> this week, and I'll follow up this post with some performance numbers
> when I have them.

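
For readers who want to see the caching idea from the quoted message in concrete form, here is a minimal sketch. It is not the code in the MPI_Import repository. The names CachedPathFinder and install are hypothetical, and it uses the find_spec protocol from current importlib rather than the original PEP 302 find_module/load_module hooks, which newer Python versions no longer support. It assumes that listing each directory once and answering later lookups from a dict is the essence of the approach, and it handles only top-level single-file modules (source, bytecode, or extension), leaving packages and submodules to the standard finders.

```python
import importlib.machinery
import importlib.util
import os
import sys
from importlib.abc import MetaPathFinder


class CachedPathFinder(MetaPathFinder):
    """Hypothetical finder: list each directory once, then answer from a dict."""

    def __init__(self, paths):
        self.cache = {}  # module name -> file path
        # Try the longest suffix first, so that a file such as
        # "foo.cpython-311-x86_64-linux-gnu.so" maps to the name "foo".
        suffixes = sorted(importlib.machinery.all_suffixes(), key=len, reverse=True)
        for directory in paths:
            try:
                entries = os.listdir(directory or ".")
            except OSError:
                continue  # skip missing, unreadable, or non-directory entries
            for entry in entries:
                for suffix in suffixes:
                    if entry.endswith(suffix):
                        name = entry[: -len(suffix)]
                        if name.isidentifier():
                            # Keep the first hit, so earlier path entries win.
                            self.cache.setdefault(name, os.path.join(directory, entry))
                        break

    def find_spec(self, fullname, path=None, target=None):
        if path is not None:
            return None  # submodule of a package: defer to the standard finders
        location = self.cache.get(fullname)
        if location is None:
            return None  # not in the cache: defer to the standard finders
        return importlib.util.spec_from_file_location(fullname, location)


def install(paths=None):
    """Build the cache once and place the finder ahead of the default ones."""
    finder = CachedPathFinder(sys.path if paths is None else paths)
    sys.meta_path.insert(0, finder)
    return finder
```

Calling install() once at startup would be enough to route later top-level imports through the cache. Anything the cache does not know about still falls through to the normal import machinery, so a stale or incomplete cache costs time rather than correctness.
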
Here are some numbers for the PEP302-based cached importer on an IBM
BlueGene/P machine. Numbers are wallclock measurements by the time
utility in minutes:seconds, one run for each test (not an average), with
no attempt to take into account other activity on the system or
fileservers. (With that said, I ran a variety of other tests, and the
results have been consistent.) I still need to run some larger tests,
particularly in the 16k-64k range, where Python imports start to scale
very poorly on this machine.

The tests use the code currently at github.com/langton/MPI_Import with a
script that simply imports 100 small C-extension modules.

With 1k cores/MPI processes:

cached_import.finder:             14:19.98
  - skip actual import [1]:       13:37.77
  - with checks [2]:              27:09.60
  - w/checks, no import:          26:23.63
cached_import.mpi4py_finder [3]:   2:32.51
  - skip actual import:            1:42.55
  - with checks:                   2:32.38
  - w/checks, no import:           1:42.94
MPI_Import [4]:                    2:22.20
standard import:                  15:43.63
  - skip actual imports [5]:       0:56.59

With 4k cores/MPI processes:

cached_import.finder:             27:34.45
  - skip actual import:           27:40.58
  - with checks:                  52:14.83
  - w/checks, no import:          50:04.73
cached_import.mpi4py_finder:       4:03.02
  - skip actual import:            3:12.75
  - with checks:                   4:13.65
  - w/checks, no import:           3:18.46
MPI_Import:                        4:02.76
standard import:                  35:24.77
  - skip actual imports:           1:56.36

Notes:
[1] Builds the cache, but omits the actual imports.
[2] Check whether modules in sys.path are readable while building the
    cache. Because filesystem operations are expensive, these checks are
    off by default.
[3] Only the rank 0 process builds the initial cache, which is then
    broadcasted over MPI.
[4] The other import replacement.
[5] This is roughly the interpreter startup/initialization time.

-Asher
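
To compare the timings above more directly, a small helper can turn the minutes:seconds figures into seconds and express each full-import run as a ratio against the standard import. This is only a convenience sketch: the function names to_seconds and speedup and the TIMINGS layout are made up for illustration, and the values are copied from the results above without any new measurements.

```python
def to_seconds(stamp):
    """Convert a 'minutes:seconds' string such as '15:43.63' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# Full-import wallclock times copied from the results above.
TIMINGS = {
    "1k": {
        "standard import": "15:43.63",
        "cached_import.finder": "14:19.98",
        "cached_import.mpi4py_finder": "2:32.51",
        "MPI_Import": "2:22.20",
    },
    "4k": {
        "standard import": "35:24.77",
        "cached_import.finder": "27:34.45",
        "cached_import.mpi4py_finder": "4:03.02",
        "MPI_Import": "4:02.76",
    },
}


def speedup(cores, method):
    """Ratio of the standard-import time to the given method's time."""
    row = TIMINGS[cores]
    return to_seconds(row["standard import"]) / to_seconds(row[method])


if __name__ == "__main__":
    for cores, row in TIMINGS.items():
        for method in row:
            print(f"{cores:>3} {method:<28} {speedup(cores, method):5.2f}x")
```

Because each figure comes from a single run, the resulting ratios should be read as indicative rather than precise.
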