This is a follow-up to my post from January (http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html) and the panel discussion at PyData this weekend. As a few people have suggested, a better approach than the MPI-broadcast lookups is to cache the locations of all the modules found in sys.path.

I previously claimed that the PEP 302 finders/loaders wouldn't work here because the finder is selected by a module's path and filename, at which point the damage is already done. At the PyData panel, Guido countered that PEP 302 does indeed provide the necessary machinery for implementing the 'right' solution. The trick is to use sys.meta_path. (Thanks to Travis for pointing me in the direction of sys.meta_path, and Dag for helping me work through the details.)

Here's an example demonstrating the use of sys.meta_path:

    import os

    # Simple finder/loader that pretends to load module 'bar'
    class foo(object):
        def find_module(self, fullname, path=None):
            # Claim only the module named "bar"; decline everything else.
            if fullname == "bar":
                return self
            return None

        def load_module(self, fullname):
            if fullname == "bar":
                return os
            raise ImportError("This shouldn't happen!")

    if __name__ == "__main__":
        import sys
        sys.meta_path.append(foo())
        import bar  # actually the os module
        print(bar.getcwd())

To eliminate the import bottleneck, the finder/loader just needs to traverse sys.path, make a dict mapping modules to their location in the filesystem, and 'claim responsibility' for those modules in find_module(). Building (and maintaining, when sys.path changes) this dict, even if each process does it independently, shouldn't be much worse than the traversal required by a single import statement. We could even subclass the finder/loader so that the dict construction is done by only one process and the result broadcast over MPI, though that probably isn't necessary.

I'll put an initial implementation of this importer on GitHub sometime this week, and I'll follow up this post with some performance numbers when I have them.

-Asher
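
P.S. To make the caching idea concrete, here is a rough sketch of such a finder. It is not the promised implementation, only an illustration under several assumptions: the names scan_path and CachedPathFinder are made up for this example; it targets current Python 3, where the meta_path hook is find_spec (PEP 451) rather than the PEP 302 find_module/load_module pair used above; and it only caches top-level .py modules and packages that have an __init__.py. Extension modules, zip files, namespace packages and submodules are left to the regular finders, and nothing here refreshes the dict when sys.path changes.

```python
import importlib.abc
import importlib.util
import os
import sys


def scan_path(entries):
    """Map top-level module names to source files, listing each directory once.

    Hypothetical helper for illustration; only handles .py files and
    packages that contain an __init__.py.
    """
    cache = {}
    for entry in entries:
        directory = entry or os.curdir  # '' on sys.path means the current dir
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            continue  # missing directory, zip archive, etc.
        for name in names:
            full = os.path.join(directory, name)
            if name.endswith(".py"):
                modname, location = name[:-3], full
            elif os.path.isfile(os.path.join(full, "__init__.py")):
                modname, location = name, os.path.join(full, "__init__.py")
            else:
                continue
            if not modname.isidentifier():
                continue  # e.g. "setup-helper.py" cannot be imported by name
            # The first hit wins, so earlier path entries take precedence.
            cache.setdefault(modname, location)
    return cache


class CachedPathFinder(importlib.abc.MetaPathFinder):
    """Meta path finder that answers lookups from a prebuilt dict."""

    def __init__(self, entries=None):
        self.cache = scan_path(sys.path if entries is None else entries)

    def find_spec(self, fullname, path=None, target=None):
        location = self.cache.get(fullname)
        if location is None:
            return None  # not cached: let the next finder on meta_path try
        # spec_from_file_location picks a source loader for .py files and
        # marks __init__.py locations as packages.
        return importlib.util.spec_from_file_location(fullname, location)


# Possible installation, ahead of the default finders:
#     sys.meta_path.insert(0, CachedPathFinder())
```

Because the dict is just data, the "one process scans, everyone else receives" variant would only need to replace the scan_path call with whatever broadcast mechanism is available; that part is not shown here.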