On Tue, Sep 08, 2015 at 10:12:37AM -0400, Gary Robinson wrote:
> There was a huge data structure that all the analysis needed to
> access. Using a database would have slowed things down too much.
> Ideally, I needed to access this same structure from many cores at
> once. On a Power8 system, for example, with its larger number of
> cores, performance may well have been good enough for production. In
> any case, my experimentation and prototyping would have gone more
> quickly with more cores.
>
> But this data structure was simply too big. Replicating it in
> different processes used memory far too quickly and was the limiting
> factor on the number of cores I could use. (I could fork with the big
> data structure already in memory, but copy-on-write issues due to
> reference counting caused multiple copies to exist anyway.)
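The copy-on-write problem in the quoted message comes from CPython storing each object's reference count inside the object itself, so even read-only use of a shared object writes to the memory page that holds it. After a fork, that write makes the kernel copy the page into the child process. A minimal sketch in plain CPython (not PyParallel), using the standard `sys.getrefcount`; the helper `refcount_delta_after_read` is a hypothetical name for illustration, and because absolute counts differ between CPython versions it only compares the count before and after taking one extra reference:

```python
import sys


def refcount_delta_after_read(obj):
    """Return how much obj's reference count grows when we merely bind
    one extra name to it, i.e. a purely read-only use of the object."""
    before = sys.getrefcount(obj)
    alias = obj  # the object's value is untouched; only its refcount field changes
    after = sys.getrefcount(obj)
    del alias  # dropping the reference writes to the object header again
    return after - before


# A stand-in for one node of a large, read-only data structure.
record = {"term": "python", "count": 42}
print(refcount_delta_after_read(record))
```

In a forked worker, each such increment and decrement dirties a page of the inherited heap, so a structure that every worker traverses ends up copied page by page into every process.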
This problem is *exactly* the type of thing that PyParallel excels at, just FYI. PyParallel can now load large, complex data structures and then access them freely from within multiple threads. I'd recommend taking a look at the "instantaneous Wikipedia search server" example as a start:

https://github.com/pyparallel/pyparallel/blob/branches/3.3-px/examples/wiki/wiki.py

That loads a trie with 27 million entries, creates ~27.1 million PyObjects, loads a huge NumPy array, and has a WSS (working set size) of ~11GB. I've actually got a new version in development that loads 6 tries of the most frequent terms for character lengths 1-6. Once everything is loaded, the data structures can be accessed for free (without being replicated) from parallel threads.

More details on how this is achieved are on the landing page:

https://github.com/pyparallel/pyparallel

I've now done a couple of consultancy projects that were heavily data-science oriented (with huge data sets), and they gave me a real appreciation for how common the situation you describe is. It is probably the best demonstration of PyParallel's strengths.

> Gary Robinson
> gary...@me.com
> http://www.garyrobinson.net

Trent.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev