I'm attaching my files this time since it looks like the formatting was messed up when I copied. Brandt
On Wed, Jun 8, 2011 at 2:20 PM, <[email protected]> wrote: > Send NumPy-Discussion mailing list submissions to > [email protected] > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.scipy.org/mailman/listinfo/numpy-discussion > or, via email, send a message with subject or body 'help' to > [email protected] > > You can reach the person managing the list at > [email protected] > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of NumPy-Discussion digest..." > > > Today's Topics: > > 1. Re: merging datetime progress (Bruce Southey) > 2. Using multiprocessing (shared memory) with numpy array > multiplication (Brandt Belson) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 08 Jun 2011 13:04:03 -0500 > From: Bruce Southey <[email protected]> > Subject: Re: [Numpy-discussion] merging datetime progress > To: [email protected] > Message-ID: <[email protected]> > Content-Type: text/plain; charset="utf-8" > > On 06/08/2011 10:26 AM, Mark Wiebe wrote: > > On Tue, Jun 7, 2011 at 11:52 PM, Fernando Perez <fperez.net > > <http://fperez.net>@gmail.com <http://gmail.com>> wrote: > > > > On Tue, Jun 7, 2011 at 4:35 PM, Mark Wiebe <[email protected] > > <mailto:[email protected]>> wrote: > > > I went ahead and did the merge today as I said I wanted to, that > > pull > > > request is some further development for someone to code-review > > if they have > > > time. > > > > I'm curious as to why there was a need to push ahead with the merge > > right away, without giving the original pull request more time for > > feedback? If I'm not mistaken, the big merge was this PR: > > > > https://github.com/numpy/numpy/pull/83 > > > > and it was just opened a few days ago, containing a massive amount of > > work, and so far had only received some feedback from charris, > > explicitly requesting a little more breakdown to make digesting it > > easier. > > > > > > This is all true, I'll try to explain what my thought process was in > > doing the merge. This set of changes basically takes the codebase from > > a basically unusable datetime to a good starting point for all the > > design discussions that we've been having on the mailing list. I > > opened the pull request with about half of the changes that were there > > to try and get some feedback, and kept developing on a datetime-fixes2 > > branch. When I reached the point that I later merged in, nobody had > > responded so I added the new commits to the same pull request. > > > > Perhaps I should have more patience with git history-editing and > > revisiting the development history, but I've basically come to the > > conclusion that it's not worth the effort except on relatively minor > > things, so I want to rather focus on the code and its design instead > > of the particular series of commits that produced it. In doing those > > commits, I had to repeatedly double back and implement new missing > > functionality before returning to and finishing up what I was working on. > > > > For the development I'm doing now, which is related to the multitude > > of design discussions, I'm splitting it up into more topical branches > > partially because the work I merged provides a solid foundation for > > doing so, and because these are things diverging from the NEP instead > > of being things I perceived as having already had a discussion process > > during the NEP formation. > > > > I tried to see if github would let me do a "dependent" pull request, > > but it just included the commits from the branch my later development > > was sitting on, and that's probably the main reason I wanted to do a > > post-commit style review for this set of changes instead of > > pre-commit. I wrote this email to try and communicate my transition > > from pre-commit to post-commit review, but I think my wording about > > this probably wasn't clear. > > > > I realize that I'm not really an active numpy contributor in any > > significant way, and I see that you've put a ton of work into this, > > including a very detailed and impressive discussion on the list on > > with multiple people. So my opinion is just that of a user, not > > really a core numpy developer. > > > > > > I think your opinion is much more than that, particularly since you're > > actively working on closely related projects using the same community > > infrastructure. > > > > But it seems to me that part of having numpy be a better > > community-driven project is precisely achieved by having the patience > > to allow others to provide feedback and testing, even if they aren't > > 100% experts. And one thing that github really shines at, is making > > the review/feedback process about as painless as possible (I actually > > find it kind of fun). > > > > > > Definitely true (except for not having dependent pull requests, unless > > my search was too shallow...). That pull request also has nothing to > > do with the discussion we're currently having, it's more of a > > prerequisite, so anyone who is following the discussion and wants to > > dig in and review the code I'm doing related to that discussion will > > be lost in a swamp. By merging this prerequisite, and introducing > > separated, cleaner pull requests on that base that are directly from > > issues being discussed, this kind of community collaboration is much > > more likely to happen. I've simply done a bad job of communicating > > this, and as I'm doing the things we're discussing I'll try and tie > > these different elements better to encourage the ideals you're > describing. > > > > For example, with this merge, numpy HEAD right now won't even compile > > on x86_64, something that would easily have been caught with a bit > > more review, especially since it's so easy to test (even I can do > > that). It's been a long time since we had a situation where numpy > > didn't cleanly at least build from HEAD, so if nothing else, it's a > > (small) sign that this particular merge could have used a few more > > eyes... > > > > > > I apologize for that, I've grown accustomed to having little to no > > review to my pull requests, except from Chuck whose time and effort > > I've greatly appreciated, and has significantly improved the > > contributions I've made. The only active C-level NumPy development > > currently appears to be what I'm doing, and the great > > clean-up/extension proposals that Chuck has emailed about. I would > > like it if the buildbot system worked better to let me automatically > > trigger some build/tests on a variety of platforms before merging a > > branch, but it is as it is. > > > > I realize that it's sometimes frustrating to have a lot of code > > sitting in review, and I know that certain efforts are large and > > self-contained enough that it's impractical to expect a detailed > > line-by-line review. We've had a few such monster branches in > ipython > > in the past, but at least in those cases we've always tried to ensure > > several core people (over skype if needed) have a chance to go over > > the entire big picture, discuss the main details with the author so > > they can know what to focus on from the large merge, and run the > tests > > in as many scenarios as is realistic. And so far we haven't really > > had any problems with this approach, even if it does require a little > > more patience in seeing that (often very high quality) work make it > to > > the mainline. > > > > > > That approach sounds great, and NumPy needs more active core developers! > > > > Cheers, > > Mark > > > > > > Regards, > > > > f > > _______________________________________________ > > NumPy-Discussion mailing list > > [email protected] <mailto:[email protected]> > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > [email protected] > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > I am sorry but github pull requests do not appear to be sent to the > numpy dev list. So you are not going to get many people to respond to > that type of 'closed' request. Further any discussion for things that > get merged into the master really should be on the list especially as > many people do extensive testing. > > Bug fixes probably do not need further notification but feature > additions or API/ABI changes should have wider notification. So an email > to the list would be greatly appreciated so that interested people can > track the request and any discussions there. Then, depending on the > nature of the request, a second email that notifies that the request > will be merged. > > I can understand Windows failures because not that many people build > under Windows but build failures under Linux are rather hard to > understand. If you do not test Python 2.4, 2.5, 2.6, 2.7, 3.1 and 3.2 > with the supported operating systems (mainly 32-bit and 64-bit Linux, > Mac and Windows) then you must let those people who can and give them > time to build and test it. That is really true when you acknowledged > that you broke one of the 'one of the datetime API functions'. > > Bruce > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > http://mail.scipy.org/pipermail/numpy-discussion/attachments/20110608/d3de4f1a/attachment-0001.html > > ------------------------------ > > Message: 2 > Date: Wed, 8 Jun 2011 14:20:16 -0400 > From: Brandt Belson <[email protected]> > Subject: [Numpy-discussion] Using multiprocessing (shared memory) with > numpy array multiplication > To: [email protected] > Message-ID: <[email protected]> > Content-Type: text/plain; charset="iso-8859-1" > > Hello, > I'm parallelizing some code I've written using the built in multiprocessing > module. In my application, I need to multiply many large arrays together > and > sum the resulting product arrays (inner products). I noticed that when I > parallelized this with myPool.map(...) with 8 processes (on an 8-core > machine), the code was actually about an order of magnitude slower. I > realized that this only happens when I use the numpy array multiplication. > I > defined my own array multiplication and summation. As expected, this is > much > slower than the numpy versions, but when I parallelized it I saw the > roughly > 8x speedup I expected. So something about numpy array multiplication > prevented me from speeding it up with multiprocessing. > > Is there a way to speed up numpy array multiplication with multiprocessing? > I noticed that numpy's SVD uses all of the available cores on its own, does > numpy array multiplication do something similar already? > > I'm copying the code which can reproduce and summarize these results. > Simply > run: "python shared_mem.py" with myutil.py in the same directory. > > Thanks, > Brandt > > > -- myutil.py -- > > # Utility functions > import numpy as N > def numpy_inner_product(snap1,snap2): > """ A default inner product for n-dimensional numpy arrays """ > return N.sum(snap1*snap2.conj()) > > def my_inner_product(a,b): > ip = 0 > for r in range(a.shape[0]): > for c in range(a.shape[1]): > ip += a[r,c]*b[r,c] > return ip > > def my_random(args): > return N.random.random(args) > > > def eval_func_tuple(f_args): > """Takes a tuple of a function and args, evaluates and returns result""" > return f_args[0](*f_args[1:]) > > > -- shared_mem.py -- > > > import myutil > import numpy as N > import copy > import itertools > import multiprocessing > import time as T > processes = multiprocessing.cpu_count() > pool = multiprocessing.Pool(processes=processes) > > def IPs(): > """Find inner products of all arrays with themselves""" > arraySize = (300,200) > numArrays = 50 > arrayList = pool.map(myutil.eval_func_tuple, itertools.izip( > > > itertools.repeat(myutil.my_random,numArrays),itertools.repeat(arraySize,numArrays))) > > IPs = N.zeros(numArrays) > > startTime = T.time() > for arrayIndex,arrayValue in enumerate(arrayList): > IPs[arrayIndex] = myutil.numpy_inner_product(arrayValue, arrayValue) > endTime = T.time() - startTime > print 'No shared memory, numpy array multiplication > took',endTime,'seconds' > > startTime = T.time() > innerProductList = pool.map(myutil.eval_func_tuple, > itertools.izip(itertools.repeat(myutil.numpy_inner_product), > arrayList, arrayList)) > IPs = N.array(innerProductList) > endTime = T.time() - startTime > print 'Shared memory, numpy array multiplication took',endTime,'seconds' > > startTime = T.time() > for arrayIndex,arrayValue in enumerate(arrayList): > IPs[arrayIndex] = myutil.my_inner_product(arrayValue, arrayValue) > endTime = T.time() - startTime > print 'No shared memory, my array multiplication took',endTime,'seconds' > > startTime = T.time() > innerProductList = pool.map(myutil.eval_func_tuple, > itertools.izip(itertools.repeat(myutil.my_inner_product), > arrayList, arrayList)) > IPs = N.array(innerProductList) > endTime = T.time() - startTime > print 'Shared memory, my array multiplication took',endTime,'seconds' > > > if __name__ == '__main__': > print 'Using',processes,'processes' > IPs() > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > http://mail.scipy.org/pipermail/numpy-discussion/attachments/20110608/19d0c6b0/attachment.html > > ------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > [email protected] > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > End of NumPy-Discussion Digest, Vol 57, Issue 36 > ************************************************ >
shared_mem.py
Description: Binary data
myutil.py
Description: Binary data
_______________________________________________ NumPy-Discussion mailing list [email protected] http://mail.scipy.org/mailman/listinfo/numpy-discussion
