Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++
This makes me ask something I have always wanted to know: why is weave not the preferred or encouraged way? Is it because no developer has an interest in maintaining it, or is it too onerous to maintain? I do not know enough of its internals to guess an answer. I think it would be fair to say that weave has languished a bit over the years.

What I like about weave is that even when I drop into C++ mode I can pretty much use the same numpy'ish syntax, with no overhead of calling back into the numpy C functions. From the sourceforge forum it seems the new Blitz++ is quite competitive with Intel Fortran in SIMD vectorization as well, which does sound attractive. Would be delighted if development on weave catches up again.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++
>> I think the story is that Cython overlaps enough with Weave that Weave
>> doesn't get any new users or developers.
>
> One big issue that I had with weave is that it compiles on the fly. As a
> result, it makes for very non-distributable software (requires a compiler
> and the development headers installed), and leads to problems in the long
> run.
>
> Gael

I do not know much about Cython, except that it is out there and what it is supposed to do, but wouldn't Cython need a compiler too? I imagine distributing Cython-based code would incur a similar amount of schlep.

But yes, you raise a valid point. It does cause annoyances. One that I have faced is with running the same code simultaneously over a mix of 32 bit and 64 bit machines. But this is because the source code hashing function does not take the architecture into account. Shouldn't be hard to fix.
Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++
>> I do not know much about Cython, except that it is out there
>> and what it is supposed to do, but wouldn't Cython need a compiler too?
>
> Yes, but at build-time, not run time.

Ah! I see what you mean, or so I think. So the first time weave-based code runs, it builds, stores the code on disk and then executes, whereas in Cython there is a clear separation of build vs execute. In fairness, though, it shouldn't be difficult to pre-empt a build with weave. But I imagine Cython has other advantages (and in my mind so does weave, in certain restricted areas).

Now I feel it would be great to marry the two, so that for the most part Cython does not need to call into the numpy API for array-based operations but can fall back on something weave-like. Maybe sometime in the future.

>> I imagine distributing Cython based code would incur similar amounts
>> of schlep.
>
> If you distribute source, yes, but you at least have the option of
> distributing binaries. (And distutils does make that fairly easy, for
> some value of fairly.)

Indeed.
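For what it is worth, the build/execute separation is visible in how a Cython extension gets shipped. A minimal, hypothetical setup.py sketch (the module name fast_ops.pyx is made up): the compiler runs once at build time, after which only the binary needs distributing.

```python
# Hypothetical setup.py: compiles fast_ops.pyx once at build time
# (python setup.py build_ext --inplace); users of a binary release
# never need the compiler or development headers.
from distutils.core import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("fast_ops.pyx"))
```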
Re: [Numpy-discussion] What is consensus anyway
On Wed, Apr 25, 2012 at 11:08 PM, Puneeth Chaganti wrote:
> On Thu, Apr 26, 2012 at 6:41 AM, Travis Oliphant wrote:
> [snip]
>>
>> It would be nice if every pull request created a message to this list. Is
>> that even possible?
>
> That is definitely possible and shouldn't be too hard to do, like
> Jason said. But that can potentially cause some confusion, with some
> of the discussion starting off in the mailing list, and some of the
> discussion happening on the pull-request itself. Are my concerns
> justified?

Related issue: some projects have a users list and a devel list. It might be worth (re?)considering that option. They have their pros and cons, but I think I like the idea of a devel list and a separate "help wanted" list.

Something else that might be helpful for contentious threads is a Stack Overflow-esque system where readers can vote up the responses of others. Sometimes just an "I agree" or "I disagree" goes a long way, especially when you have many lurkers.

On something else that was brought up: I do not consider myself competent/prepared enough to take on development, but it is not the case that I have _never_ felt the temptation. What I have found intimidating and stymieing is the perceived politics around development issues. The two places where I have felt this are a) contentious threads on the list and b) what seem like legitimate patch tickets on trac that languish for no compelling technical reason. I would be hard-pressed to quote specifics, but I have encountered this feeling a few times. In my case it would not have mattered, because I doubt I would have contributed anything useful. However, it might be that more competent lurkers have felt the same way. The possibility of a patch being relegated semi-permanently to trac, or of getting caught up in the politics, is a bit of a disincentive. This is just an honest perception/observation.
I am more of a "get on with it, get the code out and the rest will resolve itself eventually" kind of a guy, thus long political/philosophical/epistemic threads distance me. I know there are legitimate reasons to have these discussions, but it seems to me that they get a bit too wordy here sometimes. My 10E-2.

-- srean
Re: [Numpy-discussion] What is consensus anyway
> Patches languishing on Trac is a real problem. The issue here is not at all
> about not wanting those patches,

Oh yes, I am sure of that. In the past it had not been clear what more was necessary to get them pulled in, or how to go about satisfying the requirements. The document you mailed on the scipy list goes a long way toward addressing those issues, so thanks a lot. In fact it might be a good idea to add a link to it in the signature of the mail that trac replies with.

> but just about the overhead of getting them
> reviewed/fixed/committed. This problem has more or less disappeared with
> Github; there are very few PRs that are just sitting there.
>
> As for existing patches on Trac, if you or anyone else has an interest in
> one of them, checking that patch for test coverage / documentation and
> resubmitting it as a PR would be a massive help.
>
> Ralf
Re: [Numpy-discussion] fast access and normalizing of ndarray slices
Hi Wolfgang,

I think you are looking for reduceat( ), in particular add.reduceat().

-- srean

On Thu, May 31, 2012 at 12:36 AM, Wolfgang Kerzendorf wrote:
> Dear all,
>
> I have an ndarray which consists of many arrays stacked behind each other
> (only conceptually; in truth it's a normal 1d float64 array).
> I have a second array which tells me the start of the individual data sets
> in the 1d float64 array and another one which tells me the length.
> Example:
>
> data_array = (conceptually) [[1,2], [1,2,3,4], [1,2,3]] = in reality
> [1,2,1,2,3,4,1,2,3, dtype=float64]
> start_pointer = [0, 2, 6]
> length_data = [2, 4, 3]
>
> I now want to normalize each of the individual data sets. I wrote a simple
> for loop over the start_pointer and length data, grabbed the data,
> normalized it and wrote it back to the big array. That's slow. Is there an
> elegant numpy way to do that? Do I have to go the cython way?
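A minimal sketch of the reduceat( ) route on the example data from your mail (normalizing each segment to unit sum here; swap in whatever normalization you actually need):

```python
import numpy as np

# The flattened data and bookkeeping arrays from the question.
data_array = np.array([1, 2, 1, 2, 3, 4, 1, 2, 3], dtype=np.float64)
start_pointer = np.array([0, 2, 6])
length_data = np.array([2, 4, 3])

# add.reduceat sums each segment [start_pointer[i], start_pointer[i+1]);
# the last segment runs to the end of the array.
segment_sums = np.add.reduceat(data_array, start_pointer)

# Broadcast the per-segment sums back over the segments and divide,
# all without a Python-level loop.
normalized = data_array / np.repeat(segment_sums, length_data)
```

No Cython needed: both reduceat and repeat run at C speed over the flat array.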
Re: [Numpy-discussion] automatic differentiation with PyAutoDiff
> > For example, I wrote a library routine for doing log-linear
> > regression. Doing this required computing the derivative of the
> > likelihood function, which was a huge nitpicky hassle; took me a few
> > hours to work out and debug. But it's still just 10 lines of Python
> > code that I needed to figure out once and they're done forever, now.
> > I'd have been perfectly happy if I could have gotten those ten lines
> > by asking a random unreleased library I pulled off github, which
> > depended on heavy libraries like Theano and relied on a mostly
> > untested emulator for some particular version of the CPython VM. But
> > I'd be less happy to ask everyone who uses my code to install that
> > library as well, just so I could avoid having to spend a few hours
> > doing math. This isn't a criticism of your library or anything, it's
> > just that I'm always going to be reluctant to rely on an automatic
> > differentiation tool that takes arbitrary code as input, because it
> > almost certainly cannot be made fully robust. So it'd be nice to have
> > the option to stick a human in the loop.

Log-linear models are by definition too simple to make one appreciate auto-differentiation. Try computing the Hessian by hand on a modestly sized multilayer neural network and you will start seeing the advantages. Or, say, computing the Hessian of a large graphical model.

But I do have my own reservations about auto-diff. Until we have a compiler smart enough to do common subexpression elimination, and in fact even then, hand-written differentiation code will often turn out to be more efficient. Terms cancel out (subtraction or division), terms factorize, terms can be arranged into an efficient Horner scheme. It would take very smart symbolic manipulation of the parse tree to get all that. So in places where I really need to optimize the derivative code, I would still do it by hand, and delegate to an AD system when the size gets unwieldy.
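To make the Horner point concrete with a toy example of my own (not from the thread): the derivative of 3x^3 + 2x^2 + x + 5 is 9x^2 + 4x + 1. A term-by-term AD emitter would typically produce the first form below; the second is the kind of rearrangement a human does by hand.

```python
# Derivative of p(x) = 3x^3 + 2x^2 + x + 5, i.e. p'(x) = 9x^2 + 4x + 1.

def dp_naive(x):
    # Direct term-by-term transcription: computes x**2 explicitly.
    return 9.0 * x**2 + 4.0 * x + 1.0

def dp_horner(x):
    # Horner form: two multiplies, two adds, no explicit powers.
    return (9.0 * x + 4.0) * x + 1.0
```

Both compute the same values; the difference is purely in operation count, which is exactly what a symbolic simplifier would have to discover on its own.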
In theory a good compromise is to let the AD churn out the code and then hand-optimize it, but there readable output does indeed help.

As far as correctness of the computed derivative is concerned, computing the dot product between the gradient of a function and a secant computed numerically from the function does guard against gross errors. If I remember correctly, the scipy optimization module already has a function to do such sanity checks. Of course it cannot guarantee correctness, but it usually goes a long way.

-- srean
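The scipy function I had in mind is, if memory serves, scipy.optimize.check_grad, which compares an analytic gradient against a finite-difference approximation at a point. A toy sketch (the quadratic and the test point are my own):

```python
import numpy as np
from scipy.optimize import check_grad

def f(x):
    # f(x) = sum(x_i^2); its gradient is 2x.
    return np.dot(x, x)

def grad_ok(x):
    return 2.0 * x

def grad_bad(x):
    return 3.0 * x   # deliberately wrong, to show the check firing

x0 = np.array([1.0, -2.0, 0.5])
err_ok = check_grad(f, grad_ok, x0)    # tiny: finite-difference noise
err_bad = check_grad(f, grad_bad, x0)  # large: flags the wrong gradient
```

check_grad returns the 2-norm of the difference between the two gradients, so a correct gradient gives something near machine/step-size noise and a buggy one gives an error on the scale of the gradient itself.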
Re: [Numpy-discussion] automatic differentiation with PyAutoDiff
> > You're right - there is definitely a difference between a correct
> > gradient and a gradient that is both correct and fast to compute.
> >
> > The current quick implementation of pyautodiff is naive in this
> > regard.

Oh, and by no means was I criticizing your implementation. It is a very hard problem to solve and, as you indicate, takes several man-years to deal with. And compared to having no gradient at all, a gradient that is possibly slower to compute is a big improvement :)

> True, even approximating a gradient by finite differences is a subtle
> thing if you want to get the most precision per time spent. Another
> thing I was wondering about was periodically re-running the original
> bytecode on inputs to make sure that the derived bytecode produces the
> same answer (!). Those two sanity checks would detect the two most
> scary errors to my mind as a user:
> a) that autodiff got the original function wrong
> b) that autodiff is mis-computing a gradient.

I was suggesting finite differences just as a sanity check, not as an actual substitute for the gradient. You won't believe how many times the finite difference check has saved me from going in the exact opposite direction!
Re: [Numpy-discussion] automatic differentiation with PyAutoDiff
> Of course, maybe you were pointing out that if your derivative
> calculation depends in some intrinsic way on the topology of some
> graph, then your best bet is to have an automatic way to recompute it
> from scratch for each new graph you see. In that case, fair enough!

That is indeed what I had in mind. In neural networks, Markov random fields, Bayesian networks, graph regularization, etc., it is something that has to be dealt with all the time.

> Right, and what I want is to do those correctness checks once, and
> then save the validated derivative function somewhere and know that it
> won't break the next time I upgrade some library or make some
> seemingly-irrelevant change to the original code.

Exactly. What I was getting at is: even if it is not feasible to get pretty-printed Python output, the bytecode can still be validated (somewhat) with a few numeric sanity checks. So yes, the derivatives needn't/shouldn't be re-computed at runtime all the time, and an API that returns even some opaque but computable representation of the derivative, which can be validated and then "frozen", would be helpful.

I think one can go further and formally prove the correctness of the derivative computing engine. I don't know if anyone has done it. Maybe Theano does. It should be possible for a statically typed sublanguage.
Re: [Numpy-discussion] automatic differentiation with PyAutoDiff
> Hi,
>
> I second James here, Theano does many of those optimizations. Only an
> advanced coder can do better than Theano in most cases, but that will
> take them much more time. If you find some optimization that you do
> and Theano doesn't, tell us. We want to add them :)
>
> Fred

I am sure Theano does an excellent job on the expressions that matter. But I think obtaining the best symbolic reduction of an expression is hard, as in an AI-hard, problem. Correct me if I am wrong though. One can come up with perverse corner cases using algebraic or trigonometric identities: expressions that are hundreds of terms long but whose derivatives are simple, perhaps even a constant. But all that matters is how well it does on the common cases, and I am hearing that it does extremely well.

I will be happy if it can reduce simple things like the following (a very common form in Theano's domain):

  \phi(x) - \phi(y) - dot( x-y, \grad_phi(y) )

evaluated for \phi(x) = \sum_i (x_i log x_i - x_i)

to

  \sum_i x_i log(x_i / y_i)

on the set sum(x) = sum(y) = 1.

In any case, I think this is a digression and I would rather not pollute this thread with peripheral (nonetheless very interesting) issues.
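Incidentally, the reduction itself is easy to sanity-check numerically: for \phi(x) = \sum_i (x_i log x_i - x_i) the gradient is \grad_phi(y) = log y, and on the simplex the Bregman expression above collapses to \sum_i x_i log(x_i / y_i). A quick check with random points of my choosing:

```python
import numpy as np

np.random.seed(0)

# Two strictly positive points normalized so sum(x) == sum(y) == 1.
x = np.random.rand(6) + 0.1
x /= x.sum()
y = np.random.rand(6) + 0.1
y /= y.sum()

phi = lambda v: np.sum(v * np.log(v) - v)
grad_phi = lambda v: np.log(v)   # d/dv_i of (v_i log v_i - v_i)

# The Bregman divergence form from the mail ...
bregman = phi(x) - phi(y) - np.dot(x - y, grad_phi(y))

# ... and the hoped-for reduced form (KL divergence).
kl = np.sum(x * np.log(x / y))
```

The two agree to floating-point precision whenever the two sums match; the (sum(y) - sum(x)) term that would otherwise remain is exactly what the simplex constraint kills.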
[Numpy-discussion] Semantics of index arrays and a request to fix the user guide
From the user guide:

> Boolean arrays must be of the same shape as the array being indexed,
> or broadcastable to the same shape. In the most straightforward case,
> the boolean array has the same shape.

Comment: So far so good, but the doc has not yet told me what the shape of the output is.

The user guide continues with an example:

> The result is a 1-D array containing all the elements in the indexed array
> corresponding to all the true elements in the boolean array.

Comment: Now it is not clear from that line whether the stated shape of the result holds in general or is specific to the example. So the reader (me) is still confused.

The user guide continues:

> With broadcasting, multidimensional arrays may be the result. For example...

Comment: I will get to the example in a minute, but there is no explanation of the mechanism used to arrive at the output shape. Is it the shape the index array was broadcast to? Or is it something else, and if so, what?

Example: the example indexes a (5,7) array with a (5,) index array. Now this is very confusing because it seems to contradict the original documentation: (5,) is neither the same shape as (5,7) nor is it broadcastable to it. The steps of conventional broadcasting would yield

  (5,7)
  (5,)

then

  (5,7)
  (1,5)

and then an error, because 7 and 5 don't match.

The user guide continues:

> Combining index arrays with slices.
> In effect, the slice is converted to an index array
> np.array([[1,2]]) (shape (1,2)) that is broadcast with
> the index array to produce a resultant array of shape (3,2).

Comment: here the two arrays have shapes (3,) and (1,2), so how does broadcasting yield the shape (3,2)? Broadcasting is supposed to proceed trailing dimension first, but in these examples it seems to be doing the opposite.

So could someone explain the semantics and make the user guide more precise?
Assuming the user guide is the first document a new user will read, it is surprisingly difficult to read, primarily because it gets into advanced topics too soon and partially because of ambiguous language. The numpy reference, on the other hand, is very clear, as is Travis's book, which I am glad to say I actually bought a long time ago.

Thanks,
srean
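For any reader landing on this thread later, here is how the shapes actually come out on the user guide's (5,7) example, as far as I have been able to pin them down: the key point is that a 1-D boolean (or integer) index of shape (5,) indexes only the first axis; it is not broadcast against the full (5,7) shape.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

# Full-shape boolean mask: the result is always 1-D, one element per True.
mask = y > 20
flat = y[mask]                                          # shape (14,)

# A (5,) boolean array indexes the first axis only: it selects whole
# rows, so the trailing dimension of length 7 survives.
rows = y[np.array([False, False, False, True, True])]   # shape (2, 7)

# Index array combined with a slice: the (3,) index array pairs with
# the length-2 slice to give a (3, 2) result.
picked = y[np.array([0, 2, 4]), 1:3]                    # shape (3, 2)
```

So the "broadcast to the same shape" language applies per indexed axis, which resolves the apparent (5,) vs (5,7) contradiction in the guide.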
Re: [Numpy-discussion] Semantics of index arrays and a request to fix the user guide
Hi All,

my question might have got lost due to the intense activity around the 1.7 release. Now that it has quietened down, I would appreciate any help regarding my confusion about how index arrays work (especially when broadcast).

-- srean

On Mon, Jun 25, 2012 at 5:29 PM, srean wrote:
> From the user guide:
>
>> Boolean arrays must be of the same shape as the array being indexed,
>> or broadcastable to the same shape. In the most straightforward case,
>> the boolean array has the same shape.
>
> Comment: So far so good, but the doc has not told me yet what the
> shape of the output is.
>
> The user guide continues with an example:
>
>> The result is a 1-D array containing all the elements in the indexed array
>> corresponding to all the true elements in the boolean array.
>
> Comment: Now it is not clear from that line whether the shape of the
> result holds generally or is specific to the example. So the reader (me)
> is still confused.
>
> There is no explanation about the mechanism used to arrive at the output
> shape. Is it the shape the index array was broadcast to? Or is it
> something else, and if it is the latter, what is it?
>
> Example
>
> The example indexes a (5,7) array with a (5,) index array. Now this
> is confusing because it seems to contradict the original documentation:
> (5,) is neither the same shape as (5,7) nor is it broadcastable to it.
>
> The steps of conventional broadcasting would yield
>
> (5,7)
> (5,)
>
> then
>
> (5,7)
> (1,5)
>
> then an error, because 7 and 5 don't match.
>
> The user guide continues:
>
>> Combining index arrays with slices.
>> In effect, the slice is converted to an index array
>> np.array([[1,2]]) (shape (1,2)) that is broadcast with
>> the index array to produce a resultant array of shape (3,2).
>
> Comment: Here the two arrays have shapes
> (3,) and (1,2), so how does broadcasting yield the shape (3,2)?
> Broadcasting is supposed to proceed trailing dimension first, but it
> seems in these examples it is doing the opposite.
>
> So could someone explain the semantics and make the user guide more
> precise?
>
> Assuming the user guide is the first document a new user will
> read, it is surprisingly difficult to read, primarily because it gets
> into advanced topics too soon and partially because of ambiguous
> language. The numpy reference on the other hand is very clear, as is
> Travis's book, which I am glad to say I actually bought a long time
> ago.
>
> Thanks,
> srean
Re: [Numpy-discussion] memory allocation at assignment
> Yes it does. If you want to avoid this extra copy, and have a
> pre-existing output array, you can do:
>
> np.add(a, b, out=c)
>
> ('+' on numpy arrays is just a synonym for np.add; np.add is a ufunc,
> and all ufuncs accept this syntax:
> http://docs.scipy.org/doc/numpy/reference/ufuncs.html )

Is the creation of the tmp as expensive as the creation of a new numpy array, or is it somewhat lighter weight (like being just a data buffer)? I sometimes use the c[:] syntax thinking I might benefit from numpy.array reuse, but now I think that was misguided.
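A small sketch of the difference (array contents chosen arbitrarily): the slice-assignment form still materializes the temporary a + b as a full ndarray before copying it into c, while the out= form writes straight into c's buffer.

```python
import numpy as np

a = np.ones(5)
b = np.full(5, 2.0)
c = np.empty(5)

# c[:] = a + b first builds the temporary ndarray (a + b), then
# copies its contents into c's existing buffer.
c[:] = a + b

# np.add writes directly into c's buffer, skipping the temporary;
# it also returns c itself, so the call can be chained.
np.add(a, b, out=c)
```

So c[:] reuses c's memory for the final result, but it does not save the allocation of the intermediate; only out= does that.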
[Numpy-discussion] Meta: help, devel and stackoverflow
Hi List,

this has been brought up several times, and the response has been generally positive, but it has fallen through the cracks. So here are a few repeat requests. Am keeping it terse just for brevity.

i) Split the list into [devel] and [help] and, as was mentioned recently, [rant/flame]: some requests for help get drowned out during active development-related discussions, and simple help requests pollute more urgent development-related matters.

ii) A Stack Overflow-like site for help as well as for proposals. The silent majority has been referred to a few times recently. I suspect there exist many lurkers on the list who do prefer one discussed solution over the other, but for various reasons do not break out of their lurk mode to send a mail saying "I prefer this solution". Such an interface would also help in keeping track of the level of support, as compared to mails that are large hunks of quoted text with a line or two stating one's preference or seconding a proposal.

One thing I have learned from traffic accidents is that if one asks the assembled crowd for help, no one knows how to respond. On the other hand, if you say "hey, you there in the blue shirt, could you get some water" you get instant results. So pardon me for taking the presumptuous liberty of requesting Travis to please set it up or delegate. Splitting the lists shouldn't be hard work; setting up overflow might be more work in comparison.

Best
-- srean
Re: [Numpy-discussion] Meta: help, devel and stackoverflow
If I remember correctly there used to be a stackexchange site at ask.scipy.org. It might be good to learn from that experience. I think handling spam was a significant problem, but I am not sure whether that is the reason it got discontinued.

Best
srean

On Thu, Jun 28, 2012 at 11:36 AM, Cera, Tim wrote:
>
> A little more research shows that we could have a
> http://numpy.stackexchange.com. The requirements are just to have people
> involved. See http://area51.stackexchange.com/faq for more info.
>
> Kindest regards,
> Tim
Re: [Numpy-discussion] Meta: help, devel and stackoverflow
In case this changes your mind (or assuages fears), I just wanted to point out that many open source projects do this. It is not about claiming that one is more important than the other, nor does it reinforce the idea that developers and users live in separate silos; it is more about directing the mails to different folders. No policing is required either: just reply to the author and to the appropriate list.

Right now reading numpy-discussion@scipy.org feels a lot like drinking from a fire hydrant when a couple of threads become very active. This is just anecdotal evidence, but I have had mails go unanswered when one or two threads are dominating the list. People are human and there will be situations where the top responders are overburdened, and I think the split will mitigate the problem somewhat.

For whatever reason, answering help requests is handled largely by a small set of star responders, though I suspect the answers are available more widely, even among comparatively new users. I am hoping (a) that with a separate "ask for help" list such enlightened new users can take up the slack, (b) that the information gets better organized, and (c) that we do not impose on users who are not so interested in devel issues, and vice versa. I take an interest in devel-related issues (apart from the distracting and at times seemingly petty flamewars) and like reading the numpy source, but I don't think every user has similar tastes, and neither should they.

Best
Srean

On Thu, Jun 28, 2012 at 2:42 PM, Matthew Brett wrote:
> Hi,
>
> On Thu, Jun 28, 2012 at 7:42 AM, Olivier Delalleau wrote:
>> +1 for a numpy-users list without "dev noise".
>
> Moderately strong vote against splitting the mailing lists into devel and
> user.
>
> As we know, this list can be unhappy and distracting, but I don't
> think splitting the lists is the right approach to that problem.
>
> Splitting the lists sends the wrong signal.
> I'd rather that we show
> by example that the developers listen to all voices, and that the
> users should expect to become developers. In other words, that the
> boundary between the user and developer is fluid and has no explicit
> boundaries.
>
> As data points, I make no distinction between scipy-devel and
> scipy-user, nor cython-devel and cython-user. Policing the
> distinction ('please post this on the user mailing list') is a boring
> job and doesn't make anyone more cheerful.
>
> I don't believe help questions are getting lost any more than devel
> questions are, but I'm happy to be corrected if someone has some data.
>
> Cheers,
>
> Matthew
Re: [Numpy-discussion] Meta: help, devel and stackoverflow
> And I continue to think it sends the wrong message.

Maybe if you articulate your fears, I will be able to appreciate your point of view more.

> My impression is that, at the moment, we numpy-ers are trying to work
> out what kind of community we are. Are we a developer community, or
> are we some developers who are users of a library that we rely on, but
> do not contribute to?

I think it is fair to extrapolate that all of us would want the numpy community to grow. If that be so, at some point not all of the users will be developers. Apart from one's own pet projects, all successful projects have more users than active developers.

What I like about having two lists is that on one hand it does not prevent me or you from participating in both, and on the other hand it allows those who don't want to delve too deeply into one aspect or the other the option of a cleaner inbox, or the option of separate inboxes. I, for instance, would like to be on both lists, perhaps mostly as a lurker, but would still want two different folders just for better organization. To me this seems a win-win. There is also a chance that more lurkers would speak up on the help list than here, and I think that is a good thing.

Best
srean
Re: [Numpy-discussion] Meta: help, devel and stackoverflow
Could not have said this better even if I tried, so thank you for your long answer.

-- srean

On Thu, Jun 28, 2012 at 4:57 PM, Fernando Perez wrote:
> Long answer, I know...
Re: [Numpy-discussion] Meta: help, devel and stackoverflow
> I'm not on the python mailing lists, but my impression is that python
> is in a different space from numpy. I mean, I have the impression

Indeed, one could seek out philosophical differences between different projects. No two projects are the same, but they can and often do have common issues. About the issues that Fernando mentioned, I can say that they are real and they do apply, and this I say from the experience of being on the numpy mailing list. I think many silent numpy users would welcome the creation of a low-barrier, low-noise (noise is context sensitive) forum where they can ask for help with what they feel are simple questions with easy answers.

I still do not have a tangible grasp of what your fears are. It seems you are unhappy that this will split the community. It won't; it is just two lists for the same community, where mails have been sorted into different folders. It also seems the notion of developers and users is disagreeable to you, and that you are philosophically hesitant about accepting/recognizing that such a difference exists. I may be wrong, and I do not intend to speak for you; I am only trying to understand your objections. First, let me assure you that these are labels on (temporary) roles, not on a person (if that is what is making you uncomfortable). Different people occupy different roles for different amounts of time.

A question about how to run-length decode an array of integers is very different from a question on which files to touch, and how, to add reduceat( ) support to the numexpr engine. It would be strange to take the position that there is no difference between the nature of these questions. Or to take the position that a person interested in the former is also keen to learn about the latter (note: some would be; example: yours sincerely, who knows the former but not the latter), or at the least is keen on receiving mails of extended discussion on the topic of lesser interest.
It seems to me that sorting these mails into different bins only improves the contextual signal to noise ratio, which the recipient can use as he/she sees fit. The only issue is whether there will be enough volume for each of these bins. My perception is yes, but this can certainly be revisited. In any case it does not prevent nor hinder any activity, but allows flexible organization of content should one want it.

> So, it may not make sense to think in terms of a model that works for Python,
> or even, IPython.

I do not want to read too much into this, but this I do find kind of odd and confusing: to proactively solicit input from other related projects, but then say that they do not apply once the views expressed weren't in total agreement.

This thread is coming close to veering into the non-technical/non-productive/argumentative zone, the type that I am fearful of, so I will stop here. But I would encourage you to churn these views in your mind, impersonally, to see if the idea of different lists has any merit, and to seek out what tangible harm could come of it. I think this request has come up before (I hasten to add, not initiated by me) and the response had largely been in favor, but nothing has happened. So I would welcome information on: if indeed two lists are to be made, who gets to create those lists?

Best,
srean
Re: [Numpy-discussion] Meta: help, devel and stackoverflow
I like this solution, and I think ask.scipy.org can be revived to take over that role, but this will need some policing to send standard questions there, and also some hangout time at ask.scipy.org. I love the stackoverflow model, but it requires more active participation from those who want to answer questions than mailing lists do. This is because questions not only do not come to you by default, they also get knocked off the top page as more questions come in. Something to watch out for, though I believe it won't be as bad as on the main SO site.

Meta^2: I have been top posting with abandon here. Not sure what is preferred here, top or bottom.

Best
srean

On Thu, Jun 28, 2012 at 8:52 PM, T J wrote:
> On Thu, Jun 28, 2012 at 3:23 PM, Fernando Perez wrote:
>
> I'm okay with having two lists as it does filtering for me, but this seems
> like a sub-optimal solution.
>
> Observation: Some people would like to apply labels to incoming messages.
> Reality: Email was not really designed for that.
>
> We can hack it by using two different email addresses, but why not just keep
> this list as is and make a concentrated effort to promote the use of 2.0
> technologies, like stackoverflow/askbot/etc? There, people can put as many
> tags as desired on questions: matrix, C-API, iteration, etc. Potentially,
> these tags would streamline everyone's workflow. The stackoverflow setup
> also makes it easier for users to search for solutions to common questions,
> and know that the top answer is still an accurate answer. [No one likes
> finding old invalid solutions.] The reputation system and up/down votes
> also help new users figure out which responses to trust.
>
> As others have explained, it does seem that there are distinct types of
> discussions that take place on this list.
>
> 1) There are community discussions/debates.
>
> Examples are the NA discussion, the bug tracker, release schedule, ABI/API
> changes, matrix rank tolerance too low, lazy evaluation, etc.
These are > clearly mailing-list topics. If you look at all the messages for the last > two(!) months, it seems like this type of message has been the dominant > type. > > 2) There are also standard questions. > > Recent examples are "memory allocation at assignment", "dot() function > question", "not expected output of fill_diagonal", "silly isscalar > question". These messages seem much more suited to the stackoverflow > environment. In fact, I'd be happy if we redirected such questions to > stackoverflow. This has the added benefit that responses to such questions > will stay on topic. Note that if a stackoverflow question seeds a > discussion, then someone can start a new thread on the mailing list which > cites the stackoverflow question. > > tl;dr > > Keep this list the same, and push "user" questions to stackoverflow instead > of pushing them to a user list.
Re: [Numpy-discussion] Meta: help, devel and stackoverflow
On Sat, Jun 30, 2012 at 2:29 PM, John Hunter wrote: > This thread is a perfect example of why another list is needed. +1 On Sat, Jun 30, 2012 at 2:37 PM, Matthew Brett wrote: > Oh - dear. I think the point that most of us agreed on was that > having a different from: address wasn't a perfect solution for giving > people space for asking newbie type questions. No-one has to read an > email. If it looks boring or silly or irrelevant to your concerns, > well, then ignore it. Looking at the same mails, it doesn't seem to me that most of us have agreed on that. It seems most of us have expressed that they would be satisfied with two different lists but are open to considering the stackoverflow model. The latter will require more work and time to get going compared to the former. Aside: a logical conclusion of your "don't read mails that don't interest you" would be that spam is not a problem; after all, no one has to read spam. If it looks boring or silly or irrelevant to your concerns, well, then ignore it. On Sat, Jun 30, 2012 at 1:57 PM, Dag Sverre Seljebotn wrote: > http://news.ycombinator.com/item?id=4131462 It seems it was mostly driven by an argumentative troll, who had decided beforehand to disagree with some of the other folks and went about cooking up interpretations so that he/she could complain about them. Sadly, this list shows such tendencies at times as well. Anecdotal data-point: I have been happy with SO in general. It works very well for certain types of queries. OTOH if the answer to the question is known only to a few, and he/she does not happen to be online at the time the question was posted, and he/she does not "pull" such possible questions by key-words, that question is all but history. The difference is that on a mailing list questions are "pushed" on to people who might be able to answer them, whereas in the SO model people have to actively seek out questions they want to answer. Unanticipated, niche questions tend to disappear. 
Re: [Numpy-discussion] Meta: help, devel and stackoverflow
> Isn't that what the various sections are for? Indeed they are, but it still needs active "pulling" on behalf of those who want to answer questions, and even then a question can sink deep in the well. Deeper than what one typically monitors. Sometimes questions are not appropriately tagged. Sometimes it is not obvious what the tag should be, or which tag is being monitored by the persons who might have the answer. This could be less of a problem for us given that it's a more focused group and the predefined tags are not split too fine. I think the main issue is that SO requires more active engagement than a mailing list, because checking for new mail has become something that almost everyone does by default anyway. Not saying SO is bad, I have benefited greatly from it, but these issues should be kept in mind. > http://stackoverflow.com/questions?sort=newest > http://stackoverflow.com/questions?sort=unanswered > And then, if you want modification-by-modification updates: > http://stackoverflow.com/questions?sort=active > Entries are sorted by date and you can view as many pages worth as are > available.
Re: [Numpy-discussion] Meta: help, devel and stackoverflow
> You can subscribe to be notified by email whenever a question is posted > to a certain tag. Absolutely true. > So then it is no different than a mailing list as far > as push/pull. There are a few differences though. New tags get created often, potentially in a decentralized fashion and dynamically, way more often than new lists are created. That's why the need to actively monitor. Another is the frequency of subscription: how often does a user of SO subscribe to a tag? Yet another is that tags are usually much more specific than the typical charter of a mailing list, and that's a good thing because it makes things easier to find and browse. I think if the tags are kept broad enough (or it is ensured that finer tags inherit from broader tags, for example numpy.foo where foo can be created according to the existing SO rules of tag creation) and participants here are willing to subscribe to those tags, there won't be much of a difference. So, just two qualifiers. In addition, if there is a way to bounce-and-answer user questions posted here to the SO forum relatively painlessly, that will be quite nice too. Maybe something that creates a new user based on the user's mail id, mails him/her the response and a password with which he/she can take control of the id. It is more polite and may be a good way for the SO site to collect more users. Best --srean
Re: [Numpy-discussion] Array views
Hi, I am also interested in this. In my application there is a large 2d array, let's call it 'b' to keep the notation consistent in the thread. b's columns need to be recomputed often. Ideally this re-computation happens in a function. Let's call that function updater(b, col_index). The simplest example is where updater(b, col_index) is a matrix-vector multiply, where the matrix or the vector changes. Is there any way, apart from using ufuncs, that I can make updater() write the result directly into b and not create a new temporary column that is then copied into b? Say for the matrix-vector multiply example. I can write the matrix-vector product in terms of ufuncs but will lose out in terms of speed. In the best case scenario I would like to maintain 'b' in csr sparse matrix form, as 'b' participates in a matrix-vector multiply. I think csr would be asking for too much, but even ccs should help. I don't want to clutter this thread with the sparsity issues though; any solution to the original question or pointers to solutions would be appreciated. Thanks --srean On Sat, Mar 26, 2011 at 12:10 PM, Hugo Gagnon < sourceforge.nu...@user.fastmail.fm> wrote: > Hello, > > Say I have a few 1d arrays and one 2d array whose columns I want to be > the 1d arrays. > I also want all the a's arrays to share the *same data* with the b > array. > If I call my 1d arrays a1, a2, etc. and my 2d array b, then > > b[:,0] = a1[:] > b[:,1] = a2[:] > ... > > won't work because apparently copying occurs. > I tried it the other way around i.e. > > a1 = b[:,0] > a2 = b[:,1] > ... > > and it works but that doesn't help me for my problem. > Is there a way to reformulate the first code snippet above but with > shallow copying? 
> > Thanks, > -- > Hugo Gagnon
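For the record, the direction of assignment is what matters here: allocate the 2-d array first and bind names to its column views, and the sharing Hugo asks for comes for free. A minimal sketch (variable values are mine, not from the thread):

```python
import numpy as np

# Allocate the 2-D array first, then take column views of it.
# b[:, i] is a view, so writing through a1/a2 updates b and vice versa.
b = np.empty((3, 2))
a1 = b[:, 0]
a2 = b[:, 1]

a1[:] = [1.0, 2.0, 3.0]   # fills column 0 of b in place
a2[:] = [4.0, 5.0, 6.0]   # fills column 1 of b in place

b[0, 0] = 99.0            # change through b is visible through the view
```

The first snippet in the question cannot be made shallow, because a1 already owns its memory before the assignment; `b[:, 0] = a1` can only copy into b's buffer.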
Re: [Numpy-discussion] Array views
Hi Christopher, thanks for taking the time to reply at length. I do understand the concept of striding in general but was not familiar with the NumPy way of accessing that information. So thanks for pointing me to .flags and .strides. That said, BLAS/LAPACK do have APIs that take the stride length into account. But for sparse arrays I think it's a hopeless situation. That is a bummer, because sparse is what I need. Oh well, I will probably do it in C++ -- srean p.s. I hope top posting is not frowned upon here. If so, I will keep that in mind in my future posts. On Sat, Mar 26, 2011 at 1:31 PM, Christopher Barker wrote: > > Probably not -- the trick is that when an array is a view of a slice of > another array, it may not be laid out in memory in a way that other libs > (like LAPACK, BLAS, etc) require, so the data needs to be copied to call > those routines. > > To understand all this, you'll need to study up a bit on how numpy > arrays lay out and access the memory that they use: they use a concept > of "strided" memory. It's very powerful and flexible, but most other > numeric libs can't use those same data structures. I'm not sure what a > good doc is to read to learn about this -- I learned it from messing > with the C API. Take a look at any docs that talk about "strides", and > maybe playing with the "stride tricks" tools will help. > > A simple example: > > In [3]: a = np.ones((3,4)) > > In [4]: a > Out[4]: > array([[ 1., 1., 1., 1.], >[ 1., 1., 1., 1.], >[ 1., 1., 1., 1.]]) > > In [5]: a.flags > Out[5]: > C_CONTIGUOUS : True > F_CONTIGUOUS : False > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > So a is a (3,4) array, stored in C_contiguous fashion, just like a > "regular old C array". A lib expecting data in this fashion could use > the data pointer just like regular C code. 
> > In [6]: a.strides > Out[6]: (32, 8) > > this means it is 32 bytes from the start of one row to the next, and 8 > bytes from the start of one element to the next -- which makes sense for > a 64bit double. > > > In [7]: b = a[:,1] > > In [10]: b > Out[10]: array([ 1., 1., 1.]) > > so b is a 1-d array with three elements. > > In [8]: b.flags > Out[8]: > C_CONTIGUOUS : False > F_CONTIGUOUS : False > OWNDATA : False > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > but it is NOT C_Contiguous - the data is laid out differently than a > standard C array. > > In [9]: b.strides > Out[9]: (32,) > > so this means that it is 32 bytes from one element to the next -- for an > 8 byte data type. This is because the elements are each one element in a > row of the a array -- they are not all next to each other. A regular C > library generally won't be able to work with data laid out like this. >
Re: [Numpy-discussion] Array views
Ah! very nice. I did not know that numpy-1.6.1 supports in place 'dot', nor the fact that you could access the underlying BLAS functions like so. This is pretty neat. Thanks. Now I at least have an idea how the sparse version might work. If I get time I will probably give numpy-1.6.1 a shot. I already have the MKL libraries thanks to the free version of EPD for students. On Sat, Mar 26, 2011 at 2:34 PM, Pauli Virtanen wrote: > > Like so: > ># Fortran-order for efficient DGEMM -- each column must be contiguous >A = np.random.randn(4,4).copy('F') >b = np.random.randn(4,10).copy('F') > >def updater(b, col_idx): > # This will work in Numpy 1.6.1 >dot(A, b[:,col_idx].copy(), out=b[:,col_idx]) > > In the meantime you can do > >A = np.random.randn(4,4).copy('F') >b = np.random.randn(4,10).copy('F') > >from scipy.lib.blas import get_blas_funcs >gemm, = get_blas_funcs(['gemm'], [A, b]) # get correct type func > >def updater(b, col_idx): > bcol = b[:,col_idx] >c = gemm(1.0, A, bcol.copy(), 0.0, bcol, overwrite_c=True) >assert c is bcol # check that it didn't make copies! >
Re: [Numpy-discussion] Array views
On Sat, Mar 26, 2011 at 3:16 PM, srean wrote: > > Ah! very nice. I did not know that numpy-1.6.1 supports in place 'dot', > In place is perhaps not the right word, I meant "in a specified location"
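To illustrate "in a specified location": `numpy.dot` gained an `out` argument in 1.6, so the product can land in a pre-allocated array. A small sketch (note `out` must not overlap the inputs, which is why Pauli's snippet copies the column first):

```python
import numpy as np

A = np.random.randn(4, 4)
x = np.random.randn(4)
y = np.empty(4)          # pre-allocated destination, matching shape/dtype

np.dot(A, x, out=y)      # result is written into y; no fresh array allocated
```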
Re: [Numpy-discussion] Shared memory ndarrays (update)
Hi everyone, I was looking up the options that are available for shared memory arrays and this thread came up at the right time. The doc says that multiprocessing.Array(...) gives a shared memory array. But from the code it seems to me that it is actually using a mmap. Is that a correct assessment, and if so, is there any advantage in using multiprocessing.Array(...) over simple numpy mmapped arrays? Regards srean
Re: [Numpy-discussion] Shared memory ndarrays (update)
Apologies for adding to my own post. multiprocessing.Array(...) uses an anonymous mmapped file. I am not sure if that means it is resident in RAM or on the swap device. But my original question remains: what are the pros and cons of using it versus numpy mmapped arrays? If multiprocessing.Array is indeed resident in memory (subject to swapping of course) that would still be advantageous compared to a file mapped from an on-disk filesystem. On Mon, Apr 11, 2011 at 12:42 PM, srean wrote: > Hi everyone, > > I was looking up the options that are available for shared memory arrays > and this thread came up at the right time. The doc says that multiprocessing > .Array(...) gives a shared memory array. But from the code it seems to me > that it is actually using a mmap. Is that a correct assessment, and > if so, is there any advantage in using multiprocessing.Array(...) over > simple numpy mmaped arrays. > > Regards > srean >
Re: [Numpy-discussion] Shared memory ndarrays (update)
Got you, and thanks a lot for the explanation. I am not using Queues so I think I am safe for the time being. Given that you have worked a lot on these issues, would you recommend plain mmapped numpy arrays over multiprocessing.Array? Thanks again -- srean On Mon, Apr 11, 2011 at 1:36 PM, Sturla Molden wrote: > "Shared memory" is memory mapping from the paging file (i.e. RAM), not a > file on disk. They can have a name or be anonymous. I have explained why we > need named shared memory before. If you didn't understand it, try to pass an > instance of multiprocessing.Array over multiprocessing.Queue. > > Sturla >
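For completeness, the usual recipe for viewing a multiprocessing shared block as an ndarray is the following sketch (a common pattern, not code from this thread):

```python
import multiprocessing as mp
import numpy as np

# Anonymous shared memory: 6 doubles, without the synchronizing lock wrapper.
shared = mp.Array('d', 6, lock=False)

# Zero-copy ndarray view of the shared buffer; writes go straight into it,
# so forked worker processes see the same data.
a = np.frombuffer(shared, dtype=np.float64).reshape(2, 3)
a[:] = 1.0
```

With `lock=True` (the default) the returned wrapper must be unwrapped via its `get_obj()` method before handing it to `np.frombuffer`.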
[Numpy-discussion] ufunc 's order of execution [relevant when output overlaps with input]
Hi, is there a guarantee that ufuncs will execute left to right and in sequential order? For instance, is the following code standards compliant? >>> import numpy as n >>> a = n.arange(0,5) >>> a array([0, 1, 2, 3, 4]) >>> n.add(a[0:-1], a[1:], a[0:-1]) array([1, 3, 5, 7]) The idea was to reuse and hence save space. The place where I write to is not accessed again. I am quite surprised that the following works correctly. >>> n.add.accumulate(a, out=a) I guess it uses a buffer and rebinds `a` to that buffer at the end. It is also faster than >>> n.add(a[0:-1], a[1:], a[1:]) --sean
Re: [Numpy-discussion] ufunc 's order of execution [relevant when output overlaps with input]
> It is possible that we can make an exception for inputs and outputs > that overlap each other and pick a standard traversal. In those cases, > the order of traversal can affect the semantics. Exactly. If there is no overlap then it does not matter and can potentially be done in parallel. On the other hand, if there is some standardized traversal, that might allow one to write nested loops compactly. I don't really need it, but found the possibility quite intriguing. > It always reads from a[i] before it writes to out[i], so it's always > consistent. Ah I see, thanks. Should have seen through it. --sean
[Numpy-discussion] Adding the arrays in an array iterator
Hi List, I have to sum up an unknown number of ndarrays of the same size. These arrays, possibly thousands in number, are provided by an iterator. Right now I use python reduce with operator.add. Does that invoke the corresponding ufunc internally? I want to avoid creating temporaries. With a ufunc I know how to do it, but I am not sure how to make use of that in reduce. It is not essential that I use reduce though, so I would welcome an idiomatic and efficient way of executing this. So far I have stayed away from building an ndarray object and summing across the relevant dimension. Is that what I should be doing? Different invocations of this function have different numbers of arrays, so I cannot pre-compile this away into a numexpr. Thanks and regards srean
[Numpy-discussion] [Repost] Adding the arrays returned in an array iterator
Bumping my question tentatively. I am fairly sure there is a good answer and for some reason it got overlooked. Regards srean -- Forwarded message -- From: srean Date: Fri, May 27, 2011 at 10:36 AM Subject: Adding the arrays in an array iterator To: Discussion of Numerical Python Hi List, I have to sum up an unknown number of ndarrays of the same size. These arrays, possibly thousands in number, are provided via an iterator. Right now I use python reduce with operator.add. Does that invoke the corresponding ufunc internally? I want to avoid creating temporaries, which I suspect a naive invocation of reduce will create. With a ufunc I know how to avoid making copies using the output parameter, but I am not sure how to make use of that in reduce. It is not essential that I use reduce though, so I would welcome an idiomatic and efficient way of executing this. So far I have stayed away from building an ndarray object and summing across the relevant dimension. Is that what I should be doing? Different invocations of this function have different numbers of arrays, so I cannot pre-compile this away into a numexpr. Thanks and regards srean
Re: [Numpy-discussion] [Repost] Adding the arrays returned in an array iterator
> If they are in a list, then I would do something like Apologies if it wasn't clear in my previous mail. The arrays are in a lazy iterator, they are non-contiguous, and there are several thousands of them. I was hoping there was a way to get at a "+=" operator for arrays to use in a reduce. Seems like indeed there is: I had missed operator.iadd(). > result = arrays[0].copy() > for a in arrays[1:]: > result += a > > But much depends on the details of your problem. > Chuck
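The same idea carries over to a lazy iterator: seed one accumulator from the first array and fold the rest in with `+=`, so each step is an in-place add with no per-step temporary. A sketch, assuming all arrays share a shape:

```python
import numpy as np

def isum(arrays):
    """Sum an iterable of equal-shaped arrays into one accumulator."""
    it = iter(arrays)
    total = next(it).copy()   # copy so the first input is not clobbered
    for a in it:
        total += a            # in-place add: no temporary per addition
    return total

chunks = (np.full(3, i, dtype=float) for i in range(4))  # lazy iterator
result = isum(chunks)         # 0+1+2+3 in each slot
```

Equivalently, `functools.reduce(operator.iadd, it, total)` performs the same in-place fold.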
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Looking at the code, the arrays that you are multiplying seem fairly small (300, 200) and you have 50 of them. So it might be the case that there is not enough computational work to compensate for the cost of forking new processes and communicating the results. Have you tried larger arrays and more of them? If you are on an Intel machine and you have the MKL libraries around, I would strongly recommend that you use the matrix multiplication routine if possible. MKL will do the parallelization for you. Well, any good BLAS implementation would do the same; you don't really need MKL. ATLAS and ACML would work too, just that MKL has been set up for us and it works well. To give an idea of the amount of tuning and optimization that these libraries have undergone: a numpy.sum would be slower than a multiplication with a vector of all ones. So in the interest of speed, the longer you stay in the BLAS context the better. --srean On Fri, Jun 10, 2011 at 10:01 AM, Brandt Belson wrote: > Unfortunately I can't flatten the arrays. I'm writing a library where the > user supplies an inner product function for two generic objects, and almost > always the inner product function does large array multiplications at some > point. The library doesn't get to know about the underlying arrays. > Thanks, > Brandt
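The all-ones trick above is easy to check directly; whether the BLAS call actually wins depends on the build, so the timing is left to the reader:

```python
import numpy as np

A = np.random.randn(300, 200)
ones = np.ones(A.shape[1])

row_sums = A.sum(axis=1)   # generic numpy reduction
via_blas = A.dot(ones)     # same reduction expressed as a BLAS matvec

# identical results, up to floating point rounding
```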
[Numpy-discussion] (cumsum, broadcast) in (numexpr, weave)
Hi All, is there a fast way to do cumsum with numexpr? I could not find it, but the functions available in numexpr do not seem to be exhaustively documented, so it is possible that I missed it. I do not know if 'sum' takes special arguments that can be used. To try another track, do numexpr operators have something like the 'out' parameter for ufuncs? If so, one could perhaps use add(a[0:-1], a[1:], out=a[1:]) provided it is possible to preserve the sequential semantics. Another option is to use weave, which does have cumsum. However my code requires expressions which implement broadcast. That leads to my next question: does repeat or concat return a copy or a view? If they avoid copying, I could perhaps use repeat to simulate efficient broadcasting. Or will it make a copy of that array anyway? I would ideally like to use numexpr because I make heavy use of transcendental functions and was hoping to exploit the VML library. Thanks for the help -- srean
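On the copy-versus-view question: `np.repeat` always materializes a copy, whereas a zero-stride view built with `numpy.lib.stride_tricks.as_strided` can simulate broadcasting without copying. A sketch (use `as_strided` with care: every "row" of the view aliases the same memory):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

row = np.arange(4.0)

tiled = np.repeat(row[np.newaxis, :], 3, axis=0)  # real (3, 4) copy
view = as_strided(row, shape=(3, 4),
                  strides=(0, row.strides[0]))    # zero-copy "tiling"

row[0] = 99.0
# the copy is unaffected, while every row of the view aliases the change
```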
Re: [Numpy-discussion] (cumsum, broadcast) in (numexpr, weave)
Apologies, I intended to send this to the scipy list. On Tue, Jun 21, 2011 at 2:35 PM, srean wrote: > Hi All, > > is there a fast way to do cumsum with numexpr? I could not find it, > but the functions available in numexpr do not seem to be > exhaustively documented, so it is possible that I missed it. I do not > know if 'sum' takes special arguments that can be used. > > To try another track, do numexpr operators have something like the > 'out' parameter for ufuncs? If so, one could perhaps use > add(a[0:-1], a[1:], out=a[1:]) provided it is possible to preserve > the sequential semantics. > > Another option is to use weave, which does have cumsum. However my code > requires expressions which implement broadcast. That leads to my next > question: does repeat or concat return a copy or a view? If they avoid > copying, I could perhaps use repeat to simulate efficient > broadcasting. Or will it make a copy of that array anyway? I would > ideally like to use numexpr because I make heavy use of transcendental > functions and was hoping to exploit the VML library. > > Thanks for the help > > -- srean >
[Numpy-discussion] How to avoid extra copying when forming an array from an iterator
Hi, I have an iterator that yields a complex object. I want to make an array out of a numerical attribute that the yielded object possesses, and that too very efficiently. My initial plan was to keep writing the numbers to a StringIO object and, when done, generate the numpy array from StringIO's buffer. But the fromstring() method fails on the StringIO object, and so does fromfile(), or using the StringIO object as the initializer of an array object. After some digging I ran into this ticket http://projects.scipy.org/numpy/ticket/1634 that has been assigned a low priority. Is there some other way to achieve what I am trying? Efficiency is important because potentially millions of objects would be yielded. -- srean
Re: [Numpy-discussion] How to avoid extra copying when forming an array from an iterator
To answer my own question: I guess I can keep appending to an array.array() object and get a numpy.array from its buffer if possible. Is that the efficient way? On Fri, Jun 24, 2011 at 2:35 AM, srean wrote: > Hi, > > I have an iterator that yields a complex object. I want to make an array > out of a numerical attribute that the yielded object possesses and that too > very efficiently. > > My initial plan was to keep writing the numbers to a StringIO object and > when done generate the numpy array using StringIO's buffer. But the fromstring() > method fails on the StringIO object, so does fromfile() or using the > StringIO object as the initializer of an array object. > > After some digging I ran into this ticket > http://projects.scipy.org/numpy/ticket/1634 that has been assigned a low > priority. > > Is there some other way to achieve what I am trying? Efficiency is > important because potentially millions of objects would be yielded. > > -- srean >
Re: [Numpy-discussion] How to avoid extra copying when forming an array from an iterator
On Fri, Jun 24, 2011 at 9:12 AM, Robert Kern wrote: > On Fri, Jun 24, 2011 at 04:03, srean wrote: > > To answer my own question, I guess I can keep appending to an > array.array() > > object and get a numpy.array from its buffer if possible. Is that the > > efficient way. > > It's one of the most efficient ways to do it, yes, especially for 1D > arrays. > Thanks for the reply. My first cut was to try cStringIO because I thought I could use the writelines() method and hence avoid the for loop in the python code. It would have been nice if that had worked. If I understood it correctly, the looping would have happened inside a C function call. regards srean
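The array.array route also delivers the writelines()-like behaviour wished for here: `array.array.extend` accepts any iterable and loops in C, and `np.frombuffer` then wraps the finished buffer without copying. A sketch:

```python
import array
import numpy as np

def gather(values):
    buf = array.array('d')        # growable typed buffer of doubles
    buf.extend(values)            # consumes the iterator inside C code
    # zero-copy ndarray view over the buffer's memory
    return np.frombuffer(buf, dtype=np.float64)

out = gather(float(i) for i in range(5))
```

Take the view only after the buffer has stopped growing; further appends may reallocate the underlying memory.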
Re: [Numpy-discussion] How to avoid extra copying when forming an array from an iterator
A valiant exercise in hope: is it possible to do this without a loop or extra copying? What I have is an iterator that yields a fixed-width string on every call to next(). Now I want to create a numpy array of ints out of the last 4 chars of that string. My plan was to pass the iterator through a generator that returned an iterator over the last 4 chars. (Sub-question: given that strings are immutable, is it possible to yield a view of the last 4 chars rather than a copy?) Then apply StringIO.writelines() on the 4-char iterator returned. After it's done, create a numpy.array from the StringIO's buffer. This does not work; the other option is to use an array.array in place of a StringIO object. But is it possible to fill an array.array using a lazy iterator without an explicit loop in python? Something like the writelines() call. I know, premature optimization and all that, but this indeed needs to be done efficiently. Thanks again for your gracious help -- srean On Fri, Jun 24, 2011 at 9:12 AM, Robert Kern wrote: > On Fri, Jun 24, 2011 at 04:03, srean wrote: > > To answer my own question, I guess I can keep appending to an > array.array() > > object and get a numpy.array from its buffer if possible. Is that the > > efficient way. > > It's one of the most efficient ways to do it, yes, especially for 1D > arrays. >
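For this particular shape of problem, `np.fromiter` fills the array directly from a generator without building an intermediate list. The record format below is made up purely for illustration:

```python
import numpy as np

def records():                       # stand-in for the real iterator
    for i in range(5):
        yield "rec%05d" % i          # fixed-width strings, id in last 4 chars

# int(s[-4:]) does copy 4 chars per record -- unavoidable, since Python
# strings are immutable and cannot yield views -- but no list is built.
ids = np.fromiter((int(s[-4:]) for s in records()), dtype=np.intp)
```

Passing `count=` to `np.fromiter`, when the length is known up front, lets it allocate the array once instead of growing it.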
Re: [Numpy-discussion] Array vectorization in numpy
>> I think this is essential to speed up numpy. Maybe numexpr could handle this >> in the future? Right now the general use of numexpr is result = >> numexpr.evaluate("whatever"), so the same problem seems to be there. >> >> With this I am not saying that numpy is not worth it, just that for many >> applications (specially with huge matrices/arrays), pre-allocation does make >> a huge difference, especially if we want to attract more people to using >> numpy. > > The ufuncs and many scipy functions take an "out" parameter where you > can specify a pre-allocated array. It can be a little awkward writing > expressions that way, but the capability is there. This is a slight digression: is there a way to have out-parameter-like semantics with numexpr? I have always used it as a[:] = numexpr.evaluate(expression) But I don't think numexpr builds the value in place. Is it possible to have side effects with numexpr as opposed to obtaining values, for example "a = a * b + c"? The documentation is not clear about this. Oh, and I do not find the "out" parameter awkward at all. It's very handy. Furthermore, if I may, here is a request that the Blitz++ source be updated. It seems there is a lot of activity on the Blitz++ repository, and weave is very handy too and can be used as easily as numexpr.
Re: [Numpy-discussion] Array vectorization in numpy
>> This is a slight digression: is there a way to have a out parameter >> like semantics with numexpr. I have always used it as >> >> a[:] = numexpr(expression) > In order to make sure the 1.6 nditer supports multithreading, I adapted > numexpr to use it. The branch which does this is here: > http://code.google.com/p/numexpr/source/browse/#svn%2Fbranches%2Fnewiter > This supports out, order, and casting parameters, visible here: > http://code.google.com/p/numexpr/source/browse/branches/newiter/numexpr/necompiler.py#615 > It's pretty much ready to go, just needs someone to do the release > management. > -Mark Oh excellent, I did not know that the out parameter was available. Hope this gets in soon.
[Numpy-discussion] c-info.ufunc-tutorial.rst
Hi, I was reading this document, https://github.com/numpy/numpy/blob/master/doc/source/user/c-info.ufunc-tutorial.rst It is well written and there is a good build-up to exciting code examples, but I do not see the actual examples, only how they may be used. Are they located somewhere else and not linked? Or is it that the c-info.ufunc-tutorial.rst document is incomplete and the examples have not been written? I suspect the former. In that case could anyone point to the code examples and maybe also update the c-info.ufunc-tutorial.rst document. Thanks -- srean
Re: [Numpy-discussion] c-info.ufunc-tutorial.rst
Following up on my own question: I can see the code in the commit, so it appears that code-block:: directives are not being rendered correctly. Could anyone confirm? In case it is my browser alone -- though I did try after disabling NoScript. On Wed, Aug 24, 2011 at 6:53 PM, srean wrote: > Hi, > > I was reading this document, > https://github.com/numpy/numpy/blob/master/doc/source/user/c-info.ufunc-tutorial.rst > > It is well written and there is a good build-up to exciting code examples, > but I do not see the actual examples, only how they may be used. Are they > located somewhere else and not linked? Or is it that the > c-info.ufunc-tutorial.rst document is incomplete and the examples have not > been written? I suspect the former. In that case could anyone point to the > code examples and maybe also update the c-info.ufunc-tutorial.rst document. > > Thanks > > -- srean >
Re: [Numpy-discussion] c-info.ufunc-tutorial.rst
Thanks Anthony and Mark, this is good to know. So what would be the advised way of looking at freshly baked documentation? Just look at the raw files? Or is there some place else where the correctly Sphinx-rendered docs are hosted?

On Wed, Aug 24, 2011 at 7:19 PM, Anthony Scopatz wrote:
> code-block:: is a directive that I think might be specific to sphinx.
> Naturally, github's renderer will drop it.
>
> On Wed, Aug 24, 2011 at 7:10 PM, Mark Wiebe wrote:
>
>> I believe this is because of github's .rst processor, which simply drops
>> blocks it can't understand. When building NumPy documentation, many more
>> extensions and more context exist. I'm getting the same thing in the C-API
>> NA-mask documentation I just posted.
>>
>> -Mark
[Numpy-discussion] the build and installation process
Hi,

I would like to know a bit about how the installation process works. Could you point me to a resource? In particular, I want to know how the site.cfg configuration works. Is it numpy/scipy-specific, or is it standard with distutils? I googled for site.cfg and distutils but did not find any authoritative document.

I believe many new users trip up on the installation process, especially in trying to substitute their favourite library in place of the standard ones, so a canonical document explaining the process would be very helpful. http://docs.scipy.org/doc/numpy/user/install.html does cover some of the important points, but it's a bit sketchy and has a "this is all that you need to know" flavor; it doesn't quite enable the reader to fix his own problems. So a resource that sits somewhere between the current install document and reading all the sources that get invoked during building and installation would be very welcome.

English is not my native language, but if there is any way I can help, I would do so gladly.

-- srean
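For concreteness, here is what substituting a library typically looks like: a hypothetical site.cfg that points the build at an OpenBLAS installation. The section and key names follow numpy's bundled site.cfg.example, but the paths below are invented placeholders, not real defaults:

```ini
; Hypothetical site.cfg placed next to setup.py before building numpy.
; Swap in OpenBLAS installed under /opt/openblas instead of the
; system BLAS/LAPACK.
[openblas]
libraries = openblas
library_dirs = /opt/openblas/lib
include_dirs = /opt/openblas/include
runtime_library_dirs = /opt/openblas/lib
```

Whether the build actually picked the library up can then be checked after installation with numpy's own report, e.g. `np.show_config()`.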
Re: [Numpy-discussion] ANN: Numexpr 2.0 released
This is great news; I hope this gets included in the EPD distribution soon. I had mailed a few questions about numexpr some time ago, and I am still curious about those; I have included the relevant parts below. In addition, I have another question: there was a numexpr branch that allows an "out=blah" parameter to build the output in place. Has that been merged, or has its functionality been incorporated?

This goes without saying, but: thanks for numexpr.

-- from old mail --

What I find somewhat encumbering is that there is no single piece of documentation that lists all the operators and functions that numexpr can parse. For a new user this would be very useful. There is a list in the wiki page entitled "overview", but it seems incomplete (for instance, it does not describe the reduction operations available); I do not know enough to know how incomplete it is.

Is there any plan to implement the reduction-like enhancements that ufuncs provide, namely reduceat, accumulate, and reduce? It is entirely possible that they are already in there but I could not figure out how to use them. If they aren't, it would be great to have them.

On Sun, Nov 27, 2011 at 7:00 AM, Francesc Alted wrote:
>
> Announcing Numexpr 2.0
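On the reductions question: as far as I can tell, numexpr does recognize sum() and prod() as reductions, with an optional axis argument, provided the reduction is the outermost operation in the expression. A small sketch, assuming numexpr is installed (the array and expression are invented for illustration):

```python
import numpy as np
import numexpr as ne

a = np.random.rand(100, 3)

# numexpr compiles the squaring and the reduction together, so no
# temporary for a**2 needs to round-trip through main memory.
row_norms_sq = ne.evaluate("sum(a**2, axis=1)")
```

The full ufunc machinery (reduceat, accumulate) has no numexpr counterpart that I know of; only these whole-expression reductions.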
Re: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering
As one lurker to another, thanks for calling it out. Over-argumentative, personality-centric threads like these have actually led me to distance myself from the numpy community. I do not know how common it is now, because I do not follow the list closely anymore; it used to be quite common at one point in time. I came down to check after a while and, lo, there it is again.

If a mail is put forward as a question ("I find this confusing, is it confusing for you?"), it ought not to devolve into a shouting match atop moral high horses: "So you think I am stupid, do you? Too smart, are you? How dare you express that it doesn't bother you as much, when it bothers me and my documented case of 4 people. I have four, how many do you have?"

If something is posed as a question, one should be open to the answers. Sometimes it is better not to pose it as a question at all, but to offer alternatives and ask for a preference. I am not siding with any of the technical options provided, just requesting that the discourse not devolve into these personality-oriented contests. It gets too loud and noisy.

Thank you

On Sat, Apr 6, 2013 at 12:18 PM, matti picus wrote:
> as a lurker, may I say that this discussion seems to have become
> non-productive?
>
> It seems all agree that the docs need improvement; perhaps a first step would
> be to suggest doc improvements, and then the need for renaming may become
> self-evident, or not.
>
> aww darn, ruined my lurker status.
> Matti Picus
[Numpy-discussion] repeat an array without allocation
Hi all,

is there an efficient way to do the following without allocating A, where

    A = np.repeat(x, [4, 2, 1, 3], axis=0)
    c = A.dot(b)  # b.shape

thanks
-- srean
Re: [Numpy-discussion] repeat an array without allocation
Great! Thanks, I should have seen that.

Is there any way array multiplication (as opposed to matrix multiplication) can be sped up without forming A and (A * b) explicitly?

    A = np.repeat(x, [4, 2, 1, 3], axis=0)  # A.shape == (10, 10)
    c = np.sum(b * A, axis=1)               # b.shape == (10, 10)

In my actual setting b is pretty big, so I would like to avoid creating another array of the same size. I would also like to avoid a Python loop:

    st = 0
    for (i, rep) in enumerate([4, 2, 1, 3]):
        end = st + rep
        c[st:end] = np.dot(b[st:end, :], x[i, :])
        st = end

Is Cython the only way?

On Mon, May 5, 2014 at 1:20 AM, Jaime Fernández del Río <jaime.f...@gmail.com> wrote:
> On Sun, May 4, 2014 at 9:34 PM, srean wrote:
>
>> Hi all,
>>
>> is there an efficient way to do the following without allocating A, where
>>
>>     A = np.repeat(x, [4, 2, 1, 3], axis=0)
>>     c = A.dot(b)  # b.shape
>
> If x is a 2D array you can call repeat **after** dot, not before, which
> will save you some memory and a few operations:
>
> >>> a = np.random.rand(4, 5)
> >>> b = np.random.rand(5, 6)
> >>> np.allclose(np.repeat(a, [4, 2, 1, 3], axis=0).dot(b),
> ...             np.repeat(a.dot(b), [4, 2, 1, 3], axis=0))
> True
>
> Similarly, if x is a 1D array, you can sum the corresponding items of b
> before calling dot:
>
> >>> a = np.random.rand(4)
> >>> b = np.random.rand(10)
> >>> idx = np.concatenate(([0], np.cumsum([4, 2, 1, 3])[:-1]))
> >>> np.allclose(np.dot(np.repeat(a, [4, 2, 1, 3], axis=0), b),
> ...             np.dot(a, np.add.reduceat(b, idx)))
> True
>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
> de dominación mundial.
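Jaime's two transformations, put together as one runnable sketch (the shapes and the repeat pattern are the toy ones from this thread):

```python
import numpy as np

reps = [4, 2, 1, 3]  # sums to 10

# Trick 1 (2-D x): repeat *after* the dot, so the matrix product runs
# on the small (4 x 5) array instead of the repeated (10 x 5) one.
x = np.random.rand(4, 5)
b = np.random.rand(5, 6)
c_small = np.repeat(x.dot(b), reps, axis=0)
c_big = np.repeat(x, reps, axis=0).dot(b)  # same result, more work

# Trick 2 (1-D x): fold b's segments together with add.reduceat
# instead of expanding x. idx holds each segment's start offset.
x1 = np.random.rand(4)
b1 = np.random.rand(10)
idx = np.concatenate(([0], np.cumsum(reps)[:-1]))  # [0, 4, 6, 7]
dot_small = x1.dot(np.add.reduceat(b1, idx))
dot_big = np.repeat(x1, reps).dot(b1)
```

In both cases the big repeated array is only built here to check agreement; the point is that the `*_small` path never needs it.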
Re: [Numpy-discussion] Shared memory check on in-place modification.
Wait, when assignments and slicing mix, wasn't the behavior supposed to be equivalent to copying the RHS to a temporary and then assigning using the temporary? Is that a false memory, or has the behavior changed? As long as the behavior is well defined and succinct it should be ok.

On Tuesday, July 28, 2015, Sebastian Berg wrote:
>
> On Mon Jul 27 22:51:52 2015 GMT+0200, Sturla Molden wrote:
> > On 27/07/15 22:10, Anton Akhmerov wrote:
> > > Hi everyone,
> > >
> > > I have encountered an initially rather confusing problem in a piece of
> > > code that attempted to symmetrize a matrix: `h += h.T`
> > > The problem of course appears due to `h.T` being a view of `h`, and
> > > some elements being overwritten during the __iadd__ call.
>
> I think the typical proposal is to raise a warning. Note there is
> np.may_share_memory. But the logic to give the warning is possibly not
> quite easy, since this is ok to use sometimes. If someone figures it out
> (mostly) I would be very happy to see such warnings.
>
> > Here is another example
> >
> > >>> a = np.ones(10)
> > >>> a[1:] += a[:-1]
> > >>> a
> > array([ 1., 2., 3., 2., 3., 2., 3., 2., 3., 2.])
> >
> > I am not sure I totally dislike this behavior. If it could be made
> > consistent it could be used to vectorize recursive algorithms. In the case
> > above I would prefer the output to be:
> >
> > array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
> >
> > It does not happen because we do not enforce that the result of one
> > operation is stored before the next two operands are read. The only way
> > to speed up recursive equations today is to use compiled code.
> >
> > Sturla
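The aliasing at the heart of this thread, in runnable form. Since the result of in-place operations on overlapping views was undefined at the time of this discussion, the sketch only shows the overlap check that was mentioned and two safe symmetrization idioms, rather than relying on any particular in-place outcome:

```python
import numpy as np

h = np.arange(9.0).reshape(3, 3)

# h.T is a view: the transpose shares h's buffer, which is what makes
# `h += h.T` hazardous -- elements may be read after being overwritten.
overlaps = np.shares_memory(h, h.T)

# Safe symmetrization, option 1: force a copy of the aliased operand.
h1 = h.copy()
h1 += h1.T.copy()

# Safe symmetrization, option 2: build the result in a fresh array.
h2 = h + h.T
```

Either option costs one temporary the size of h, which is the price of not depending on evaluation order.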
Re: [Numpy-discussion] Shared memory check on in-place modification.
I got misled by (extrapolated erroneously from) this description of temporaries in the documentation:

http://docs.scipy.org/doc/numpy/user/basics.indexing.html#assigning-values-to-indexed-arrays

"... a new array is extracted from the original (as a temporary) containing the values at 1, 1, 3, 1, then the value 1 is added to the temporary, and then the temporary is assigned back to the original array. Thus the value of the array at x[1]+1 is assigned to x[1] three times, rather than being incremented 3 times."

It is talking about a slightly different scenario, of course: there the temporary corresponds to the LHS. Anyhow, as long as the behavior is defined rigorously it should not be a problem. Now, I vaguely remember abusing ufuncs and aliasing in interactive sessions for some weird cumsum-like operations (I plead bashfully guilty).

On Fri, Aug 7, 2015 at 1:38 PM, Sebastian Berg wrote:
> On Fr, 2015-08-07 at 13:14 +0530, srean wrote:
> > Wait, when assignments and slicing mix, wasn't the behavior supposed to
> > be equivalent to copying the RHS to a temporary and then assigning
> > using the temporary? Is that a false memory, or has the behavior
> > changed? As long as the behavior is well defined and succinct it
> > should be ok
>
> No, NumPy has never done that as far as I know. And since SIMD
> instructions etc. make this even less predictable (you used to be able
> to abuse in-place logic, even if usually the same can be done with
> ufunc.accumulate, so it was a bad idea anyway), you have to avoid it.
>
> Pauli is working currently on implementing the logic needed to find out if
> such a copy is necessary [1], which is very cool indeed. So I think it is
> likely we will see such copy logic in NumPy 1.11.
>
> - Sebastian
>
> [1] See https://github.com/numpy/numpy/pull/6166 -- it is not an easy
> problem.
Re: [Numpy-discussion] automatically avoiding temporary arrays
Thanks Francesc and Robert for giving me a broader picture of where this fits in. I believe numexpr does not handle slicing, so that might be another thing to look at.

On Wed, Oct 5, 2016 at 4:26 PM, Robert McLeod wrote:
>
> As Francesc said, Numexpr is going to get most of its power through
> grouping a series of operations so it can send blocks to the CPU cache and
> run the entire series of operations on the cache before returning the block
> to system memory. If it was just used to back-end NumPy, it would only
> gain from the multi-threading portion inside each function call.

Is that so? I thought numexpr also cuts down on the number of temporary buffers that get filled (in other words, copy operations) when the same expression is written as a series of operations. My understanding may be wrong, and I would appreciate correction.

The 'out' parameter in ufuncs can eliminate extra temporaries, but it's not composable. Right now I have to manually carry along the array where the in-place operations take place. I think the goal here is to eliminate that.
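For reference, this is what "manually carrying the array along in out=" looks like with plain ufuncs; a toy expression, invented for illustration:

```python
import numpy as np

a = np.random.rand(1000)
b = np.random.rand(1000)

# Naive form: 2*a allocates one temporary, and adding b allocates the
# result array -- two allocations for one expression.
naive = 2 * a + b

# Hand-threaded out= form: one scratch buffer, reused at every step.
# This is the non-composable bookkeeping the thread is complaining about.
out = np.empty_like(a)
np.multiply(a, 2.0, out=out)  # out = 2*a
np.add(out, b, out=out)       # out = 2*a + b
```

The two forms produce identical values; the second trades readability for one fewer temporary, which is exactly the trade-off an automatic mechanism would make unnecessary.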
Re: [Numpy-discussion] automatically avoiding temporary arrays
On Wed, Oct 5, 2016 at 5:36 PM, Robert McLeod wrote:
>
> It's certainly true that numexpr doesn't create a lot of OP_COPY
> operations; rather, it's optimized to minimize them, so probably it's fewer
> ops than naive successive calls to numpy within python, but I'm unsure if
> there's any difference in operation count between a hand-optimized numpy
> with out= set and numexpr. Numexpr just does it for you.

That was my understanding as well. If it automatically does what one could achieve by carrying the state along in the 'out' parameter, that's as good as it can get in terms of removing unnecessary ops. There are other speedup opportunities of course, but that's a separate matter.

> This blog post from Tim Hochberg is useful for understanding the
> performance advantages of blocking versus multithreading:
>
> http://www.bitsofbits.com/2014/09/21/numpy-micro-optimization-and-numexpr/

Hadn't come across that one before. Great link, thanks. Using caches and vector registers well trumps threading, unless one has a lot of data and it helps to disable hyper-threading.