Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++

2012-04-03 Thread srean
This makes me ask something that I always wanted to know: why is weave
not the preferred or encouraged way?

Is it because no developer has an interest in maintaining it, or is it
too onerous to maintain? I do not know enough of its internals to guess
an answer. I think it would be fair to say that weave has languished a
bit over the years.

What I like about weave is that even when I drop into C++ mode I can
pretty much use the same NumPy-ish syntax, with no overhead of calling
back into the NumPy C functions. From the SourceForge forum it seems
the new Blitz++ is quite competitive with Intel Fortran in SIMD
vectorization as well, which does sound attractive.

I would be delighted if development on weave caught up again.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++

2012-04-04 Thread srean
>> I think the story is that Cython overlaps enough with Weave that Weave
>> doesn't get any new users or developers.
>
> One big issue that I had with weave is that it compile on the fly. As a
> result, it makes for very non-distributable software (requires a compiler
> and the development headers installed), and leads to problems in the long
> run.
>
> Gael

I do not know much Cython, except for the fact that it is out there
and what it is supposed to do, but wouldn't Cython need a compiler too?
I imagine distributing Cython-based code would incur a similar amount
of schlep.

But yes, you raise a valid point. It does cause annoyances. One that
I have faced is running the same code simultaneously over a mix
of 32-bit and 64-bit machines. But this is because the source-code
hashing function does not take the architecture into account. Shouldn't
be hard to fix.


Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++

2012-04-04 Thread srean
>> I do not know much Cython, except for the fact that it is out there
>> and what it is supposed to do., but wouldnt Cython need a compiler too
>> ?
>
> Yes, but at build-time, not run time.

Ah! I see what you mean, or so I think. So the first time weave-based
code runs, it builds, stores the compiled code on disk and then executes,
whereas in Cython there is a clear separation of build vs. execute. In
fairness, though, it shouldn't be difficult to trigger the build ahead
of time with weave. But I imagine Cython has other advantages (and in my
mind so does weave, in certain restricted areas).
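For what it's worth, the build/execute separation being described is just a standard distutils build step. A hypothetical setup.py sketch (the module and file names are illustrative, not from this thread) would look something like:

```python
# setup.py -- hypothetical sketch; "mymodule" is an illustrative name.
# The extension is compiled ONCE, at build time ("python setup.py build_ext"),
# so end users of a binary distribution never need a compiler.
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    name="mymodule",
    cmdclass={"build_ext": build_ext},
    ext_modules=[Extension("mymodule", ["mymodule.pyx"])],
)
```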

Now I feel it would be great to marry the two, so that for the most
part Cython does not need to call into the NumPy API for array-based
operations but can fall back on something weave-like. Maybe sometime in
the future...

>> I imagine distributing Cython based code would incur similar amounts
>> of schlep.
>
> if you distribute source, yes, but if you at least have the option of
> distributing binaries. (and distutils does make that fairly easy, for
> some value of fairly)

Indeed.


Re: [Numpy-discussion] What is consensus anyway

2012-04-25 Thread srean
On Wed, Apr 25, 2012 at 11:08 PM, Puneeth Chaganti  wrote:
> On Thu, Apr 26, 2012 at 6:41 AM, Travis Oliphant  wrote:
> [snip]
>>
>> It would be nice if every pull request created a message to this list.    Is 
>> that even possible?
>
> That is definitely possible and shouldn't be too hard to do, like
> Jason said.  But that can potentially cause some confusion, with some
> of the discussion starting off in the mailing list, and some of the
> discussion happening on the pull-request itself.  Are my concerns
> justified?

Related issue: some projects have a users' list and a devel list. It
might be worth (re?)considering that option. They have their pros and
cons, but I think I like the idea of a devel list and a separate "help
wanted" list.

Something else that might be helpful for contentious threads is a
Stack Overflow-esque system where readers can vote up the responses of
others. Sometimes just an "I agree" or "I disagree" goes a long way,
especially when you have many lurkers.

On something else that was brought up: I do not consider myself
competent/prepared enough to take on development, but it is not the
case that I have _never_ felt the temptation. What I have found
intimidating and stymieing is the perceived politics over development
issues. The two places where I have felt this are (a) contentious
threads on the list and (b) what seem like legitimate patch tickets
on Trac that languish for no compelling technical
reason. I would be hard-pressed to quote specifics, but I have
encountered this feeling a few times.

In my case it would not have mattered, because I doubt I would have
contributed anything useful. However, it might be the case that more
competent lurkers have felt the same way. The possibility of a
patch relegated semi-permanently to Trac, or of getting
caught up in the politics, is a bit of a disincentive. This is just an
honest perception/observation.

I am more of a get-on-with-it, get-the-code-out-and-the-rest-will-resolve-itself
kind of guy, so long political/philosophical/epistemic threads
distance me. I know there are legitimate reasons to have these
discussions, but it seems to me that they get a bit too wordy here
sometimes.

My 10E-2.

-- srean


Re: [Numpy-discussion] What is consensus anyway

2012-04-26 Thread srean
> Patches languishing on Trac is a real problem. The issue here is not at all
> about not wanting those patches,

Oh yes, I am sure of that; in the past it had not been clear what more
was necessary to get them pulled in, or how to go about satisfying the
requirements. The document you mailed on the scipy list goes a long
way toward addressing those issues, so thanks a lot. In fact it might be a
good idea to add a link to it in the signature of the mails that Trac
sends out.

 but just about the overhead of getting them
> reviewed/fixed/committed. This problem has more or less disappeared with
> Github; there are very few PRs that are just sitting there.
>
> As for existing patches on Trac, if you or anyone else has an interest in
> one of them, checking that patch for test coverage / documentation and
> resubmitting it as a PR would be a massive help.
>
> Ralf


Re: [Numpy-discussion] fast access and normalizing of ndarray slices

2012-06-03 Thread srean
Hi Wolfgang,

  I think you are looking for reduceat(), in particular np.add.reduceat().

-- srean
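To make the suggestion concrete, here is a minimal sketch of the reduceat() approach for the example below (normalizing each conceptual sub-array to unit sum; the variable names are mine):

```python
import numpy as np

data = np.array([1, 2, 1, 2, 3, 4, 1, 2, 3], dtype=np.float64)
start = np.array([0, 2, 6])           # start of each conceptual sub-array

# Per-segment sums in one vectorized call:
# sums over data[0:2], data[2:6], data[6:]
sums = np.add.reduceat(data, start)   # [3., 10., 6.]

# Expand each segment's sum back to per-element divisors and normalize
lengths = np.diff(np.append(start, len(data)))   # [2, 4, 3]
normalized = data / np.repeat(sums, lengths)
```

After this, np.add.reduceat(normalized, start) should give all ones, i.e. each segment sums to one, with no Python-level loop.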

On Thu, May 31, 2012 at 12:36 AM, Wolfgang Kerzendorf
 wrote:
> Dear all,
>
> I have an ndarray which consists of many arrays stacked behind each other 
> (only conceptually, in truth it's a normal 1d float64 array).
> I have a second array which tells me the start of the individual data sets in 
> the 1d float64 array and another one which tells me the length.
> Example:
>
> data_array = (conceptually) [[1,2], [1,2,3,4], [1,2,3]] = in reality 
> [1,2,1,2,3,4,1,2,3, dtype=float64]
> start_pointer = [0, 2, 6]
> length_data = [2, 4, 3]
>
> I now want to normalize each of the individual data sets. I wrote a simple 
> for loop over the start_pointer and length data grabbed the data and 
> normalized it and wrote it back to the big array. That's slow. Is there an 
> elegant numpy way to do that? Do I have to go the cython way?


Re: [Numpy-discussion] automatic differentiation with PyAutoDiff

2012-06-14 Thread srean
>
> For example, I wrote a library routine for doing log-linear
> regression. Doing this required computing the derivative of the
> likelihood function, which was a huge nitpicky hassle; took me a few
> hours to work out and debug. But it's still just 10 lines of Python
> code that I needed to figure out once and they're done forever, now.
> I'd have been perfectly happy if I could have gotten those ten lines
> by asking a random unreleased library I pulled off github, which
> depended on heavy libraries like Theano and relied on a mostly
> untested emulator for some particular version of the CPython VM. But
> I'd be less happy to ask everyone who uses my code to install that
> library as well, just so I could avoid having to spend a few hours
> doing math. This isn't a criticism or your library or anything, it's
> just that I'm always going to be reluctant to rely on an automatic
> differentiation tool that takes arbitrary code as input, because it
> almost certainly cannot be made fully robust. So it'd be nice to have
> the option to stick a human in the loop.

Log-linear models are by definition too simple to make one appreciate
auto-differentiation. Try computing the Hessian by hand on a modestly
sized multilayer neural network and you will start seeing the
advantages; likewise for the Hessian of a large graphical model.
But I do have my own reservations about auto-diff. Until we have a
compiler smart enough to do common-subexpression elimination, and
in fact even then, hand-written differentiation code will often turn
out to be more efficient. Terms cancel out (subtraction or division),
terms factorize, terms can be arranged into an efficient Horner
scheme. It will take very smart symbolic manipulation of the parse
tree to get all that.

So in places where I really need to optimize the derivative code, I
would still do it by hand and delegate to an AD system when the
size gets unwieldy. In theory a good compromise is to let the AD system
churn out the code and then hand-optimize it, but for that, readable
output does indeed help.

As far as correctness of the computed derivative is concerned,
comparing the dot product of the gradient with a direction against the
secant computed numerically from the function along that direction does
guard against gross errors. If I remember correctly, SciPy's
optimization module already has a function to do such sanity checks.
Of course it cannot guarantee correctness, but it usually goes a long way.
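If I have the right function in mind, the check looks something like this: scipy.optimize.check_grad compares an analytic gradient against a finite-difference approximation. The quadratic test function here is mine, just for illustration:

```python
import numpy as np
from scipy.optimize import check_grad

def f(x):
    return np.sum(x ** 2)    # scalar objective

def grad_f(x):
    return 2.0 * x           # its analytic gradient

x0 = np.array([1.0, -2.0, 3.0])

# Norm of the difference between grad_f(x0) and the finite-difference
# gradient; a small value suggests (but does not prove) the gradient is right.
err = check_grad(f, grad_f, x0)
```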

-- srean


Re: [Numpy-discussion] automatic differentiation with PyAutoDiff

2012-06-14 Thread srean
>
> You're right - there is definitely a difference between a correct
> gradient and a gradient is both correct and fast to compute.
>
> The current quick implementation of pyautodiff is naive in this
> regard.


Oh, and by no means was I criticizing your implementation. It is a very
hard problem to solve and, as you indicate, takes several man-years to
deal with. And compared to having no gradient at all, a gradient that is
possibly slower to compute is a big improvement :)


> True, even approximating a gradient by finite differences is a subtle
> thing if you want to get the most precision per time spent. Another
> thing I was wondering about was periodically re-running the original
> bytecode on inputs to make sure that the derived bytecode produces the
> same answer (!). Those two sanity checks would detect the two most
> scary errors to my mind as a user:
> a) that autodiff got the original function wrong
> b) that autodiff is mis-computing a gradient.


I was suggesting finite differences just as a sanity check, not as an
actual substitute for the gradient. You won't believe how many times
the finite-difference check has saved me from going in the exact
opposite direction!


Re: [Numpy-discussion] automatic differentiation with PyAutoDiff

2012-06-14 Thread srean
> Of course, maybe you were pointing out that if your derivative
> calculation depends in some intrinsic way on the topology of some
> graph, then your best bet is to have an automatic way to recompute it
> from scratch for each new graph you see. In that case, fair enough!

That is indeed what I had in mind. In neural networks, Markov random
fields, Bayesian networks, graph regularization etc it is something
that has to be dealt with all the time.

> Right, and what I want is to do those correctness checks once, and
> then save the validated derivative function somewhere and know that it
> won't break the next time I upgrade some library or make some
> seemingly-irrelevant change to the original code.

Exactly.
What I was getting at is: even if it is not feasible to get pretty-printed
Python output, the bytecode can still be validated (somewhat)
with a few numeric sanity checks. So yes, the derivatives
needn't/shouldn't be recomputed at runtime all the time, and an API
that returns even some opaque but computable representation of
the derivative, which can be validated and then "frozen", would be
helpful.

I think one can go further and formally prove the correctness of the
derivative-computing engine. I don't know if anyone has done it; maybe
Theano does. It should be possible for a statically typed sublanguage.


Re: [Numpy-discussion] automatic differentiation with PyAutoDiff

2012-06-14 Thread srean
> Hi,
>
> I second James here, Theano do many of those optimizations. Only
> advanced coder can do better then Theano in most case, but that will
> take them much more time. If you find some optimization that you do
> and Theano don't, tell us. We want to add them :)
>
> Fred

I am sure Theano does an excellent job on the expressions that matter. But
I think getting the best symbolic reduction of an expression is a hard,
as in AI-hard, problem. Correct me if I am wrong, though.

One can come up with perverse corner cases using algebraic or
trigonometric identities: expressions that are hundreds of terms long
but whose derivatives are simple, perhaps even constant.

But all that matters is how well it does for the common cases, and I am
hearing that it does extremely well.

I will be happy if it can reduce simple things like the following (a
very common form in Theano's domain)

\phi(x) - \phi(y) - dot(x - y, \grad\phi(y))

evaluated for \phi(x) = \sum_i (x_i log x_i - x_i)

to

\sum_i x_i log(x_i / y_i) on the set sum(x) = sum(y) = 1
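For the record, the claimed reduction can be checked by hand. With \phi(x) = \sum_i (x_i \log x_i - x_i) the gradient is (\nabla\phi(y))_i = \log y_i, so:

```latex
\begin{aligned}
\phi(x) - \phi(y) - \langle x - y,\, \nabla\phi(y)\rangle
  &= \sum_i \bigl(x_i \log x_i - x_i\bigr)
   - \sum_i \bigl(y_i \log y_i - y_i\bigr)
   - \sum_i (x_i - y_i)\log y_i \\
  &= \sum_i x_i \log\frac{x_i}{y_i} \;-\; \sum_i x_i \;+\; \sum_i y_i \\
  &= \sum_i x_i \log\frac{x_i}{y_i}
     \qquad \text{on the set } \textstyle\sum_i x_i = \sum_i y_i .
\end{aligned}
```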

In any case, I think this is a digression and I would rather not pollute
this thread with peripheral (nonetheless very interesting) issues.


[Numpy-discussion] Semantics of index arrays and a request to fix the user guide

2012-06-25 Thread srean
From the user guide:
-

> Boolean arrays must be of the same shape as the array being indexed,
> or broadcastable to the same shape. In the most straightforward case,
>  the boolean array has the same shape.

Comment: So far so good, but the doc has not yet told me the shape of
the output.
--

user guide continues with an example:
--

> The result is a 1-D array containing all the elements in the indexed array 
> corresponding to all the true elements in the boolean array.


Comment:
--

Now it is not clear from that line whether this description of the
result's shape holds generally or is specific to the example. So the
reader (me) is still confused.


User Guide continues:


> With broadcasting, multidimensional arrays may be the result. For example...

Comment:
--

I will get to the example in a minute, but there is no explanation of
the mechanism used to arrive at the output shape. Is it the shape the
index array was broadcast to? Or is it something else, and if the
latter, what is it?

Example


The example indexes a (5,7) array with a (5,) index array. Now this is
very confusing because it seems to contradict the original
documentation: (5,) is neither the same shape as (5,7) nor is it
broadcastable to it.

The steps of conventional broadcasting would yield

(5,7)
(5,)

then

(5,7)
(1,5)

and then an error, because 7 and 5 don't match.



User guide continues:
--

> Combining index arrays with slices.

> In effect, the slice is converted to an index array
> np.array([[1,2]]) (shape (1,2)) that is broadcast with
>  the index array to produce a resultant array of shape (3,2).

comment:
-

Here the two arrays have shapes

(3,) and (1,2), so how does broadcasting yield the shape (3,2)?
Broadcasting is supposed to proceed trailing dimension first, but it
seems in these examples it is doing the opposite.

=

So could someone explain the semantics and make the user guide more precise?

Assuming the user guide is the first document a new user will read, it
is surprisingly difficult to follow, primarily because it gets into
advanced topics too soon and partially because of ambiguous language.
The NumPy reference, on the other hand, is very clear, as is Travis's
book, which I am glad to say I actually bought a long time ago.
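For concreteness, here is a quick experiment (using y = np.arange(35).reshape(5, 7), as in the guide's examples) showing the shapes NumPy actually returns, which is what I am trying to reconcile with the text:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

# Boolean mask with the same shape as y: result is 1-D, one element per True
s1 = y[y > 30].shape            # (4,) -- the elements 31, 32, 33, 34

# A 1-D boolean mask of shape (5,) selects whole rows: result is 2-D
b = np.array([True, False, True, False, True])
s2 = y[b].shape                 # (3, 7)

# An integer index array of shape (3,) combined with the slice 1:3
rows = np.array([0, 2, 4])
s3 = y[rows, 1:3].shape         # (3, 2)
```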

Thanks,
  srean


Re: [Numpy-discussion] Semantics of index arrays and a request to fix the user guide

2012-06-26 Thread srean
Hi All,

 My question may have gotten lost in the intense activity around the
1.7 release. Now that things have quietened down, I would appreciate
any help with my confusion about how index arrays work
(especially when broadcast).

-- srean


On Mon, Jun 25, 2012 at 5:29 PM, srean  wrote:
> From the user guide:
> [snip -- full message quoted earlier in this thread]


Re: [Numpy-discussion] memory allocation at assignment

2012-06-28 Thread srean
> Yes it does. If you want to avoid this extra copy, and have a
> pre-existing output array, you can do:
>
> np.add(a, b, out=c)
>
> ('+' on numpy array's is just a synonym for np.add; np.add is a ufunc,
> and all ufunc's accept this syntax:
>  http://docs.scipy.org/doc/numpy/reference/ufuncs.html
> )


Is the creation of the tmp as expensive as the creation of a new NumPy
array, or is it somewhat lighter weight (like being just a data
buffer)? I sometimes use the c[:] syntax thinking I might benefit from
numpy.ndarray re-use, but now I think that was misguided.
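A small sketch of the difference being described (the array contents are illustrative):

```python
import numpy as np

a = np.ones(5)
b = np.full(5, 2.0)
c = np.empty(5)

# "a + b" allocates a full temporary array, which is then copied into c.
c[:] = a + b

# Writing directly into c's buffer avoids the temporary entirely.
np.add(a, b, out=c)
```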


[Numpy-discussion] Meta: help, devel and stackoverflow

2012-06-28 Thread srean
Hi List,

 This has been brought up several times, and the response has been
generally positive, but it has fallen through the cracks. So here are a
few repeat requests. I am keeping them terse for brevity.

i) Split the list into [devel] and [help] and, as was mentioned
recently, [rant/flame]:

   Some requests for help get drowned out during active
development-related discussions, and simple help requests pollute more
urgent development-related matters.

ii) Stackoverflow like site for help as well as for proposals.

The silent majority has been referred to a few times recently. I
suspect there exist many lurkers on the list who do prefer one
discussed solution over another but for various reasons do not break
out of their lurk mode to send a mail saying "I prefer this solution".
Such an interface would also help in keeping track of the level of
support, as compared to mails that are large hunks of quoted text with
a line or two stating one's preference or seconding a proposal.

One thing I have learned from traffic accidents is that if one asks
the assembled crowd for help, no one knows how to respond. On the
other hand, if you say "hey, you there in the blue shirt, could you get
some water", you get instant results. So pardon me for taking the
presumptuous liberty of requesting Travis to please set it up or
delegate.

Splitting the lists shouldn't be hard work; setting up an
overflow-style site might be more work in comparison.

Best
-- srean


Re: [Numpy-discussion] Meta: help, devel and stackoverflow

2012-06-28 Thread srean
If I remember correctly there used to be a Stack Exchange-style site at
ask.scipy.org. It might be good to learn from that experience. I think
dealing with spam was a significant problem, but I am not sure whether
that is the reason it got discontinued.

Best
  srean

On Thu, Jun 28, 2012 at 11:36 AM, Cera, Tim  wrote:
>
> A little more research shows that we could have a
> http://numpy.stackexchange.com.  The requirements are just to have people
> involved. See http://area51.stackexchange.com/faq for more info.
>
> Kindest regards,
> Tim


Re: [Numpy-discussion] Meta: help, devel and stackoverflow

2012-06-28 Thread srean
In case this changes your mind (or assuages fears), I just wanted to
point out that many open-source projects do this. It is not about
claiming that one list is more important than the other, nor does it
reinforce the idea that developers and users live in separate silos;
it is more about directing the mails to different folders. No policing
is required either: just reply to the author and to the appropriate
list.

Right now, reading numpy-discussion@scipy.org feels a lot like drinking
from a fire hydrant when a couple of threads become very active.

This is just anecdotal evidence, but I have had mails go unanswered
when one or two threads were dominating the list.

People are human, and there will be situations where the top responders
are overburdened; I think the split will mitigate the problem
somewhat. For whatever reason, answering help requests is handled
largely by a small set of star responders, though I suspect the answers
are available more widely, even among comparatively new users. I am
hoping (a) that with a separate "ask for help" list such enlightened new
users can take up the slack, (b) that the information gets better
organized, and (c) that we do not impose on users who are not so
interested in devel issues, and vice versa. I take an interest in
devel-related issues (apart from the distracting and at times seemingly
petty flamewars) and like reading the NumPy source, but I don't think
every user has similar tastes, nor should they.

Best
  Srean

On Thu, Jun 28, 2012 at 2:42 PM, Matthew Brett  wrote:
> Hi,
>
> On Thu, Jun 28, 2012 at 7:42 AM, Olivier Delalleau  wrote:
>> +1 for a numpy-users list without "dev noise".
>
> Moderately strong vote against splitting the mailing lists into devel and 
> user.
>
> As we know, this list can be unhappy and distracting, but I don't
> think splitting the lists is the right approach to that problem.
>
> Splitting the lists sends the wrong signal.  I'd rather that we show
> by example that the developers listen to all voices, and that the
> users should expect to become developers. In other words that the
> boundary between the user and developer is fluid and has no explicit
> boundaries.
>
> As data points, I make no distinction between scipy-devel and
> scipy-user, nor cython-devel and cython-user.  Policing the
> distinction ('please post this on the user mailing list') is a boring
> job and doesn't make anyone more cheerful.
>
> I don't believe help questions are getting lost any more than devel
> questions are, but I'm happy to be corrected if someone has some data.
>
> Cheers,
>
> Matthew


Re: [Numpy-discussion] Meta: help, devel and stackoverflow

2012-06-28 Thread srean
> And I continue to think it sends the wrong message.

Maybe if you articulate your fears I will be able to appreciate your
point of view more.

> My impression is that, at the moment, we numpy-ers are trying to work
> out what kind of community we are. Are we a developer community, or
> are we some developers who are users of a library that we rely on, but
> do not contribute to?

I think it is fair to extrapolate that all of us want the NumPy
community to grow. If that is so, at some point not all of the users
will be developers. Apart from one's own pet projects, all successful
projects have more users than active developers.

What I like about having two lists is that, on the one hand, it does not
prevent me or you from participating in both, while on the other hand it
allows those who don't want to delve too deeply into one aspect or the
other the option of a cleaner inbox, or of separate inboxes. I, for
instance, would like to be on both lists, perhaps mostly as a lurker,
but would still want two different folders just for better organization.

To me this seems a win-win. There is also a chance that more lurkers
would speak up on the help list than here, and I think that is a good
thing.

Best
  srean


Re: [Numpy-discussion] Meta: help, devel and stackoverflow

2012-06-28 Thread srean
Could not have said this better even if I tried, so thank you for your
long answer.

-- srean


On Thu, Jun 28, 2012 at 4:57 PM, Fernando Perez  wrote:

> Long answer, I know...


Re: [Numpy-discussion] Meta: help, devel and stackoverflow

2012-06-28 Thread srean
> I'm not on the python mailing lists, but my impression is that python
> is in a different space from numpy.  I mean, I have the impression

Indeed one could seek out philosophical differences between projects.
No two projects are the same, but they can and often do have common
issues. About the issues that Fernando mentioned, I can say that they
are real and they do apply; this I say from the experience of being on
the NumPy mailing list.

I think many silent NumPy users would welcome the creation of a
low-barrier, low-noise (noise is context sensitive) forum where they can
ask for help with what they feel are simple questions with easy
answers.

I still do not have a tangible grasp of what your fears are. It seems
you are unhappy that this will split the community. It won't; it is just
two lists for the same community, where mails are sorted into different
folders.

It also seems the notion of developers and users is disagreeable to
you, and that you are philosophically hesitant about accepting or
recognizing that such a difference exists. I may be wrong; I do not
intend to speak for you, I am only trying to understand your objections.

First, let me assure you that these are labels on (temporary) roles,
not on a person (if that is what is making you uncomfortable).
Different people occupy different roles for different amounts of time.

 A question about how to run-length decode an array of integers is
very different from a question about which files to touch to add
reduceat() support to the numexpr engine, and how.

It would be strange to take the position that there is no difference
between the nature of these questions, or that the person who is
interested in the former is also keen to learn about the latter (note:
some would be; example: yours sincerely, though I know the former and
not the latter), or at the least keen on receiving mails from extended
discussions on the topic of lesser interest.

 It seems to me that sorting these mails into different bins only
improves the contextual signal-to-noise ratio, which the recipient can
use as he/she sees fit. The only issue is whether there will be enough
volume for each of these bins; my perception is yes, but this can
certainly be revisited. In any case it does not prevent or hinder any
activity, but allows flexible organization of content should one want
it.

> So, it may not make sense to think in terms of a model that works for Python, 
> or even, IPython.

I do not want to read too much into this, but I do find this kind of
odd and confusing: to proactively solicit input from other related
projects, but then say that the lessons do not apply once the views
expressed weren't in total agreement.

This thread is coming close to veering into the
non-technical/non-productive/argumentative zone, the type that I am
fearful of, so I will stop here. But I would encourage you to churn
these views in your mind, impersonally, to see if the idea of
different lists has any merit, and to seek out what tangible harm
could come of it.

I think this request has come up before (I hasten to add, not initiated
by me) and the response had largely been in favor, but nothing has
happened. So I would welcome information on the following: if indeed two
lists are to be made, who gets to create them?

Best,
  srean


Re: [Numpy-discussion] Meta: help, devel and stackoverflow

2012-06-28 Thread srean
I like this solution, and I think ask.scipy.org can be revived to take
on that role, but it will need some policing to redirect standard
questions there, and also some hangout time at ask.scipy.org.

I love the Stack Overflow model, but it requires more active
participation from those who want to answer questions than a mailing
list does, because questions not only do not come to you by default,
they also get knocked off the front page as more questions come in.
Something to watch out for, though I believe it won't be as bad as on
the main SO site.

Meta^2: I have been top-posting with abandon here. Not sure what is
preferred here, top or bottom.

Best
  srean

On Thu, Jun 28, 2012 at 8:52 PM, T J  wrote:
> On Thu, Jun 28, 2012 at 3:23 PM, Fernando Perez 
> wrote:

> I'm okay with having two lists as it does filtering for me, but this seems
> like a sub-optimal solution.
>
> Observation: Some people would like to apply labels to incoming messages.
> Reality: Email was not really designed for that.
>
> We can hack it by using two different email addresses, but why not just keep
> this list as is and make a concentrated effort to promote the use of 2.0
> technologies, like stackoverflow/askbot/etc?  There, people can put as many
> tags as desired on questions: matrix, C-API, iteration, etc. Potentially,
> these tags would streamline everyone's workflow.  The stackoverflow setup
> also makes it easier for users to search for solutions to common questions,
> and know that the top answer is still an accurate answer.  [No one likes
> finding old invalid solutions.]  The reputation system and up/down votes
> also help new users figure out which responses to trust.
>
> As others have explained, it does seem that there are distinct types of
> discussions that take place on this list.
>
> 1)  There are community discussions/debates.
>
> Examples are the NA discussion, the bug tracker, release schedule, ABI/API
> changes, matrix rank tolerance too low, lazy evaluation, etc.   These are
> clearly mailing-list topics.   If you look at all the messages for the last
> two(!) months, it seems like this type of message has been the dominant
> type.
>
> 2) There are also standard questions.
>
> Recent examples are "memory allocation at assignment",  "dot() function
> question", "not expected output of fill_diagonal", "silly isscalar
> question".  These messages seem much more suited to the stackoverflow
> environment.  In fact, I'd be happy if we redirected such questions to
> stackoverflow.  This has the added benefit that responses to such questions
> will stay on topic.  Note that if a stackoverflow question seeds a
> discussion, then someone can start a new thread on the mailing list which
> cites the stackoverflow question.
>
> tl;dr
>
> Keep this list the same, and push "user" questions to stackoverflow instead
> of pushing them to a user list.
>


Re: [Numpy-discussion] Meta: help, devel and stackoverflow

2012-06-30 Thread srean
On Sat, Jun 30, 2012 at 2:29 PM, John Hunter  wrote:

> This thread is a perfect example of why another list is needed.

+1

On Sat, Jun 30, 2012 at 2:37 PM, Matthew Brett  wrote:

> Oh - dear.   I think the point that most of us agreed on was that
> having a different from: address wasn't a perfect solution for giving
> people space for asking newbie type questions.  No-one has to read an
> email.  If it looks boring or silly or irrelevant to your concerns,
> well, then ignore it.

Looking at the same mails, it doesn't seem to me that most of us have
agreed on that. It seems most of us have expressed that they would be
satisfied with two different lists, but are open to considering the
stackoverflow model. The latter will require more work and time to get
going compared to the former.

Aside:
A logical conclusion of your "don't read mails that don't interest
you" would be that spam is not a problem; after all, no one has to
read spam. If it looks boring or silly or irrelevant to your concerns,
well, then ignore it.


On Sat, Jun 30, 2012 at 1:57 PM, Dag Sverre Seljebotn
 wrote:

> http://news.ycombinator.com/item?id=4131462

It seems it was mostly driven by an argumentative troll, who had
decided beforehand to disagree with some of the other folks and went
about cooking up interpretations so that he/she could complain about
them. Sadly, this list shows such tendencies at times as well.

Anecdotal data-point:
I have been happy with SO in general. It works very well for certain
types of queries. OTOH, if the answer to a question is known only to a
few, and he/she does not happen to be online at the time the question
was posted, and he/she does not "pull" such possible questions by
key-words, that question is all but history.

The difference is that on a mailing list questions are "pushed" on to
people who might be able to answer them, whereas in the SO model people
have to actively seek out questions they want to answer. Unanticipated,
niche questions tend to disappear.


Re: [Numpy-discussion] Meta: help, devel and stackoverflow

2012-06-30 Thread srean
> Isn't that what the various sections are for?

Indeed they are, but it still needs active "pulling" on behalf of
those who want to answer questions, and even then a question can sink
deep in the well -- deeper than what one typically monitors. Sometimes
questions are not appropriately tagged. Sometimes it is not obvious
what the tag should be, or which tag is being monitored by the persons
who might have the answer.

This could be less of a problem for us, given that it's a more focused
group and the predefined tags are not split too fine.

I think the main issue is that SO requires more active engagement than
a mailing list, because checking for new mail has become something that
almost everyone does by default anyway.

Not saying SO is bad -- I have benefited greatly from it -- but these
issues should be kept in mind.

> http://stackoverflow.com/questions?sort=newest
> http://stackoverflow.com/questions?sort=unanswered
> And then, if you want modification-by-modification updates:
> http://stackoverflow.com/questions?sort=active

> Entries are sorted by date and you can view as many pages worth as are
> available.


Re: [Numpy-discussion] Meta: help, devel and stackoverflow

2012-06-30 Thread srean
> You can subscribe to be notified by email whenever a question is posted
> to a certain tag.

Absolutely true.

>  So then it is no different than a mailing list as far
> as push/pull.

There are a few differences though. New tags get created often,
potentially in a decentralized and dynamic fashion -- way more often
than new lists are created. That's why the need to actively monitor.
Another difference is the frequency of subscription: how often does a
user of SO subscribe to a tag? Yet another is that tags are usually
much more specific than the typical charter of a mailing list, and
that's a good thing because it makes things easier to find and browse.

I think if the tags are kept broad enough (or it is ensured that finer
tags inherit from broader tags -- for example numpy.foo, where foo can
be created according to the existing SO rules of tag creation) and
participants here are willing to subscribe to those tags, there won't
be much of a difference. So, just two qualifiers.

In addition, if there were a way to bounce-and-answer user questions
posted here to the SO forum relatively painlessly, that would be quite
nice too. Maybe something that creates a new user based on the user's
mail id, then mails him/her the response and a password with which
he/she can take control of the id. It is more polite, and may be a
good way for the SO site to collect more users.

Best
 --srean


Re: [Numpy-discussion] Array views

2011-03-26 Thread srean
Hi,

 I am also interested in this. In my application there is a large 2d
array -- let's call it 'b' to keep the notation consistent in the
thread. b's columns need to be recomputed often. Ideally this
re-computation happens in a function; let's call that function
updater(b, col_index). The simplest example is where
updater(b, col_index) is a matrix-vector multiply, where the matrix or
the vector changes.

 Is there any way, apart from using ufuncs, that I can make updater()
write the result directly into b and not create a new temporary column
that is then copied into b? Say for the matrix-vector multiply example.
I can write the matrix-vector product in terms of ufuncs, but will lose
out in terms of speed.

In the best-case scenario I would like to maintain 'b' in csr sparse
matrix form, as 'b' participates in a matrix-vector multiply. I think
csr would be asking for too much, but even ccs should help. I don't
want to clutter this thread with the sparsity issues though; any
solution to the original question, or pointers to solutions, would be
appreciated.

Thanks
  --srean

On Sat, Mar 26, 2011 at 12:10 PM, Hugo Gagnon <
sourceforge.nu...@user.fastmail.fm> wrote:

> Hello,
>
> Say I have a few 1d arrays and one 2d array which columns I want to be
> the 1d arrays.
> I also want all the a's arrays to share the *same data* with the b
> array.
> If I call my 1d arrays a1, a2, etc. and my 2d array b, then
>
> b[:,0] = a1[:]
> b[:,1] = a2[:]
> ...
>
> won't work because apparently copying occurs.
> I tried it the other way around i.e.
>
> a1 = b[:,0]
> a2 = b[:,1]
> ...
>
> and it works but that doesn't help me for my problem.
> Is there a way to reformulate the first code snippet above but with
> shallow copying?
>
> Thanks,
> --
>  Hugo Gagnon
>


Re: [Numpy-discussion] Array views

2011-03-26 Thread srean
Hi Christopher,

thanks for taking the time to reply at length. I do understand the
concept of striding in general, but was not familiar with the NumPy way
of accessing that information. So thanks for pointing me to .flags and
.strides.

That said, BLAS/LAPACK do have APIs that take the stride length into
account. But for sparse arrays I think it's a hopeless situation. That
is a bummer, because sparse is what I need. Oh well, I will probably do
it in C++.

-- srean

p.s. I hope top-posting is not frowned upon here. If so, I will keep
that in mind in my future posts.

On Sat, Mar 26, 2011 at 1:31 PM, Christopher Barker
wrote:

>
> Probably not -- the trick is that when an array is a view of a slice of
> another array, it may not be laid out in memory in a way that other libs
> (like LAPACK, BLAS, etc) require, so the data needs to be copied to call
> those routines.
>
> To understand all this, you'll need to study up a bit on how numpy
> arrays lay out and access the memory that they use: they use a concept
> of "strided" memory. It's very powerful and flexible, but most other
> numeric libs can't use those same data structures. I'm not sure what a
> good doc is to read to learn about this -- I learned it from messing
> with the C API. Take a look at any docs that talk about "strides", and
> maybe playing with the "stride tricks" tools will help.
>
> A simple example:
>
> In [3]: a = np.ones((3,4))
>
> In [4]: a
> Out[4]:
> array([[ 1.,  1.,  1.,  1.],
>[ 1.,  1.,  1.,  1.],
>[ 1.,  1.,  1.,  1.]])
>
> In [5]: a.flags
> Out[5]:
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
>
> So a is a (3,4) array, stored in C_contiguous fashion, just like a
> "regular old C array". A lib expecting data in this fashion could use
> the data pointer just like regular C code.
>
> In [6]: a.strides
> Out[6]: (32, 8)
>
> this means it is 32 bytes from the start of one row to the next, and 8
> bytes from the start of one element to the next -- which makes sense for
> a 64bit double.
>
>
> In [7]: b = a[:,1]
>
> In [10]: b
> Out[10]: array([ 1.,  1.,  1.])
>
> so b is a 1-d array with three elements.
>
> In [8]: b.flags
> Out[8]:
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : False
>   OWNDATA : False
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
>
> but it is NOT C_Contiguous - the data is laid out differently than a
> standard C array.
>
> In [9]: b.strides
> Out[9]: (32,)
>
> so this means that it is 32 bytes from one element to the next -- for an
> 8-byte data type. This is because the elements are each one element in a
> row of the a array -- they are not all next to each other. A regular C
> library generally won't be able to work with data laid out like this.
>
>


Re: [Numpy-discussion] Array views

2011-03-26 Thread srean
Ah! very nice. I did not know that numpy-1.6.1 supports in place 'dot',
nor the fact that you could access the underlying BLAS functions like
this. This is pretty neat, thanks. Now I at least have an idea of how
the sparse version might work.

If I get time I will probably give numpy-1.6.1 a shot. I already have
the MKL libraries thanks to the free version of EPD for students.


On Sat, Mar 26, 2011 at 2:34 PM, Pauli Virtanen  wrote:

>
> Like so:
>
>     # Fortran-order for efficient DGEMM -- each column must be contiguous
>     A = np.random.randn(4, 4).copy('F')
>     b = np.random.randn(4, 10).copy('F')
>
>     def updater(b, col_idx):
>         # This will work in Numpy 1.6.1
>         np.dot(A, b[:, col_idx].copy(), out=b[:, col_idx])
>
> In the meantime you can do
>
>     A = np.random.randn(4, 4).copy('F')
>     b = np.random.randn(4, 10).copy('F')
>
>     from scipy.lib.blas import get_blas_funcs
>     gemm, = get_blas_funcs(['gemm'], [A, b])  # get correct type func
>
>     def updater(b, col_idx):
>         bcol = b[:, col_idx]
>         c = gemm(1.0, A, bcol.copy(), 0.0, bcol, overwrite_c=True)
>         assert c is bcol  # check that it didn't make copies!
>


Re: [Numpy-discussion] Array views

2011-03-26 Thread srean
On Sat, Mar 26, 2011 at 3:16 PM, srean  wrote:

>
> Ah! very nice. I did not know that numpy-1.6.1 supports in place 'dot',
>

In place is perhaps not the right word; I meant "in a specified location".


Re: [Numpy-discussion] Shared memory ndarrays (update)

2011-04-11 Thread srean
Hi everyone,

  I was looking up the options that are available for shared memory
arrays, and this thread came up at the right time. The doc says that
multiprocessing.Array(...) gives a shared memory array, but from the
code it seems to me that it is actually using an mmap. Is that a
correct assessment, and if so, is there any advantage in using
multiprocessing.Array(...) over simple numpy mmapped arrays?
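
For reference, here is a minimal sketch of wrapping a
multiprocessing.Array in a zero-copy NumPy view (single-process
demonstration only; whether the pages stay in RAM or hit the swap
device is up to the OS):

```python
import multiprocessing as mp
import numpy as np

# multiprocessing.Array allocates anonymous shared memory (an mmap
# under the hood) guarded by a lock; get_obj() exposes the raw
# ctypes buffer without the lock wrapper.
shared = mp.Array('d', 6)                # six C doubles
arr = np.frombuffer(shared.get_obj())    # zero-copy view, dtype float64
arr[:] = np.arange(6)

# Processes forked after this point see the same buffer through `shared`.
```

Writes through `arr` land directly in the shared buffer, so child
processes holding `shared` observe them without any copying.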

Regards
  srean


Re: [Numpy-discussion] Shared memory ndarrays (update)

2011-04-11 Thread srean
Apologies for adding to my own post. multiprocessing.Array(...) uses an
anonymous mmapped file. I am not sure if that means it is resident in
RAM or on the swap device. But my original question remains: what are
the pros and cons of using it versus numpy mmapped arrays? If
multiprocessing.Array is indeed resident in memory (subject to swapping,
of course), that would still be advantageous compared to a file mapped
from an on-disk filesystem.

On Mon, Apr 11, 2011 at 12:42 PM, srean  wrote:

> Hi everyone,
>
>   I was looking up the options that are available for shared memory arrays
> and this thread came up at the right time. The doc says that
> multiprocessing.Array(...) gives a shared memory array. But from the
> code it seems to me that it is actually using an mmap. Is that a
> correct assessment, and
> simple numpy mmaped arrays.
>
> Regards
>   srean
>


Re: [Numpy-discussion] Shared memory ndarrays (update)

2011-04-11 Thread srean
Got it, and thanks a lot for the explanation. I am not using Queues, so
I think I am safe for the time being. Given that you have worked a lot
on these issues, would you recommend plain mmapped numpy arrays over
multiprocessing.Array?

Thanks again

-- srean

On Mon, Apr 11, 2011 at 1:36 PM, Sturla Molden  wrote:

>  "Shared memory" is memory mapping from the paging file (i.e. RAM), not a
> file on disk. They can have a name or be anonymous. I have explained why we
> need named shared memory before. If you didn't understand it, try to pass an
> instance of multiprocessing.Array over multiprocessing.Queue.
>
> Sturla
>


[Numpy-discussion] ufunc 's order of execution [relevant when output overlaps with input]

2011-05-12 Thread srean
Hi,

  Is there a guarantee that ufuncs will execute left to right and in
sequential order? For instance, is the following code
standards-compliant?

>>> import numpy as n
>>> a = n.arange(0, 5)
>>> a
array([0, 1, 2, 3, 4])
>>> n.add(a[0:-1], a[1:], a[0:-1])
array([1, 3, 5, 7])

The idea was to reuse memory and hence save space. The location I
write to is not accessed again.

I am quite surprised that the following works correctly.

>>> n.add.accumulate(a, out=a)

I guess it uses a buffer and rebinds `a` to that buffer at the end. It is
also faster than

>>> n.add(a[0:-1], a[1:], a[1:])

--sean


Re: [Numpy-discussion] ufunc 's order of execution [relevant when output overlaps with input]

2011-05-12 Thread srean
> It is possible that we can make an exception for inputs and outputs
> that overlap each other and pick a standard traversal. In those cases,
> the order of traversal can affect the semantics,


Exactly. If there is no overlap then it does not matter, and the
operation can potentially be done in parallel. On the other hand, if
there is some standardized traversal, that might allow one to write
nested loops compactly. I don't really need it, but found the
possibility quite intriguing.

> It always reads from a[i] before it writes to out[i], so it's always
> consistent.

Ah I see, thanks. Should have seen through it.

--sean


[Numpy-discussion] Adding the arrays in an array iterator

2011-05-27 Thread srean
Hi List,

 I have to sum up an unknown number of ndarrays of the same size. These
arrays, possibly thousands in number, are provided by an iterator.
Right now I use python reduce with operator.add. Does that invoke the
corresponding ufunc internally? I want to avoid creating temporaries.
With a ufunc I know how to do it, but I am not sure how to make use of
that in reduce. It is not essential that I use reduce though, so I
would welcome an idiomatic and efficient way of executing this. So far
I have stayed away from building an ndarray object and summing across
the relevant dimension. Is that what I should be doing? Different
invocations of this function have different numbers of arrays, so I
cannot pre-compile this away into a numexpr.
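
The ufunc route I have in mind looks something like this (a sketch
only; `sum_arrays` is a made-up helper name):

```python
import numpy as np

def sum_arrays(arrays):
    """Sum an iterator of equal-shaped ndarrays into one accumulator."""
    it = iter(arrays)
    total = np.array(next(it), dtype=float)  # one copy; becomes the accumulator
    for a in it:
        np.add(total, a, out=total)          # in place: no per-step temporary
    return total

result = sum_arrays(np.ones((3, 4)) for _ in range(1000))
```

Each step writes into `total` directly, so only the single initial
copy is ever allocated, no matter how many arrays the iterator yields.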

Thanks and regards
  srean


[Numpy-discussion] [Repost] Adding the arrays returned in an array iterator

2011-06-02 Thread srean
Bumping my question tentatively. I am fairly sure there is a good answer and
for some reason it got overlooked.

Regards
  srean

-- Forwarded message --
From: srean 
Date: Fri, May 27, 2011 at 10:36 AM
Subject: Adding the arrays in an array iterator
To: Discussion of Numerical Python 


Hi List,

 I have to sum up an unknown number of ndarrays of the same size. These
arrays, possibly thousands in number, are provided via an iterator.
Right now I use python reduce with operator.add.

 Does that invoke the corresponding ufunc internally? I want to avoid
creating temporaries, which I suspect a naive invocation of reduce will
create. With a ufunc I know how to avoid making copies using the output
parameter, but I am not sure how to make use of that in reduce.

 It is not essential that I use reduce though, so I would welcome an
idiomatic and efficient way of executing this. So far I have stayed
away from building an ndarray object and summing across the relevant
dimension. Is that what I should be doing? Different invocations of
this function have different numbers of arrays, so I cannot pre-compile
this away into a numexpr.

Thanks and regards
  srean


Re: [Numpy-discussion] [Repost] Adding the arrays returned in an array iterator

2011-06-03 Thread srean
> If they are in a list, then I would do something like

Apologies if it wasn't clear in my previous mail. The arrays come from
a lazy iterator, they are non-contiguous, and there are several
thousands of them. I was hoping there was a way to get at a "+="
operator for arrays to use in a reduce. Seems like indeed there is; I
had missed operator.iadd().

> result = arrays[0].copy()
> for a in arrays[1:]:
> result += a
>
> But much depends on the details of your problem.

> Chuck


Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-13 Thread srean
Looking at the code, the arrays that you are multiplying seem fairly
small (300 x 200) and you have 50 of them. So it might be the case that
there is not enough computational work to compensate for the cost of
forking new processes and communicating the results. Have you tried
larger arrays, and more of them?

If you are on an Intel machine and you have the MKL libraries around,
I would strongly recommend that you use the matrix multiplication
routine if possible. MKL will do the parallelization for you. Well, any
good BLAS implementation would do the same -- you don't really need
MKL. ATLAS and ACML would work too; it's just that MKL has been set up
for us and it works well.

To give an idea of the amount of tuning and optimization these
libraries have undergone: a numpy.sum would be slower than a
multiplication with a vector of all ones. So in the interest of speed,
the longer you stay in the BLAS context the better.
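
As a concrete (if unscientific) illustration of that claim, column
sums can be phrased as a BLAS matrix-vector product:

```python
import numpy as np

x = np.random.rand(300, 200)

# Column sums two ways: numpy's generic reduction loop, and a
# matrix-vector product that stays inside BLAS (dgemv) the whole time.
s_reduce = x.sum(axis=0)
s_blas = np.dot(np.ones(x.shape[0]), x)
```

Both give the same answer; on a tuned BLAS the dot-product route is
often competitive or faster for large arrays, though the margin
depends entirely on the build.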

--srean

On Fri, Jun 10, 2011 at 10:01 AM, Brandt Belson  wrote:
> Unfortunately I can't flatten the arrays. I'm writing a library where the
> user supplies an inner product function for two generic objects, and almost
> always the inner product function does large array multiplications at some
> point. The library doesn't get to know about the underlying arrays.
> Thanks,
> Brandt


[Numpy-discussion] (cumsum, broadcast) in (numexpr, weave)

2011-06-21 Thread srean
Hi All,

 Is there a fast way to do cumsum with numexpr? I could not find one,
but the functions available in numexpr do not seem to be exhaustively
documented, so it is possible that I missed it. I do not know if 'sum'
takes special arguments that can be used.

To try another track, do numexpr operators have something like the
'out' parameter for ufuncs? If so, one could perhaps use
add(a[0:-1], a[1:], out=a[1:]) provided it is possible to preserve the
sequential semantics.

Another option is to use weave, which does have cumsum. However, my
code requires expressions which implement broadcasting. That leads to
my next question: do repeat and concatenate return a copy or a view?
If they avoid copying, I could perhaps use repeat to simulate efficient
broadcasting. Or will it make a copy of that array anyway? I would
ideally like to use numexpr because I make heavy use of transcendental
functions and was hoping to exploit the VML library.
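
On the copy-versus-view question, a small sketch of the difference
(np.broadcast_to appeared in later NumPy releases; in older versions
np.lib.stride_tricks.as_strided can build the same zero-stride view):

```python
import numpy as np

row = np.arange(4.0)

# repeat materializes a full (3, 4) copy of the data ...
r = np.repeat(row[np.newaxis, :], 3, axis=0)

# ... while broadcasting builds a read-only zero-stride view over
# the same 4 elements: no data is copied at all.
v = np.broadcast_to(row, (3, 4))
```

The view's first stride is 0, meaning every "row" points at the same
4 elements, so simulating broadcasting with repeat always pays for a
real copy.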

Thanks for the help

-- srean


Re: [Numpy-discussion] (cumsum, broadcast) in (numexpr, weave)

2011-06-21 Thread srean
Apologies, intended to send this to the scipy list.

On Tue, Jun 21, 2011 at 2:35 PM, srean  wrote:
> Hi All,
>
>  Is there a fast way to do cumsum with numexpr? I could not find one,
> but the functions available in numexpr do not seem to be exhaustively
> documented, so it is possible that I missed it. I do not know if 'sum'
> takes special arguments that can be used.
>
> To try another track, do numexpr operators have something like the
> 'out' parameter for ufuncs? If so, one could perhaps use
> add(a[0:-1], a[1:], out=a[1:]) provided it is possible to preserve
> the sequential semantics.
>
> Another option is to use weave, which does have cumsum. However, my
> code requires expressions which implement broadcasting. That leads to
> my next question: do repeat and concatenate return a copy or a view?
> If they avoid copying, I could perhaps use repeat to simulate
> efficient broadcasting. Or will it make a copy of that array anyway?
> I would ideally like to use numexpr because I make heavy use of
> transcendental functions and was hoping to exploit the VML library.
>
> Thanks for the help
>
> -- srean
>


[Numpy-discussion] How to avoid extra copying when forming an array from an iterator

2011-06-24 Thread srean
Hi,

I have an iterator that yields a complex object. I want to make an
array out of a numerical attribute that the yielded object possesses,
and to do it very efficiently.

My initial plan was to keep writing the numbers to a StringIO object
and, when done, generate the numpy array using StringIO's buffer. But
the fromstring() method fails on the StringIO object, as does
fromfile(), or using the StringIO object as the initializer of an
array object.

After some digging I ran into this ticket:
http://projects.scipy.org/numpy/ticket/1634, which has been assigned a
low priority.

Is there some other way to achieve what I am trying? Efficiency is
important because potentially millions of objects would be yielded.
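
For anyone searching the archives later: np.fromiter covers exactly
this case, consuming a generator straight into a typed buffer (the
Record class below is just a stand-in for the complex object):

```python
import numpy as np

class Record:                        # stand-in for the yielded complex object
    def __init__(self, value):
        self.value = value

def produce(n):                      # the lazy iterator
    for i in range(n):
        yield Record(i * 0.5)

# fromiter writes each value straight into a single typed buffer;
# no intermediate Python list is ever built.
arr = np.fromiter((r.value for r in produce(5)), dtype=np.float64)
```

An optional `count` argument lets fromiter pre-allocate the whole
buffer when the number of objects is known up front.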

-- srean


Re: [Numpy-discussion] How to avoid extra copying when forming an array from an iterator

2011-06-24 Thread srean
To answer my own question: I guess I can keep appending to an
array.array() object and get a numpy.array from its buffer if possible.
Is that the efficient way?

On Fri, Jun 24, 2011 at 2:35 AM, srean  wrote:

> Hi,
>
> I have an iterator that yields a complex object. I want to make an array
> out of a numerical attribute that the yielded object possesses and that too
> very efficiently.
>
> My initial plan was to keep writing the numbers to a StringIO object and
> when done generate the numpy array using StringIO's buffer. But fromstring()
> method fails on the StringIO object, so does fromfile() or using the
> StringIO object as the initializer of an array object.
>
> After some digging I ran into this ticket:
> http://projects.scipy.org/numpy/ticket/1634, which has been assigned a
> low priority.
>
> Is there some other way to achieve what I am trying ? Efficiency is
> important because potentially millions of objects would be yielded.
>
> -- srean
>


Re: [Numpy-discussion] How to avoid extra copying when forming an array from an iterator

2011-06-24 Thread srean
On Fri, Jun 24, 2011 at 9:12 AM, Robert Kern  wrote:

> On Fri, Jun 24, 2011 at 04:03, srean  wrote:
> > To answer my own question, I guess I can keep appending to a
> array.array()
> > object and get a numpy.array from its buffer if possible.  Is that the
> > efficient way.
>
> It's one of the most efficient ways to do it, yes, especially for 1D
> arrays.
>


Thanks for the reply. My first cut was to try cStringIO, because I
thought I could use the writelines() method and hence avoid the for
loop in the python code. It would have been nice if that had worked. If
I understood it correctly, the looping would have been in a C function
call.

regards
  srean


Re: [Numpy-discussion] How to avoid extra copying when forming an array from an iterator

2011-06-24 Thread srean
A valiant exercise in hope:

Is it possible to do this without a loop or extra copying? What I have
is an iterator that yields a fixed-width string on every call to
next(). Now I want to create a numpy array of ints out of the last 4
chars of that string.

My plan was to pass the iterator through a generator that returned an
iterator over the last 4 chars. (Sub-question: given that strings are
immutable, is it possible to yield a view of the last 4 chars rather
than a copy?) Then apply StringIO.writelines() on the 4-char iterator
returned. After it's done, create a numpy.array from StringIO's buffer.

This does not work; the other option is to use an array.array in place
of a StringIO object. But is it possible to fill an array.array using a
lazy iterator without an explicit loop in python? Something like the
writelines() call.

I know, premature optimization and all that, but this indeed needs to
be done efficiently.

Thanks again for your gracious help

-- srean

On Fri, Jun 24, 2011 at 9:12 AM, Robert Kern  wrote:

> On Fri, Jun 24, 2011 at 04:03, srean  wrote:
> > To answer my own question, I guess I can keep appending to a
> array.array()
> > object and get a numpy.array from its buffer if possible.  Is that the
> > efficient way.
>
> It's one of the most efficient ways to do it, yes, especially for 1D
> arrays.
>


Re: [Numpy-discussion] Array vectorization in numpy

2011-07-20 Thread srean
>> I think this is essential to speed up numpy. Maybe numexpr could handle this 
>> in the future? Right now the general use of numexpr is result = 
>> numexpr.evaluate("whatever"), so the same problem seems to be there.
>>
>> With this I am not saying that numpy is not worth it, just that for many 
>> applications (specially with huge matrices/arrays), pre-allocation does make 
>> a huge difference, especially if we want to attract more people to using 
>> numpy.
>
> The ufuncs and many scipy functions take a "out" parameter where you
> can specify a pre-allocated array.  It can be a little awkward writing
> expressions that way, but the capability is there.

This is a slight digression: is there a way to have out-parameter-like
semantics with numexpr? I have always used it as

a[:] = numexpr(expression)

But I don't think numexpr builds the value in place. Is it possible to
have side effects with numexpr, as opposed to obtaining values -- for
example

"a = a * b + c"

The documentation is not clear about this. Oh, and I do not find the
"out" parameter awkward at all -- it's very handy. Furthermore, if I
may, here is a request that the Blitz++ source be updated. Seems like
there is a lot of activity on the Blitz++ repository, and weave is very
handy too and can be used as easily as numexpr.
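
To spell out the ufunc 'out' route for the "a = a * b + c" example (a
sketch; every step reuses a's storage):

```python
import numpy as np

a = np.arange(5.0)
b = np.full(5, 3.0)
c = np.full(5, 2.0)

# a = a * b + c with no temporaries: each ufunc writes into a directly.
np.multiply(a, b, out=a)   # a *= b
np.add(a, c, out=a)        # a += c
```

Chaining ufuncs through the same `out` array keeps peak memory flat
regardless of how long the expression is, at the cost of some
readability.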


Re: [Numpy-discussion] Array vectorization in numpy

2011-07-21 Thread srean
>> This is a slight digression: is there a way to get out-parameter-like
>> semantics with numexpr? I have always used it as
>>
>> a[:] = numexpr.evaluate(expression)


> In order to make sure the 1.6 nditer supports multithreading, I adapted
> numexpr to use it. The branch which does this is here:
> http://code.google.com/p/numexpr/source/browse/#svn%2Fbranches%2Fnewiter
> This supports out, order, and casting parameters, visible here:
> http://code.google.com/p/numexpr/source/browse/branches/newiter/numexpr/necompiler.py#615
> It's pretty much ready to go, just needs someone to do the release
> management.
> -Mark


Oh excellent, I did not know that the out parameter was available.
Hope this gets in soon.


[Numpy-discussion] c-info.ufunc-tutorial.rst

2011-08-24 Thread srean
Hi,

I was reading this document,
https://github.com/numpy/numpy/blob/master/doc/source/user/c-info.ufunc-tutorial.rst

It's well written, and there is a good build-up to the code examples that
are supposed to follow, but I do not see the actual examples, only how they
may be used. Are they located somewhere else and not linked? Or is the
c-info.ufunc-tutorial.rst document incomplete, with the examples not yet
written? I suspect the former. In that case, could anyone point to the
code examples and perhaps also update the c-info.ufunc-tutorial.rst document?

Thanks

-- srean


Re: [Numpy-discussion] c-info.ufunc-tutorial.rst

2011-08-24 Thread srean
Following up on my own question: I can see the code in the commit. So it
appears that

code-block::

directives are not being rendered correctly. Could anyone confirm? It may
be my browser alone, though I did try after disabling NoScript.

On Wed, Aug 24, 2011 at 6:53 PM, srean  wrote:

> Hi,
>
> I was reading this document,
> https://github.com/numpy/numpy/blob/master/doc/source/user/c-info.ufunc-tutorial.rst
>
> its well written and there is a good build up to exciting code examples
> that are coming, but I do not see the actual examples, only how they may be
> used. Is it located somewhere else and not linked? or is it that the
> c-info.ufunc-tutorial.rst document is incomplete and the examples have not
> been written. I suspect the former. In that case could anyone point to the
> code examples and may be also update the c-info.ufunc-tutorial.rst document.
>
> Thanks
>
> -- srean
>


Re: [Numpy-discussion] c-info.ufunc-tutorial.rst

2011-08-24 Thread srean
Thanks Anthony and Mark, this is good to know.

So what is the advised way of looking at freshly baked documentation?
Just read the raw files? Or is there some place where the correctly
Sphinx-rendered docs are hosted?

On Wed, Aug 24, 2011 at 7:19 PM, Anthony Scopatz  wrote:

> code-block:: is a directive that I think might be specific to sphinx.
>  Naturally, github's renderer will drop it.
>
> On Wed, Aug 24, 2011 at 7:10 PM, Mark Wiebe  wrote:
>
>>
>> I believe this is because of github's .rst processor which simply drops
>> blocks it can't understand. When building NumPy documentation, many more
>> extensions and context exists. I'm getting the same thing in the C-API
>> NA-mask documentation I just posted.
>>
>> -Mark
>>
>


[Numpy-discussion] the build and installation process

2011-08-25 Thread srean
Hi,

 I would like to know a bit about how the installation process works. Could
you point me to a resource? In particular, I want to know how the site.cfg
configuration works. Is it numpy/scipy specific, or is it standard with
distutils? I googled for site.cfg and distutils but did not find any
authoritative document.

I believe many new users trip up on the installation process, especially in
trying to substitute their favourite library in place of the standard one. So
a canonical document explaining the process would be very helpful.
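
For illustration, a site.cfg stanza for building against OpenBLAS might look like the following. The section and key names follow numpy.distutils conventions, but the paths are purely hypothetical, and the exact sections recognized depend on the numpy version:

```ini
[openblas]
libraries = openblas
library_dirs = /opt/openblas/lib
include_dirs = /opt/openblas/include
runtime_library_dirs = /opt/openblas/lib
```

The site.cfg.example file shipped in the numpy source tree is, as far as I know, the closest thing to an authoritative list of these sections.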

http://docs.scipy.org/doc/numpy/user/install.html

does cover some of the important points, but it is a bit sketchy and has a
"this is all that you need to know" flavor. It doesn't quite enable readers
to fix their own problems. So a resource somewhere in between the current
install document and reading all the sources that get invoked during
building and installation would be very welcome.

English is not my native language, but if there is any way I can help, I
would gladly do so.

-- srean


Re: [Numpy-discussion] ANN: Numexpr 2.0 released

2011-12-13 Thread srean
This is great news; I hope it gets included in the EPD distribution soon.

I had mailed a few questions about numexpr some time ago and am still
curious about those; I have included the relevant parts below. In
addition, I have another question: there was a numexpr branch that
allows an "out=blah" parameter to build the output in place. Has that
been merged, or its functionality incorporated?

This goes without saying, but: thanks for numexpr.

--  from old mail --

What I find somewhat encumbering is that there is no single document that
lists all the operators and functions that numexpr can parse. For a new
user this would be very useful. There is a list in the wiki page entitled
"overview", but it seems incomplete (for instance, it does not describe
the reduction operations available). I do not know enough to say how
incomplete it is.

Is there any plan to implement the reduction-like enhancements that
ufuncs provide, namely reduceat, accumulate, and reduce? It is entirely
possible that they are already in there but I could not figure out how
to use them. If they aren't, it would be great to have them.
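
For reference, the ufunc reduction family being asked about looks like this on the NumPy side (a sketch; whether numexpr exposes equivalents is exactly the open question):

```python
import numpy as np

a = np.arange(6)                     # [0 1 2 3 4 5]
total = np.add.reduce(a)             # 15, same as a.sum()
running = np.add.accumulate(a)       # [0 1 3 6 10 15], same as np.cumsum(a)
groups = np.add.reduceat(a, [0, 3])  # [3 12]: sums of a[0:3] and a[3:]
```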

On Sun, Nov 27, 2011 at 7:00 AM, Francesc Alted  wrote:
>
> 
>  Announcing Numexpr 2.0
> 


Re: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

2013-04-16 Thread srean
As one lurker to another, thanks for calling it out.

Over-argumentative, personality-centric threads like these have actually
led me to distance myself from the numpy community. I do not know how
common it is now, because I do not follow the list closely anymore. It
used to be quite common at one point in time. I came down to check after
a while, and lo, there it is again.

If a mail is put forward as a question "I find this confusing, is it
confusing for you", it ought not to devolve into a shouting match atop
moral high-horses: "so you think I am stupid, do you? Too smart, are you?
How dare you express that it doesn't bother you as much when it bothers me
and my documented case of 4 people. I have four, how many do you have?"

If something is posed as a question, one should be open to the answers.
Sometimes it is better not to pose it as a question at all, but to offer
alternatives and ask for preference.

I am not siding with any of the technical options provided, just requesting
that the discourse not devolve into these personality-oriented contests. It
gets too loud and noisy.

Thank you



On Sat, Apr 6, 2013 at 12:18 PM, matti picus  wrote:

> as a lurker, may I say that this discussion seems to have become
> non-productive?
>
> It seems all agree that docs needs improvement, perhaps a first step would
> be to suggest doc improvements, and then the need for renaming may become
> self-evident, or not.
>
> aww darn, ruined my lurker status.
> Matti Picus
>
>
>


[Numpy-discussion] repeat an array without allocation

2014-05-04 Thread srean
Hi all,

  is there an efficient way to do the following without allocating A, where

 A = np.repeat(x, [4, 2, 1, 3], axis=0)
 c = A.dot(b)  # b.shape

thanks
-- srean


Re: [Numpy-discussion] repeat an array without allocation

2014-05-05 Thread srean
Great, thanks! I should have seen that.

Is there any way array multiplication (as opposed to matrix multiplication)
can be sped up without forming A and (b * A) explicitly?

A = np.repeat(x, [4, 2, 1, 3], axis=0)   # A.shape == (10, 10)
c = np.sum(b * A, axis=1)                # b.shape == (10, 10)

In my actual setting b is pretty big, so I would like to avoid creating
another array of the same size. I would also like to avoid a Python loop:

st = 0
for (i, rep) in enumerate([4, 2, 1, 3]):
    end = st + rep
    c[st:end] = np.dot(b[st:end, :], x[i, :])
    st = end

Is Cython the only way?
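
One allocation-light alternative, sketched under the shapes given above (the array contents are made up): since the rows of A come from only four distinct rows of x, form the small (10, 4) product b.dot(x.T) once and pick out the entry for each row's group, instead of materializing the (10, 10) repeat:

```python
import numpy as np

reps = [4, 2, 1, 3]
x = np.arange(40.0).reshape(4, 10)    # the 4 distinct rows
b = np.arange(100.0).reshape(10, 10)

g = np.repeat(np.arange(len(reps)), reps)   # maps each row to its group
c = b.dot(x.T)[np.arange(b.shape[0]), g]    # c[j] = b[j] . x[g[j]]

# reference result via the explicit repeat
A = np.repeat(x, reps, axis=0)
assert np.allclose(c, np.sum(b * A, axis=1))
```

The temporary here is 10x4 rather than 10x10, a win whenever there are fewer distinct rows than columns.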


On Mon, May 5, 2014 at 1:20 AM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> On Sun, May 4, 2014 at 9:34 PM, srean  wrote:
>
>> Hi all,
>>
>>   is there an efficient way to do the following without allocating A where
>>
>>  A = np.repeat(x, [4, 2, 1, 3], axis=0)
>>  c = A.dot(b)# b.shape
>>
>
> If x is a 2D array you can call repeat **after** dot, not before, which
> will save you some memory and a few operations:
>
> >>> a = np.random.rand(4, 5)
> >>> b = np.random.rand(5, 6)
> >>> np.allclose(np.repeat(a, [4, 2, 1, 3], axis=0).dot(b),
> ... np.repeat(a.dot(b), [4, 2, 1, 3], axis=0))
> True
>
> Similarly, if x is a 1D array, you can sum the corresponding items of b
> before calling dot:
>
> >>> a = np.random.rand(4)
> >>> b = np.random.rand(10)
> >>> idx = np.concatenate(([0], np.cumsum([4,2,1,3])[:-1]))
> >>> np.allclose(np.dot(np.repeat(a, [4,2,1,3], axis=0), b),
> ... np.dot(a, np.add.reduceat(b, idx)))
> True
>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
> de dominación mundial.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


Re: [Numpy-discussion] Shared memory check on in-place modification.

2015-08-07 Thread srean
Wait, when assignments and slicing mix, wasn't the behavior supposed to be
equivalent to copying the RHS to a temporary and then assigning using the
temporary? Is that a false memory, or has the behavior changed? As long
as the behavior is well defined and succinct, it should be OK.
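
As an aside, the aliasing in `h += h.T` can at least be detected up front; a minimal sketch (assuming a NumPy with np.may_share_memory, which has been around for a long time):

```python
import numpy as np

h = np.arange(9.0).reshape(3, 3)
assert np.may_share_memory(h, h.T)  # the transpose is just a view

h = h + h.T   # an explicit temporary on the RHS sidesteps the overlap
assert np.allclose(h, h.T)          # now genuinely symmetric
```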


On Tuesday, July 28, 2015, Sebastian Berg 
wrote:

>
> On Mon Jul 27 22:51:52 2015 GMT+0200, Sturla Molden wrote:
> > On 27/07/15 22:10, Anton Akhmerov wrote:
> > > Hi everyone,
> > >
> > > I have encountered an initially rather confusing problem in a piece of
> > > code that attempted to symmetrize a matrix: `h += h.T`
> > > The problem of course appears due to `h.T` being a view of `h`, and
> > > some elements being overwritten during the __iadd__ call.
> >
>
> I think the typical proposal is to raise a warning. Note there is
> np.may_share_memory. But the logic to give the warning is possibly not
> quite easy, since this is OK to use sometimes. If someone figures it out
> (mostly) I would be very happy to see such warnings.
>
>
> > Here is another example
> >
> >  >>> a = np.ones(10)
> >  >>> a[1:] += a[:-1]
> >  >>> a
> > array([ 1.,  2.,  3.,  2.,  3.,  2.,  3.,  2.,  3.,  2.])
> >
> > I am not sure I totally dislike this behavior. If it could be made
> > constent it could be used to vectorize recursive algorithms. In the case
> > above I would prefer the output to be:
> >
> > array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.,  10.])
> >
> > It does not happen because we do not enforce that the result of one
> > operation is stored before the next two operands are read. The only way
> > to speed up recursive equations today is to use compiled code.
> >
> >
> > Sturla
> >
> >
>


Re: [Numpy-discussion] Shared memory check on in-place modification.

2015-08-07 Thread srean
I got misled by (extrapolated erroneously from) this description of
temporaries in the documentation:

http://docs.scipy.org/doc/numpy/user/basics.indexing.html#assigning-values-to-indexed-arrays

"... a new array is extracted from the original (as a temporary)
containing the values at 1, 1, 3, 1, then the value 1 is added to the
temporary, and then the temporary is assigned back to the original array.
Thus the value of the array at x[1]+1 is assigned to x[1] three times,
rather than being incremented 3 times."
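
That passage's claim is easy to verify directly (a tiny sketch):

```python
import numpy as np

x = np.zeros(5)
x[[1, 1, 3, 1]] += 1   # temporary extracted, incremented once, assigned back
# x[1] ends up 1, not 3:
assert x.tolist() == [0.0, 1.0, 0.0, 1.0, 0.0]
```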

It is talking about a slightly different scenario, of course; there the
temporary corresponds to the LHS. Anyhow, as long as the behavior is
defined rigorously it should not be a problem. Now, I vaguely remember
abusing ufuncs and aliasing in interactive sessions for some weird
cumsum-like operations (I plead bashfully guilty).


On Fri, Aug 7, 2015 at 1:38 PM, Sebastian Berg 
wrote:

> On Fr, 2015-08-07 at 13:14 +0530, srean wrote:
> > Wait, when assignments and slicing mix wasn't the behavior supposed to
> > be equivalent to copying the RHS to a temporary and then assigning
> > using the temporary. Is that a false memory ? Or has the behavior
> > changed ? As long as the behavior is well defined and succinct it
> > should be ok
> >
>
> No, NumPy has never done that as far as I know. And since SIMD
> instructions etc. make this even less predictable (you used to be able
> to abuse in-place logic, even if usually the same can be done with
> ufunc.accumulate so it was a bad idea anyway), you have to avoid it.
>
> Pauli is working currently on implementing the logic needed to find if
> such a copy is necessary [1] which is very cool indeed. So I think it is
> likely we will such copy logic in NumPy 1.11.
>
> - Sebastian
>
>
> [1] See https://github.com/numpy/numpy/pull/6166 it is not an easy
> problem.
>
>
> > On Tuesday, July 28, 2015, Sebastian Berg 
> > wrote:
> >
> >
> >
> > On Mon Jul 27 22:51:52 2015 GMT+0200, Sturla Molden wrote:
> > > On 27/07/15 22:10, Anton Akhmerov wrote:
> > > > Hi everyone,
> > > >
> > > > I have encountered an initially rather confusing problem
> > in a piece of
> > > > code that attempted to symmetrize a matrix: `h += h.T`
> > > > The problem of course appears due to `h.T` being a view of
> > `h`, and
> > > > some elements being overwritten during the __iadd__ call.
> > >
> >
> > I think the typical proposal is to raise a warning. Note there
> > is np.may_share_memoty. But the logic to give the warning is
> > possibly not quite easy, since this is ok to use sometimes. If
> > someone figures it out (mostly) I would be very happy zo see
> > such warnings.
> >
> >
> > > Here is another example
> > >
> > >  >>> a = np.ones(10)
> > >  >>> a[1:] += a[:-1]
> > >  >>> a
> > > array([ 1.,  2.,  3.,  2.,  3.,  2.,  3.,  2.,  3.,  2.])
> > >
> > > I am not sure I totally dislike this behavior. If it could
> > be made
> > > constent it could be used to vectorize recursive algorithms.
> > In the case
> > > above I would prefer the output to be:
> > >
> > > array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.,  10.])
> > >
> > > It does not happen because we do not enforce that the result
> > of one
> > > operation is stored before the next two operands are read.
> > The only way
> > > to speed up recursive equations today is to use compiled
> > code.
> > >
> > >
> > > Sturla
> > >
> > >
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


Re: [Numpy-discussion] automatically avoiding temporary arrays

2016-10-05 Thread srean
Thanks Francesc and Robert for giving me a broader picture of where this
fits in. I believe numexpr does not handle slicing, so that might be
another thing to look at.


On Wed, Oct 5, 2016 at 4:26 PM, Robert McLeod  wrote:

>
> As Francesc said, Numexpr is going to get most of its power through
> grouping a series of operations so it can send blocks to the CPU cache and
> run the entire series of operations on the cache before returning the block
> to system memory.  If it was just used to back-end NumPy, it would only
> gain from the multi-threading portion inside each function call.
>

Is that so?

I thought numexpr also cuts down on the number of temporary buffers that
get filled (in other words, copy operations) when the same expression is
written as a series of operations. My understanding may be wrong, and I
would appreciate correction.

The 'out' parameter in ufuncs can eliminate extra temporaries, but it is
not composable. Right now I have to manually carry along the array where
the in-place operations take place. I think the goal here is to eliminate
that.


Re: [Numpy-discussion] automatically avoiding temporary arrays

2016-10-06 Thread srean
On Wed, Oct 5, 2016 at 5:36 PM, Robert McLeod  wrote:

>
> It's certainly true that numexpr doesn't create a lot of OP_COPY
> operations, rather it's optimized to minimize them, so probably it's fewer
> ops than naive successive calls to numpy within python, but I'm unsure if
> there's any difference in operation count between a hand-optimized numpy
> with out= set and numexpr.  Numexpr just does it for you.
>

That was my understanding as well. If it automatically does what one could
achieve by carrying the state along in the 'out' parameter, that is as good
as it can get in terms of removing unnecessary ops. There are other speedup
opportunities of course, but that is a separate matter.


> This blog post from Tim Hochberg is useful for understanding the
> performance advantages of blocking versus multithreading:
>
> http://www.bitsofbits.com/2014/09/21/numpy-micro-optimization-and-numexpr/
>

Hadn't come across that one before. Great link, thanks. Using caches and
vector registers well trumps threading, unless one has a lot of data and it
helps to disable hyper-threading.
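
The cache-blocking idea can be caricatured in pure NumPy (an illustrative sketch, not how numexpr is actually implemented): evaluate `a*b + c` one block at a time through a single small scratch buffer that stays cache-resident, instead of allocating full-size temporaries:

```python
import numpy as np

def blocked_eval(a, b, c, out, block=4096):
    """out[:] = a*b + c, computed one cache-sized block at a time."""
    scratch = np.empty(block, dtype=out.dtype)  # one reused small temporary
    for s in range(0, a.shape[0], block):
        e = min(s + block, a.shape[0])
        t = scratch[:e - s]
        np.multiply(a[s:e], b[s:e], out=t)   # t <- a*b for this block
        np.add(t, c[s:e], out=out[s:e])      # out <- t + c for this block
    return out

a = np.arange(10000.0)
b = np.full(10000, 2.0)
c = np.ones(10000)
out = np.empty_like(a)
blocked_eval(a, b, c, out)
assert np.allclose(out, a * b + c)
```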