[Numpy-discussion] Re: feature request: N-D Gaussian function (not random distribution)

2024-07-21 Thread Joseph Fox-Rabinovitz
There's also an implementation in scikit-guess, which I mostly maintain.

On Sun, Jul 21, 2024, 00:38 Dom Grigonis  wrote:

> For statistics functions there is the `scipy` package.
>
> If you are referring to pdf of n-dimensional gaussian distribution,
> `scipy.stats.multivariate_normal.pdf` should do the trick.
>
> If you are referring to something else, then a bit of clarification would
> be helpful.
>
> Regards,
> dg
>
> > On 20 Jul 2024, at 09:04, tomnewton...@gmail.com wrote:
> >
> > Hello,
> >
> > Apologies if either (a) this is the wrong place to post this or (b) this
> functionality already exists and I didn't manage to find it.
> >
> > I have found myself many times in the past wishing that some sort of N-D
> Gaussian function exists in NumPy. For example, when I wish to test that
> some plot or data analysis method is working correctly, being able to call
> `np.gauss()` on a (M,N) array of coordinates or passing it the arrays
> generated by a meshgrid, along with tuples for sigma and mu, would be very
> convenient.
> >
> > I could write such a function myself, but that would not be convenient
> to reuse (copying the function to every program/project I want to use it
> in), and many other mathematical functions have "convenience" functions in
> NumPy (such as the square root). More importantly, I imagine that any such
> function that was written into NumPy by the people who regularly contribute
> to the project would be far better than one I wrote myself, as I am not
> tremendously good at programming.
> >
> > Regards,
> > Tom
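
For concreteness, a minimal sketch of the multivariate_normal approach Dom
suggests, evaluating a 2-D Gaussian pdf on meshgrid output (the grid, mu and
covariance values here are arbitrary):

import numpy as np
from scipy.stats import multivariate_normal

x, y = np.meshgrid(np.linspace(-3, 3, 61), np.linspace(-3, 3, 61))
pos = np.stack([x, y], axis=-1)                 # (61, 61, 2) coordinate array
mu = [0.0, 1.0]
cov = [[1.0, 0.0], [0.0, 2.0]]                  # diagonal covariance: sigma**2 per axis
pdf = multivariate_normal(mean=mu, cov=cov).pdf(pos)   # (61, 61) Gaussian surface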


Re: [Numpy-discussion] Array blitting (pasting one array into another)

2017-06-29 Thread Joseph Fox-Rabinovitz
This is certainly a useful idea. I would recommend extending it to an
arbitrary number of axes. You could either raise an error if the ndim of
the two arrays are unequal, or allow a broadcast of a lesser ndimmed src
array.

- Joe
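
As a sketch of that arbitrary-dimension extension (the helper name
interslice_nd is hypothetical; it generalizes the 2-d interslice quoted
below, and the same DEST[ds] = SRC[ss] assignment applies):

def interslice_nd(dest_shape, src_shape, offset):
    # One slice pair per axis; empty slices where there is no overlap.
    dest_slices, src_slices = [], []
    for size_dest, size_src, off in zip(dest_shape, src_shape, offset):
        start = max(off, 0)
        stop = min(off + size_src, size_dest)
        if stop <= start:
            start = stop = off = 0        # no intersection along this axis
        dest_slices.append(slice(start, stop))
        src_slices.append(slice(start - off, stop - off))
    return tuple(dest_slices), tuple(src_slices)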

On Jun 29, 2017 20:17, "Mikhail V"  wrote:

> Hello all
>
> I often need to copy one array into another array, given an offset.
> This is how the "blit" function can be understood, i.e. in
> every graphical lib there is such a function.
> The common definition is like:
> blit ( dest, src, offset ):
> where dest is destination array, src is source array and offset is
> coordinates in the destination where the src should be blitted.
> The main feature of such a function is that it never raises an error:
> if the source does not fit into the destination array, it is simply
> trimmed, and if there is no intersection area at all, nothing happens.
>
> Hope this is clear.
> So to make it work with Numpy arrays one needs to calculate the
> slices before copying the data.
> I cannot find any Numpy or Python method to help with that so probably
> it does not exist yet.
> If so, my proposal is to add a Numpy method which helps with that.
> Namely the proposal will be to add a method which returns
> the slices for the intersection areas of two arbitrary arrays, given an
> offset,
> so then one can "blit" one array into another with a simple assignment.
>
> Here is a Python function I use for 2d arrays now:
>
> def interslice(dest, src, offset):
>     y, x = offset
>     H, W = dest.shape
>     h, w = src.shape
>
>     dest_starty = max(y, 0)
>     dest_endy = min(y+h, H)
>     dest_startx = max(x, 0)
>     dest_endx = min(x+w, W)
>
>     src_starty = 0
>     src_endy = h
>     if y < 0: src_starty = -y
>     by = y+h - H  # Y bleed
>     if by > 0: src_endy = h - by
>
>     src_startx = 0
>     src_endx = w
>     if x < 0: src_startx = -x
>     bx = x+w - W  # X bleed
>     if bx > 0: src_endx = w - bx
>
>     dest_sliceY = slice(dest_starty, dest_endy)
>     dest_sliceX = slice(dest_startx, dest_endx)
>     src_sliceY = slice(src_starty, src_endy)
>     src_sliceX = slice(src_startx, src_endx)
>     if dest_endy <= dest_starty:
>         print("No Y intersection!")
>         dest_sliceY = slice(0, 0)
>         src_sliceY = slice(0, 0)
>     if dest_endx <= dest_startx:
>         print("No X intersection!")
>         dest_sliceX = slice(0, 0)
>         src_sliceX = slice(0, 0)
>     dest_slice = (dest_sliceY, dest_sliceX)
>     src_slice = (src_sliceY, src_sliceX)
>     return (dest_slice, src_slice)
>
>
> --
>
> I have intentionally made it expanded and without contractions
> so that it is better understandable.
> It returns the intersection area of two arrays given an offset.
> First returned tuple element is the slice for DEST array and the
> second element is the slice for SRC array.
> If there is no intersection along one of the axes at all,
> it returns the corresponding slice as slice(0, 0).
>
> With this helper function one can blit arrays easily e.g. example code:
>
> import numpy
>
> W = 8; H = 8
> DEST = numpy.ones([H,W], dtype = "uint8")
> w = 4; h = 1
> SRC = numpy.zeros([h,w], dtype = "uint8")
> SRC[:]=8
> offset = (0,9)
> ds, ss = interslice (DEST, SRC, offset )
>
> # blit SRC into DEST
> DEST[ds] = SRC[ss]
>
> So changing the offset one can observe how the
> SRC array is trimmed if it crosses the DEST boundaries.
> I think it is a very useful function in general with
> well-defined behaviour. It is useful not only for graphics,
> but for any copying and pasting of data between arrays.
>
> So I am looking forward to comments on this proposal.
>
>
> Mikhail


Re: [Numpy-discussion] Array blitting (pasting one array into another)

2017-06-30 Thread Joseph Fox-Rabinovitz
If you are serious about adding this to numpy, an even better option might
be to create a pull request with the implementation and solicit comments on
that. The problem lends itself to an easy solution in pure Python, so this
should not be too hard to do.

-Joe


On Fri, Jun 30, 2017 at 4:08 PM, Mikhail V  wrote:

> On 30 June 2017 at 03:34, Joseph Fox-Rabinovitz
>  wrote:
> > This is certainly a useful idea. I would recommend extending it to an
> > arbitrary number of axes. You could either raise an error if the ndim of
> the
> > two arrays are unequal, or allow a broadcast of a lesser ndimmed src
> array.
> >
>
>
> Now I am thinking that there is probably an even better, more generalised
> way to provide this functionality.
> Say if we had a function "intersect" which would be defined as follows:
>
> intersect(A, B, offset)
>
> where A, B are vector endpoints, and the offset is the distance
> between their origins.
> So to find a needed slice I could simply pass the shapes:
>
> intersect (DEST.shape, SRC.shape, offset)
>
> Hmm, there is something to think about. It could be a
> better idea to propose this, since it could be used in many
> other situations, not only for finding slice intersections.
>
> Although I'll need some time to think out more examples and use cases.
>
> Mikhail
>

Re: [Numpy-discussion] quantile() or percentile()

2017-07-21 Thread Joseph Fox-Rabinovitz
I think that there would be a very good reason to have a separate function
if we were to introduce weights to the inputs, similarly to the way that we
have mean and average. This would have some (positive) repercussions like
making weighted histograms with the Freedman-Diaconis binwidth estimator a
possibility. I have had this change on the back-burner for a long time,
mainly because I was too lazy to figure out how to include it in the C
code. However, I will take a closer look.

Regards,

-Joe



On Fri, Jul 21, 2017 at 5:11 PM, Chun-Wei Yuan 
wrote:

> There's an ongoing effort to introduce quantile() into numpy.  You'd use
> it just like percentile(), but would input your q value in probability
> space (0.5 for 50%):
>
> https://github.com/numpy/numpy/pull/9213
>
> Since there's a great deal of overlap between these two functions, we'd
> like to solicit opinions on how to move forward on this.
>
> The current thinking is to tolerate the redundancy and keep both, using
> one as the engine for the other.  I'm partial to having quantile because
> 1.) I prefer probability space, and 2.) I have a PR waiting on quantile().
>
> Best,
>
> C
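
For reference, the "one as the engine for the other" option is essentially a
one-line wrapper; a sketch (not the PR's actual implementation):

import numpy as np

def quantile(a, q, **kwargs):
    # q lives in probability space [0, 1]; percentile expects [0, 100]
    return np.percentile(a, np.asarray(q) * 100, **kwargs)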


Re: [Numpy-discussion] quantile() or percentile()

2017-07-21 Thread Joseph Fox-Rabinovitz
While #9211 is a good start, it is pretty inefficient in that it performs
an O(n log n) sort of the array. It is possible to reduce the time to O(n)
by using a partitioning algorithm similar to the one in the C code of
percentile. I will look into it as soon as I can.

-Joe
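
A sketch of the O(n) idea for a single scalar q, using np.partition
(introselect) to pull out just the two order statistics that linear
interpolation needs; the real code would also have to handle axes, NaNs
and array-valued q:

import numpy as np

def quantile_partition(a, q):
    a = np.asarray(a).ravel()
    pos = q * (a.size - 1)
    lo = int(np.floor(pos))
    hi = int(np.ceil(pos))
    part = np.partition(a, [lo, hi] if hi > lo else [lo])   # O(n), no full sort
    return part[lo] + (pos - lo) * (part[hi] - part[lo])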

On Fri, Jul 21, 2017 at 5:34 PM, Chun-Wei Yuan 
wrote:

> Just to provide some context, 9213 actually spawned off of this guy:
>
> https://github.com/numpy/numpy/pull/9211
>
> which might address the weighted inputs issue Joe brought up.
>
> C


[Numpy-discussion] ENH: ratio function to mimic diff

2017-07-28 Thread Joseph Fox-Rabinovitz
I have created PR#9481 to introduce a `ratio` function that behaves very
similarly to `diff`, except that it divides successive elements instead of
subtracting them. It has some handling built in for zero division, as well
as the ability to select between `/` and `//` operators.
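
For a rough sketch of the core behavior (without the PR's zero-division
handling or the operator selection), dividing successive elements along
the last axis:

import numpy as np

def ratio(a, axis=-1):
    a = np.asanyarray(a)
    left = np.take(a, np.arange(1, a.shape[axis]), axis=axis)
    right = np.take(a, np.arange(a.shape[axis] - 1), axis=axis)
    return left / right   # e.g. ratio([1.0, 2.0, 6.0]) -> [2.0, 3.0]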

There is currently no masked version. Perhaps someone could suggest a
simple mechanism for hooking np.ma.true_divide and np.ma.floor_divide in as
the operators instead of the regular np.* versions.

Please let me know your thoughts.

Regards,

-Joe


Re: [Numpy-discussion] ENH: ratio function to mimic diff

2017-07-29 Thread Joseph Fox-Rabinovitz
On Jul 29, 2017 12:23, "Stephan Hoyer"  wrote:

This is an interesting idea, but I don't understand the use cases for this
function. In particular, what would you use n-th order ratios for?


There is no good use case for the nth order ratios that I am aware of.
I just added that to mimic the way diff works.

One use case I can think of is estimating the slope of a log-scaled plot.
But here exp(diff(log(x))) is an easy substitute.


My original motivation was very similar to that. I was looking for the
largest geometric gap in a sorted sequence of numbers. Taking logs and
exponents seemed like a sledge hammer for that task.
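
For example, the largest-geometric-gap task reduces to one pass of
successive ratios over the sorted data (toy values, all assumed positive):

import numpy as np

x = np.sort(np.array([1.0, 1.5, 2.0, 40.0, 50.0]))
r = x[1:] / x[:-1]        # successive ratios, what ratio() would compute
i = int(np.argmax(r))     # the largest geometric gap is between x[i] and x[i+1]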


I guess ratio() would work in cases where values are both positive and
negative, but again I don't know when that would be useful. If your signal
crosses zero, ratios are likely to diverge.


They would, but looking for sign changes is easy, and I added an argument
to flag actual zeros.



Re: [Numpy-discussion] ENH: ratio function to mimic diff

2017-07-29 Thread Joseph Fox-Rabinovitz
On Jul 29, 2017 12:55, "Nathaniel Smith"  wrote:

I'd also like to see a more detailed motivation for this.

And, if it is useful, then that would make 3 operations that have special
case pairwise moving window variants (subtract, floor_divide, true_divide).
3 is a lot of special cases. Should there instead be a generic mechanism
for doing this for arbitrary binary operations?


Perhaps another method for ufuncs of two arguments? I agree that there
should be a generic mechanism since a lack of one is what is preventing me
from applying this to masked arrays immediately. It would have to take in
some domain filter, like many of the translated masked functions do. A
ufunc could provide that transparently.
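
A sketch of what such a generic mechanism could look like in pure Python,
applying an arbitrary binary ufunc to successive elements along an axis
(pairwise is a hypothetical helper, not an existing NumPy API):

import numpy as np

def pairwise(op, a, n=1, axis=-1):
    a = np.asanyarray(a)
    for _ in range(n):
        left = np.take(a, np.arange(1, a.shape[axis]), axis=axis)
        right = np.take(a, np.arange(a.shape[axis] - 1), axis=axis)
        a = op(left, right)
    return a

# pairwise(np.subtract, x) matches np.diff(x);
# pairwise(np.true_divide, x) is the proposed ratio(x).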


-n



Re: [Numpy-discussion] quantile() or percentile()

2017-08-03 Thread Joseph Fox-Rabinovitz
Not that I know of. The algorithm is very simple, requiring a
relatively small addition to the current introselect algorithm used
for `np.partition`. My biggest hurdle is figuring out how the calling
machinery really works so that I can figure out which input type
permutations I need to generate, and how to get the right backend
running for a given function call.

-Joe

On Thu, Aug 3, 2017 at 1:00 PM, Chun-Wei Yuan  wrote:
> Any way I can help expedite this?
>
> On Fri, Jul 21, 2017 at 4:42 PM, Chun-Wei Yuan 
> wrote:
>>
>> That would be great.  I just used np.argsort because it was familiar to
>> me.  Didn't know about the C code.


Re: [Numpy-discussion] quantile() or percentile()

2017-08-04 Thread Joseph Fox-Rabinovitz
I will go over your PR carefully to make sure we can agree on a
matching API. After that, we can swap the backend out whenever I get
around to it.

Thanks for working on this.

-Joe

On Thu, Aug 3, 2017 at 5:36 PM, Chun-Wei Yuan  wrote:
> Cool.  Just as a heads up, for my algorithm to work, I actually need the
> indices, which is why argsort() is so important to me.  I use it to get both
> ap_sorted and ws_sorted variables.  If your weighted-quantile algo is faster
> and doesn't require those indices, please by all means change my
> implementation.  Thanks.
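
For reference, one common convention for an argsort-based weighted quantile
along the lines described above (a sketch only; the PR may define the
weighting differently, and positive weights are assumed):

import numpy as np

def weighted_quantile(a, q, w):
    idx = np.argsort(a)                    # the indices the algorithm needs
    a_sorted = np.asarray(a)[idx]
    w_sorted = np.asarray(w)[idx]
    cdf = (np.cumsum(w_sorted) - 0.5 * w_sorted) / np.sum(w_sorted)
    return np.interp(q, cdf, a_sorted)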

[Numpy-discussion] ENH: Proposal to add np.neighborwise in PR#9514

2017-08-04 Thread Joseph Fox-Rabinovitz
I would like to propose the addition of a new function,
`np.neighborwise` in PR#9514. It is based on the discussion relating
to my proposal for `np.ratio` (PR#9481) and Eric Wieser's
`np.neighborwise` in PR#9428. This function accepts an array `a`, a
vectorized function of two arguments `func`, and applies the function
to all of the neighboring elements of the array across multiple
dimensions. There are options for masking out parts of the calculation
and for applying the function recursively.

The name of the function is not written in stone. The current name is
taken directly from PR#9428 because I can not think of a better one.

This function can serve as a backend for the existing `np.diff`, which
has been re-implemented in this PR, as well as for the `ratio`
function I proposed earlier. This adds the diagonal diffs feature,
which is tested and backwards compatible. `ratio` can be implemented
very simply with or without a mask. With a mask, it can be expressed
`np.neighborwise(a, np.*_divide, axis=axis, n=n, mask=lambda *args:
args[1])` (The conversion to bool is done automatically).

The one potentially non-backwards-compatible API change that this PR
introduces is that `np.diff` now returns an `ndarray` version of the
input, instead of the original array itself if `n==0`. Previously, the
exact input reference was returned for `n==0`. I very seriously doubt
that this feature was ever used outside the numpy test suite anyway.
The advantage of this change is that an invalid axis input can now be
caught before returning the unaltered array. If this change is
considered too drastic, I can remove it without removing the axis
check.

The two main differences between this PR and PR#9428 are the addition
of masks to the computation, and the interpretation of multiple axes.
PR#9428 applies `func` successively along each axis. This provides no
way of doing diagonal diffs. I chose to shift along all the axes
simultaneously before applying `func`. To clarify with an example, if
we take `a=[[1, 2], [3, 4]]`, `axis=[0, 1]` and `func=np.subtract`,
PR#9428 would take two diffs, `(4 - 2) - (3 - 1) = 0`, while the
version I propose here just takes the diagonal diff `4 - 1 = 3`.
Besides being more intuitive in my opinion, taking diagonal diffs
actually adds a new feature that can not be obtained directly by
taking successive diffs.
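
Spelled out on that example, the two conventions compute (a sketch of each,
using plain slicing rather than either PR):

import numpy as np

a = np.array([[1, 2], [3, 4]])
successive = np.diff(np.diff(a, axis=0), axis=1)   # PR#9428 style: [[0]]
diagonal = a[1:, 1:] - a[:-1, :-1]                 # diagonal diff:  [[3]]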

Please let me know your thoughts.

Regards,

-Joe


Re: [Numpy-discussion] Sustainability

2017-10-04 Thread Joseph Fox-Rabinovitz
Could you elaborate on the purpose of the meeting, or perhaps point to
a link with a description if there is one? Sustainability is a very
broad topic. What do you plan on discussing?

-Joe

On Tue, Oct 3, 2017 at 7:04 PM, Charles R Harris
 wrote:
> Hi All,
>
> I and a number of others representing various open source projects under the
> NumFocus umbrella will be attending a meeting next Tuesday to discuss the
> problem of sustainability. In preparation for that meeting I would be
> interested in any ideas that the folks who follow this list may have on the
> subject.
>
> Chuck
>


Re: [Numpy-discussion] Sorting of an array row-by-row?

2017-10-20 Thread Joseph Fox-Rabinovitz
There are two mistakes in your PS. The immediate error comes from the
fact that lexsort accepts an iterable of 1D arrays, so when you pass
in arr as the argument, it is treated as an iterable over the rows,
each of which is 1D. 1D arrays do not have an axis=1. You actually
want to iterate over the columns, so np.lexsort(a.T) is the correct
phrasing of that. No idea about the speed difference.

   -Joe
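
A sketch of both corrected calls for an (N, 3) array; note that lexsort
treats its *last* key as the primary one:

import numpy as np

arr = np.random.randint(-100, 100, size=(10, 3))
out = arr[np.lexsort(arr.T[::-1])]   # column 0 primary, like sorting rows left-to-right
# np.lexsort(arr.T) alone would make the last column the primary key.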

On Fri, Oct 20, 2017 at 6:00 AM, Kirill Balunov  wrote:
> Hi,
>
> I was trying to sort an array (N, 3) by rows, and first came up with this
> solution:
>
> N = 100
> arr = np.random.randint(-100, 100, size=(N, 3))
> dt = np.dtype([('x', int),('y', int),('z', int)])
>
> arr.view(dtype=dt).sort(axis=0)
>
> Then I found another way using lexsort function:
>
> idx = np.lexsort([arr[:, 2], arr[:, 1], arr[:, 0]])
> arr = arr[idx]
>
> Which is 4 times faster than the previous solution. And now I have several
> questions:
>
> Why is the first way so much slower?
> What is the fastest way in numpy to sort array by rows?
> Why is the order of keys in lexsort function reversed?
>
> The last question was really the root of the problem for me with the
> lexsort function,
> and I still cannot understand the idea of such an order (the last key is
> the primary one); it seems confusing to me.
>
> Thank you!!! With kind regards, Kirill.
>
> p.s.: One more thing, when I first tried to use lexsort, I caught this strange
> exception:
>
> np.lexsort(arr, axis=1)
>
> ---
> AxisError Traceback (most recent call last)
>  in ()
> > 1 np.lexsort(ls, axis=1)
>
> AxisError: axis 1 is out of bounds for array of dimension 1
>
>
>
>


Re: [Numpy-discussion] Sorting of an array row-by-row?

2017-10-20 Thread Joseph Fox-Rabinovitz
I do not think that there is any particular relationship between the
order of the keys and lexicographic order. The key order is just a
convention, which is clearly documented. I agree that it is a bit
counter-intuitive for anyone that has used excel or MATLAB, but it is
ingrained in the API at this point.

-Joe

On Fri, Oct 20, 2017 at 3:03 PM, Kirill Balunov  wrote:
> Thank you Josef, you gave me an idea, and now the fastest version (for big
> arrays) on my laptop is:
>
> np.lexsort(arr[:, ::-1].T)
>
> For me the most strange thing is the order of keys: what was the idea behind
> keeping them right-to-left? How does this relate to lexicographic order?


Re: [Numpy-discussion] best way of speeding up a filtering-like algorithm

2018-03-28 Thread Joseph Fox-Rabinovitz
It looks like you are creating a coastline mask (or a coastline mask +
some other mask), and computing the ratio of two quantities in a
particular window around each point. If your coastline covers a
sufficiently large portion of the image, you may get quite a bit of
mileage using an efficient convolution instead of summing the windows
directly. For example, you could use scipy.signal.convolve2d with
inputs being (nsidc_copy != NSIDC_COASTLINE_MIXED) and ((nsidc_copy >=
NSIDC_SEAICE_LOW) & (nsidc_copy <= NSIDC_FRESHSNOW)) for the first array,
and a (2*radius+1 x 2*radius+1) array of ones for the second. You may have
to center the block of ones in an array of zeros the same size as
nsidc_copy, but I am not sure about that.
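
A sketch of that convolution approach, reusing the names from the code
quoted below (assumes scipy is available; mode='same' with zero padding
reproduces the clipped-window sums at the image edges):

import numpy as np
from scipy.signal import convolve2d

kernel = np.ones((2*radius + 1, 2*radius + 1))
valid = (nsidc != NSIDC_COASTLINE_MIXED).astype(float)
snowice = ((nsidc >= NSIDC_SEAICE_LOW) & (nsidc <= NSIDC_FRESHSNOW)).astype(float)
npoints = convolve2d(valid, kernel, mode='same')    # windowed counts at every pixel
nsnowice = convolve2d(snowice, kernel, mode='same')
coast = (nsidc == NSIDC_COASTLINE_MIXED)
out = nsidc.copy()
out[coast & (100.0*nsnowice/np.maximum(npoints, 1) >= count)] = MISR_SEAICE_THRESHOLD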

Another option you may want to try is implementing your window
movement more efficiently. If you step your window center along using
an algorithm like flood-fill, you can ensure that there will be very
large overlap between successive steps (even if there is a break in
the coastline). That means that you can reuse most of the data you've
extracted. You will only need to subtract off the non-overlapping
portion of the previous window and add in the non-overlapping portion
of the updated window. If radius is 16, giving you a 32x32 window, you
go from summing ~1000 pixels per quantity of interest, to summing only
~120 if the window moves along a diagonal, and only 64 if it moves
vertically or horizontally. While an algorithm like this will probably
give you the greatest boost, it is a pain to implement.

If I had to guess, this looks like L2 processing for a multi-spectral
instrument. If you don't mind me asking, what mission is this for? I'm
working on space-looking detectors at the moment, but have spent many
years on the L0, L1b and L1 portions of the GOES-R ground system.

- Joe

On Wed, Mar 28, 2018 at 9:43 PM, Eric Wieser
 wrote:
> Well, one tip to start with:
>
> numpy.where(some_comparison, True, False)
>
> is the same as but slower than
>
> some_comparison
>
> Eric
>
> On Wed, 28 Mar 2018 at 18:36 Moroney, Catherine M (398E)
>  wrote:
>>
>> Hello,
>>
>>
>>
>> I have the following sample code (pretty simple algorithm that uses a
>> rolling filter window) and am wondering what the best way is of speeding it
>> up.  I tried rewriting it in Cython by pre-declaring the variables but that
>> didn’t buy me a lot of time.  Then I rewrote it in Fortran (and compiled it
>> with f2py) and now it’s lightning fast.  But I would still like to know if I
>> could rewrite it in pure python/numpy/scipy or in Cython and get a similar
>> speedup.
>>
>>
>>
>> Here is the raw Python code:
>>
>>
>>
>> def mixed_coastline_slow(nsidc, radius, count, mask=None):
>>
>>     nsidc_copy = numpy.copy(nsidc)
>>
>>     if mask is None:
>>         idx_coastline = numpy.where(nsidc_copy == NSIDC_COASTLINE_MIXED)
>>     else:
>>         idx_coastline = numpy.where(mask & (nsidc_copy == NSIDC_COASTLINE_MIXED))
>>
>>     for (irow0, icol0) in zip(idx_coastline[0], idx_coastline[1]):
>>
>>         rows = (max(irow0-radius, 0), min(irow0+radius+1, nsidc_copy.shape[0]))
>>         cols = (max(icol0-radius, 0), min(icol0+radius+1, nsidc_copy.shape[1]))
>>         window = nsidc[rows[0]:rows[1], cols[0]:cols[1]]
>>
>>         npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum()
>>         nsnowice = numpy.where((window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW),
>>                                True, False).sum()
>>
>>         if 100.0*nsnowice/npoints >= count:
>>             nsidc_copy[irow0, icol0] = MISR_SEAICE_THRESHOLD
>>
>>     return nsidc_copy
>>
>>
>>
>> and here is my attempt at Cython-izing it:
>>
>>
>>
>> import numpy
>> cimport numpy as cnumpy
>> cimport cython
>>
>> cdef int NSIDC_SIZE  = 721
>> cdef int NSIDC_NO_SNOW = 0
>> cdef int NSIDC_ALL_SNOW = 100
>> cdef int NSIDC_FRESHSNOW = 103
>> cdef int NSIDC_PERMSNOW  = 101
>> cdef int NSIDC_SEAICE_LOW  = 1
>> cdef int NSIDC_SEAICE_HIGH = 100
>> cdef int NSIDC_COASTLINE_MIXED = 252
>> cdef int NSIDC_SUSPECT_ICE = 253
>>
>> cdef int MISR_SEAICE_THRESHOLD = 6
>>
>> def mixed_coastline(cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc, int radius, int count):
>>
>>     cdef int irow, icol, irow1, irow2, icol1, icol2, npoints, nsnowice
>>     cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc2 \
>>         = numpy.empty(shape=(NSIDC_SIZE, NSIDC_SIZE), dtype=numpy.uint8)
>>     cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] window \
>>         = numpy.empty(shape=(2*radius+1, 2*radius+1), dtype=numpy.uint8)
>>
>>     nsidc2 = numpy.copy(nsidc)
>>
>>     idx_coastline = numpy.where(nsidc2 == NSIDC_COASTLINE_MIXED)
>>
>>     for (irow, icol) in zip(idx_coastline[0], idx_coastline[1]):
>>
>>         irow1 = max(irow-radius, 0)
>>         irow2 = min(irow+radius+1, 

[Numpy-discussion] PR adding support for object arrays to np.isinf, np.isnan, np.isfinite

2018-03-28 Thread Joseph Fox-Rabinovitz
I have opened PR #10820 to add support for `dtype=object` to
`np.isinf`, `np.isnan`, `np.isfinite`. The PR is a fairly minor
change, but I would like to make sure that I understand at least the
basics of ufuncs before I start adding support for datetimes and
timedeltas to `np.isfinite` and eventually to `np.histogram`. I have
left a few comments in areas I am not sure about, and would greatly
appreciate feedback, even if the PR is not found suitable for merging.

With this PR, object arrays containing any numerical or simulated
numerical types (implementing `__float__` or `__complex__` methods)
are processed as would be expected. While working on PR, I came up
with two questions for the gurus:

1. Am I correct in understanding that `isinf`, `isnan` and `isfinite`
currently cast integer inputs to float to process them? Why are
integer inputs not optimized to return arrays of all False, False,
True, respectively for those functions?

2. Why are `isneginf` and `isposinf` not ufuncs? Is there any reason
not to make them ufuncs (besides the renaming of the `y` parameter to
`out`, which technically breaks some backward compatibility)?
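
For illustration on question 1, the current behavior whose answers are
already fixed for any integer input:

import numpy as np

x = np.arange(5)
print(np.isinf(x))      # always all False for integer arrays
print(np.isnan(x))      # always all False
print(np.isfinite(x))   # always all True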

Regards,

- Joe


[Numpy-discussion] Possible bug in np.array type calculation

2018-04-03 Thread Joseph Fox-Rabinovitz
I recently asked a question on Stack Overflow about whether `np.array`
could raise an error if not passed a dtype parameter:
https://stackoverflow.com/q/49639414/2988730.

Turns out it can:

np.array([1, [2]])

raises `ValueError: setting an array element with a sequence.` Surprisingly
though, the following does not, and gives the expected array with
`dtype=object`:

np.array([[1], 2])

Is this behavior a bug of sorts, or is there some arcane reason behind it?

Regards,

- Joe


[Numpy-discussion] ENH: Adding a count parameter to np.unpackbits

2018-04-07 Thread Joseph Fox-Rabinovitz
Hi,

I have added PR #10855 to allow unpackbits to unpack less than the entire
set of bits. This is not a very big change, and 100% backwards compatible.
It serves two purposes:

1. To make packbits and unpackbits completely invertible (and
prevent things like this from being necessary:
https://stackoverflow.com/a/44962805/2988730)
2. To prevent an unnecessary waste of space for large arrays that are
unpacked along a single dimension.

Regards,

- Joe
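
A sketch of the invertibility point: today a slice is needed to undo the
padding that packbits adds, and the proposed count parameter would replace it:

import numpy as np

bits = np.array([1, 0, 1], dtype=np.uint8)
packed = np.packbits(bits)                      # pads to a whole byte: array([160], dtype=uint8)
recovered = np.unpackbits(packed)[:bits.size]   # today: slice off the pad bits
# with PR #10855: recovered = np.unpackbits(packed, count=bits.size)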


[Numpy-discussion] Adding a return value to np.random.shuffle

2018-04-12 Thread Joseph Fox-Rabinovitz
Would it break backwards compatibility to add the input as a return value
to np.random.shuffle? I doubt anyone out there is relying on the None
return value.

The change is trivial, and allows shuffling a new array in one line instead
of two:

x = np.random.shuffle(np.array(some_junk))

I've implemented the change in PR#10893.

Regards,

- Joe
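
For comparison, np.random.permutation already covers the one-liner case by
returning a shuffled copy and leaving its input untouched:

import numpy as np

some_junk = [3, 1, 2]
x = np.random.permutation(some_junk)   # shuffled copy; the input is not modified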


Re: [Numpy-discussion] Adding a return value to np.random.shuffle

2018-04-12 Thread Joseph Fox-Rabinovitz
Sounds good. I will close the PR.

- Joe

On Thu, Apr 12, 2018 at 1:54 PM, Sebastian Berg 
wrote:

> On Thu, 2018-04-12 at 13:36 -0400, Joseph Fox-Rabinovitz wrote:
> > Would it break backwards compatibility to add the input as a return
> > value to np.random.shuffle? I doubt anyone out there is relying on
> > the None return value.
> >
>
> Well, python discourages this IIRC, and opts to not do these things for
> in place functions (see random package specifically). Numpy breaks this
> in a few places, but that is mostly because we have the out argument as
> an optional input argument.
>
> As is, it is a nice way of making people not write:
>
> new = np.random.shuffle(old)
>
> and think old won't change. So I think we should probably just stick
> with the python/Guido van Rossum ideals, or did those change?
>
> - Sebastian
>
>
>
> > The change is trivial, and allows shuffling a new array in one line
> > instead of two:
> >
> > x = np.random.shuffle(np.array(some_junk))
> >
> > I've implemented the change in PR#10893.
> >
> > Regards,
> >
> > - Joe


Re: [Numpy-discussion] Adding a return value to np.random.shuffle

2018-04-12 Thread Joseph Fox-Rabinovitz
Agreed. I closed the PR.

- Joe

On Thu, Apr 12, 2018 at 4:24 PM, Alan Isaac  wrote:

> Some people consider that not to be Pythonic:
> https://mail.python.org/pipermail/python-dev/2003-October/038855.html
>
> Alan Isaac
>
> On 4/12/2018 1:36 PM, Joseph Fox-Rabinovitz wrote:
>
>> Would it break backwards compatibility to add the input as a return value
>> to np.random.shuffle? I doubt anyone out there is relying on the None
>> return value.
>>
>> The change is trivial, and allows shuffling a new array in one line
>> instead of two:
>>
>>  x = np.random.shuffle(np.array(some_junk))
>>
>> I've implemented the change in PR#10893.
>>
>>


Re: [Numpy-discussion] Short-circuiting equivalent of np.any or np.all?

2018-04-26 Thread Joseph Fox-Rabinovitz
Would it be useful to have a short-circuited version of the function that
is not a ufunc?

- Joe
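
For instance, a minimal chunked sketch that short-circuits without touching
the ufunc machinery (the chunk size is an arbitrary tuning knob):

import numpy as np

def any_chunked(a, chunk=4096):
    a = np.asarray(a).ravel()
    for start in range(0, a.size, chunk):
        if a[start:start + chunk].any():   # stop at the first chunk with a True
            return True
    return False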

On Thu, Apr 26, 2018 at 12:51 PM, Hameer Abbasi 
wrote:

> Hi Nathan,
>
> np.any and np.all call np.logical_or.reduce and np.logical_and.reduce
> respectively, and
> unfortunately the underlying function (ufunc.reduce) has no way of
> detecting that the value isn’t going to change anymore. It’s also used for
> (for example) np.sum (np.add.reduce), np.prod (np.multiply.reduce),
> np.min(np.minimum.reduce), np.max(np.maximum.reduce).
>
> You can find more information about this on the ufunc doc page
> . I don’t think
> it’s worth it to break this machinery for any and all, as it has numerous
> other advantages (such as being able to override in duck arrays, etc)
>
> Best regards,
> Hameer Abbasi
> Sent from Astro  for Mac
>
> On Apr 26, 2018 at 18:45, Nathan Goldbaum  wrote:
>
>
> Hi all,
>
> I was surprised recently to discover that both np.any and np.all() do not
> have a way to exit early:
>
> In [1]: import numpy as np
>
> In [2]: data = np.arange(1e6)
>
> In [3]: print(data[:10])
> [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
>
> In [4]: %timeit np.any(data)
> 724 us +- 42.4 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)
>
> In [5]: data = np.zeros(int(1e6))
>
> In [6]: %timeit np.any(data)
> 732 us +- 52.9 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)
>
> I don't see any discussions about this on the NumPy issue tracker but
> perhaps I'm missing something.
>
> I'm curious if there's a way to get a fast early-terminating search in
> NumPy? Perhaps there's another package I can depend on that does this? I
> guess I could also write a bit of cython code that does this but so far
> this project is pure python and I don't want to deal with the packaging
> headache of getting wheels built and conda-forge packages set up on all
> platforms.
>
> Thanks for your help!
>
> -Nathan
>


Re: [Numpy-discussion] PR Cleanup

2018-09-25 Thread Joseph Fox-Rabinovitz
I think that PRs 7804 and 10855 can be merged (or closed; I'd like closure
either way).

7804: ENH: Added atleast_nd
This has been ready to go for a while. It was deemed superfluous at one
point, but then experienced a revival, which ended up stagnating.
10855: ENH: Adding a count parameter to np.unpackbits
Has been ready for a while now as well. There is a possible
modification to the interface that may need to be made (not allowing
negative indices), but it's passing everything and good to go as-is, in my
opinion.

Regards,

- Joe


On Tue, Sep 25, 2018 at 11:52 AM Charles R Harris 
wrote:

> Hi All,
>
> As usual, the top of the PR stack is getting all the attention. As a start
> on cleaning up the old PRs, I'd like to suggest that all the maintainers
> look at their (long) pending PRs and decide which they want to keep, close
> those they don't want to pursue, and rebase the others. Might also help if
> they would post here the PRs that they think we should finish up.
>
> Chuck


Re: [Numpy-discussion] Deprecating asfortranarray and ascontiguousarray

2018-10-25 Thread Joseph Fox-Rabinovitz
In that vein, would it be advisable to re-implement them as aliases for the
correctly behaving functions instead?

- Joe
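
Concretely, the aliases could look something like this (a sketch; np.asarray
with order= keeps 0-d inputs 0-d instead of promoting them to 1-d):

import numpy as np

def ascontiguousarray(a, dtype=None):
    # same contract, but a scalar input stays a 0-d array
    return np.asarray(a, dtype=dtype, order='C')

def asfortranarray(a, dtype=None):
    return np.asarray(a, dtype=dtype, order='F')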

On Thu, Oct 25, 2018 at 5:01 PM Joe Kington  wrote:

> For what it's worth, these are fairly widely used functions.  From a user
> standpoint, I'd gently argue against deprecating them. Documenting the
> inconsistency with scalars  seems like a less invasive approach.
>
> In particular ascontiguousarray is a very common check to make when
> working with C libraries or low-level file formats.  A significant
> advantage over asarray(..., order='C') is readability.  It makes the
> intention very clear.  Similarly, asfortranarray is quite readable for
> folks that aren't deeply familiar with numpy.
>
> Given that the use-cases they're primarily used for are likely to be read
> by developers working in other languages (i.e. ascontiguousarray gets used
> at a lot of "boundaries" with other systems), keeping function names that
> make intention very clear is important.
>
> Just my $0.02, anyway.  Cheers,
> -Joe
>
> On Thu, Oct 25, 2018 at 3:17 PM Alex Rogozhnikov <
> alex.rogozhni...@yandex.ru> wrote:
>
>> Dear numpy community,
>>
>> I'm planning to deprecate np.asfortranarray and np.ascontiguousarray
>> functions due to their misbehavior on scalar (0-D tensors) with PR #12244
>> .
>>
>> Current behavior (converting scalars to 1-d array with single element)
>> - is unexpected and contradicts to documentation
>> - probably, can't be changed without breaking external code
>> - I believe, this was a cause for poor support of 0-d arrays in mxnet.
>> - both functions are easily replaced with asarray(..., order='...'),
>> which has expected behavior
>>
>> There is no timeline for removal - we just need to discourage from using
>> this functions in new code.
>>
>> Function naming may be related to how numpy treats 0-d tensors specially,
>>
>> and those probably should not be called arrays.
>> https://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html
>> However, as a user I never thought about 0-d arrays being special and
>> being "not arrays".
>>
>>
>> Please see original discussion at github for more details
>> https://github.com/numpy/numpy/issues/5300
>>
>> Your comments welcome,
>> Alex Rogozhnikov
>>


[Numpy-discussion] DOC: Updates to nditer usage instructions PR#12828

2019-01-22 Thread Joseph Fox-Rabinovitz
Hi,

I have just added PR #12828, based off issue #12764 to clarify some of the
documentation of `nditer`. While it contains most of the necessary
material, it's still a bit of a rough draft, and I'd be happy to have some
comments/advice on it.

In particular, I didn't know how much to emphasize the fact that context
management only became a thing for `nditer` in version 1.15.0. I'm also not
sure about how much need there is to reiterate the C-style iteration
methods vs the normal Python-style iteration.
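
For reference, the context-manager form available since 1.15.0 that the PR
documents (a minimal read-write sketch):

import numpy as np

a = np.arange(6.0).reshape(2, 3)
with np.nditer(a, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = 2 * x    # write back through the iterator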

Regards,

- Joe


[Numpy-discussion] ENH: Added option to suppress stdout/err capture in tests PR #12829

2019-01-22 Thread Joseph Fox-Rabinovitz
Hi,

I recently had some issues setting up gdb to fix some self-inflicted
segfaults, so instead I ended up adding an option to suppress stdout/stderr
capture by pytest in PR#12829. I think this is a useful feature to have in
general, so I made this PR. The only problem with it is that there are no
formal tests for it (I did verify that all the possible options work
manually though).

Regards,

- Joe


[Numpy-discussion] PR 14966: Adding a new argument to np.asfarray

2019-11-22 Thread Joseph Fox-Rabinovitz
Hi,

I've submitted PR #14966, which makes a couple of small, backward
compatible, changes to the API of `asfarray`:

1. Added `copy` parameter that defaults to `False`
2. Added `None` option to the `dtype` parameter

Item #1 is inspired by situations like the one in Stack Overflow question
https://stackoverflow.com/q/58998475/2988730. Sometimes, you just need to
ensure a copy, and it's nice not to have to write checks like
`if asfarray(x) is x: x = x.copy()`.

Item #2 solves the problem of trying to do `asfarray(x, dtype=x.dtype)` for
`x` that don't have a `dtype` attribute, like lists or tuples. I've made
every effort to make `dtype` and `copy` play together nicely.
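
For illustration, the patterns the two additions would replace (copy=True and
dtype=None being the proposed options from the PR):

import numpy as np

x = np.arange(3.0)
# item 1 today: ensure a float *copy*
y = np.asfarray(x)
if y is x:
    y = x.copy()
# item 2 today: preserve an existing float dtype without assuming .dtype exists
dt = getattr(x, 'dtype', None)
y2 = np.asfarray(x, dtype=dt) if dt is not None else np.asfarray(x)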

On an unrelated note, I've also submitted #14967 to clean up the internals
of `mintypecode` a little in the same file.

Regards,

- Joe


Re: [Numpy-discussion] introducing autoreg and autoregnn

2020-06-05 Thread Joseph Fox-Rabinovitz
Rondall,

Are you familiar with the lmfit project? I am not an expert, but it seems
like your algorithms may be useful there. I recommend checking with Matt
Newville via the mailing list.

Regards,

Joe


On Fri, Jun 5, 2020, 17:00 Ralf Gommers  wrote:

>
>
> On Fri, Jun 5, 2020 at 9:48 PM rondall jones  wrote:
>
>> Hello! I have supported constrained solvers for linear matrix problems
>> for about 10 years in C++, but have now switched to Python. I am going to
>> submit a couple of new routines for linalg called autoreg(A,b) and
>> autoregnn(A,b). They work just like lstsq(A,b) normally, but when they
>> detect that the problem is dominated by noise they revert to an automatic
>> regularization scheme that returns a better behaved result than one gets
>> from lstsq. In addition, autoregnn enforces a nonnegativity constraint on
>> the solution. I have put on my web site a slightly fuller featured version
>> of these same two algorithms, using a Class implementation to facilitate
>> returning several diagnostics and other artifacts. The web site contains
>> tutorials on these methods and a number of examples of their use. See
>> http://www.rejones7.net/autorej/ . I hope this community can take a look
>> at these routines and see whether they are appropriate for linalg or should
>> be in another location.
>>
>
> Hi Ron, thanks for proposing this. It seems out of scope for NumPy;
> scipy.linalg or scipy.optimize seem like the most obvious candidates.
>
> If you propose inclusion into SciPy, it would be good to discuss whether
> the algorithm is based on a publication showing usage via citation stats or
> some other way. There's more details at
> http://scipy.github.io/devdocs/dev/core-dev/index.html#deciding-on-new-features
>
> Cheers,
> Ralf
>


Re: [Numpy-discussion] Optimized np.digitize for equidistant bins

2020-12-18 Thread Joseph Fox-Rabinovitz
Bin index is just value floor divided by the bin size.
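
For example, a minimal sketch assuming equidistant bins built with
np.linspace (values outside the bin range are handled differently than by
np.digitize):

import numpy as np

bins = np.linspace(0, 1, 20)
x = np.random.rand(1000)
width = (bins[-1] - bins[0]) / (len(bins) - 1)
idx = ((x - bins[0]) // width).astype(np.int64) + 1  # 1-based, like np.digitize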

On Fri, Dec 18, 2020, 09:59 Martín Chalela  wrote:

> Hi all! I was wondering if there is a way around to using np.digitize when
> dealing with equidistant bins. For example:
> bins = np.linspace(0, 1, 20)
>
> The main problem I encountered is that digitize calls np.searchsorted.
> This is the correct way, I think, for generic bins, i.e. bins that have
> different widths. However, in the special, but not uncommon, case of
> equidistant bins, the searchsorted call can be very expensive and
> unnecessary. One can perform a simple calculation like the following:
>
> def digitize_eqbins(x, bins):
> """
> Return the indices of the bins to which each value in input array belongs.
> Assumes equidistant bins.
> """
> nbins = len(bins) - 1
> digit = (nbins * (x - bins[0]) / (bins[-1] - bins[0])).astype(int)
> return digit + 1
>
> Is there a better way of computing this for equidistant bins?
>
> Thank you!
> Martin.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Optimized np.digitize for equidistant bins

2020-12-18 Thread Joseph Fox-Rabinovitz
There is: np.floor_divide.
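
For example, with the bins from the quoted code (same sketch as before, just
spelled with the explicit ufunc):

width = (bins[-1] - bins[0]) / (len(bins) - 1)
idx = np.floor_divide(x - bins[0], width).astype(np.int64) + 1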

On Fri, Dec 18, 2020, 14:38 Martín Chalela  wrote:

> Right! I just thought there would/should be a "digitize" function that did
> this.
>
> El vie, 18 dic 2020 a las 14:16, Joseph Fox-Rabinovitz (<
> jfoxrabinov...@gmail.com>) escribió:
>
>> Bin index is just value floor divided by the bin size.
>>
>> On Fri, Dec 18, 2020, 09:59 Martín Chalela 
>> wrote:
>>
>>> Hi all! I was wondering if there is a way around to using np.digitize
>>> when dealing with equidistant bins. For example:
>>> bins = np.linspace(0, 1, 20)
>>>
>>> The main problem I encountered is that digitize calls np.searchsorted.
>>> This is the correct way, I think, for generic bins, i.e. bins that have
>>> different widths. However, in the special, but not uncommon, case of
>>> equidistant bins, the searchsorted call can be very expensive and
>>> unnecessary. One can perform a simple calculation like the following:
>>>
>>> def digitize_eqbins(x, bins):
>>> """
>>> Return the indices of the bins to which each value in input array belongs.
>>> Assumes equidistant bins.
>>> """
>>> nbins = len(bins) - 1
>>> digit = (nbins * (x - bins[0]) / (bins[-1] - bins[0])).astype(int)
>>> return digit + 1
>>>
>>> Is there a better way of computing this for equidistant bins?
>>>
>>> Thank you!
>>> Martin.
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to speed up array generating

2021-01-09 Thread Joseph Fox-Rabinovitz
What other ways have you tried?

On Sat, Jan 9, 2021 at 2:15 PM  wrote:

> Hello. There is a random 1D array m_0 with size 3000, for example:
>
> m_0 = np.array([0, 1, 2])
>
> I need to generate two 1D arrays:
>
> m_1 = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
> m_2 = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
>
> Is there a faster way to do it than this one:
>
> import numpy as np
> import time
> N = 3
> m_0 = np.arange(N)
>
> t = time.time()
> m_1 = np.tile(m_0, N)
> m_2 = np.repeat(m_0, N)
> t = time.time() - t
>
> I tried other ways but they are slower or take the same time. Other NumPy
> operations in my code are 10-100 times faster. Why is repeating an array so
> slow? I need a 10x speed-up. Thank you for your attention to my problem.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ENH: Proposal to add atleast_nd function

2021-02-10 Thread Joseph Fox-Rabinovitz
I've created PR#18386 to add a function called atleast_nd to numpy and
numpy.ma. This would generalize the existing atleast_1d, atleast_2d, and
atleast_3d functions.

I proposed a similar idea about four and a half years ago:
https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html,
PR#7804. The reception was ambivalent, but a couple of folks have asked me
about this, so I'm bringing it back.

Some pros:

- This closes issue #12336
- There are a couple of Stack Overflow questions that would benefit
- Been asked about this a couple of times
- Implementation of three existing atleast_*d functions gets easier
- Looks nicer than the equivalent broadcasting and reshaping

Some cons:

- Cluttering up the API
- Maintenance burden (but not a big one)
- This is just a utility function, which can be achieved through
broadcasting and reshaping

If this meets with approval, there are a couple of interface issues that
probably need to be hashed out:

- The consensus was that this function should accept a single array rather
than a tuple or multiple arrays, as the other atleast_*d functions do. Does
that need to be revisited?
- Right now, a `pos` argument specifies where to place new axes, if any.
That can be specified in different ways. Another way might be to specify
the offset of the existing dimensions, or something entirely different.
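
For concreteness, here is a rough pure-Python sketch of the behavior I have
in mind (illustrative only, not the PR implementation; error handling
omitted):

import numpy as np

def atleast_nd(ary, ndim, pos=0):
    ary = np.asanyarray(ary)
    if ary.ndim >= ndim:
        return ary
    # Normalize pos like an insertion index: negative values count from the
    # end, with pos=-1 meaning "insert at the very end".
    if pos < 0:
        pos += ary.ndim + 1
    new_shape = ary.shape[:pos] + (1,) * (ndim - ary.ndim) + ary.shape[pos:]
    return ary.reshape(new_shape)

atleast_nd(np.arange(5), 3).shape          # (1, 1, 5)
atleast_nd(np.arange(5), 3, pos=-1).shape  # (5, 1, 1)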
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ENH: Proposal to add atleast_nd function

2021-02-11 Thread Joseph Fox-Rabinovitz
The original functions appear to have been written to support things like
*stack, which goes a long way toward explaining the inconsistent argument
list.

- Joe


On Thu, Feb 11, 2021, 12:41 Benjamin Root  wrote:

> For me, the atleast_{1,2,3}d functions are useful for sanitizing inputs.
> Having an atleast_nd() function can be viewed as a step towards cleaning up
> the API, not cluttering it (although the deprecation period for the
> existing functions should probably be long, given how long they have
> existed).
>
> On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer  wrote:
>
>> On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias 
>> wrote:
>>
>>> I totally agree with the namespace clutter concern, but honestly, I
>>> would use `atleast_nd` with its `pos` argument (I might rename it to
>>> `position`, `axis`, or `axis_position`) any day over `atleast_{1,2,3}d`,
>>> for which I had no idea where the new axes would end up.
>>>
>>> So, I’m in favour of including it, and optionally deprecating
>>> `atleast_{1,2,3}d`.
>>>
>>>
>> I appreciate that `atleast_nd` feels more sensible than
>> `at_least{1,2,3}d`, but I don't think "better" than a pattern we would not
>> recommend is a good enough reason for inclusion in NumPy. It needs to stand
>> on its own.
>>
>> What would be the recommended use-cases for this new function?
>> Have any libraries building on top of NumPy implemented a version of this?
>>
>>
>>> Juan.
>>>
>>> On 11 Feb 2021, at 9:48 am, Sebastian Berg 
>>> wrote:
>>>
>>> On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote:
>>>
>>> I've created PR#18386 to add a function called atleast_nd to numpy and
>>> numpy.ma. This would generalize the existing atleast_1d, atleast_2d, and
>>> atleast_3d functions.
>>>
>>> I proposed a similar idea about four and a half years ago:
>>> https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html
>>> ,
>>> PR#7804. The reception was ambivalent, but a couple of folks have asked
>>> me
>>> about this, so I'm bringing it back.
>>>
>>> Some pros:
>>>
>>> - This closes issue #12336
>>> - There are a couple of Stack Overflow questions that would benefit
>>> - Been asked about this a couple of times
>>> - Implementation of three existing atleast_*d functions gets easier
>>> - Looks nicer that the equivalent broadcasting and reshaping
>>>
>>> Some cons:
>>>
>>> - Cluttering up the API
>>> - Maintenance burden (but not a big one)
>>> - This is just a utility function, which can be achieved through
>>> broadcasting and reshaping
>>>
>>>
>>> My main concern would be the namespace cluttering. I can't say I use
>>> even the `atleast_2d` etc. functions personally, so I would tend to be
>>> slightly against the addition. But if others land on the "useful" side here
>>> (and it seemed so, at least on github), I am also not opposed. It is a
>>> clean name that lines up with existing ones, so it doesn't seem like a big
>>> "mental load" with respect to namespace cluttering.
>>>
>>> Bike shedding the API is probably a good idea in any case.
>>>
>>> I have pasted the current PR documentation (as html) below for quick
>>> reference. I wonder a bit about the reasoning for having `pos` specify a
>>> value rather than just a side?
>>>
>>>
>>>
>>> numpy.atleast_nd(ary, ndim, pos=0)
>>>
>>> View input as array with at least ndim dimensions. New unit dimensions
>>> are inserted at the index given by pos if necessary.
>>>
>>> Parameters
>>> ----------
>>> ary : array_like
>>> The input array. Non-array inputs are converted to arrays. Arrays that
>>> already have ndim or more dimensions are preserved.
>>> ndim : int
>>> The minimum number of dimensions required.
>>> pos : int, optional
>>> The index at which to insert the new dimensions. May range from
>>> -ary.ndim - 1 to +ary.ndim (inclusive). Non-negative indices indicate
>>> locations before the corresponding axis: pos=0 means to insert at the
>>> very beginning. Negative indices indicate locations after the
>>> corresponding axis: pos=-1 means to insert at the very end. 0 and -1 are
>>> always guaranteed to work. Any other number will depend on the
>>> dimensions of the existing array. Default is 0.
>>>

Re: [Numpy-discussion] ENH: Proposal to add atleast_nd function

2021-02-12 Thread Joseph Fox-Rabinovitz
On Fri, Feb 12, 2021, 09:32 Robert Kern  wrote:

> On Fri, Feb 12, 2021 at 5:15 AM Eric Wieser 
> wrote:
>
>> > There might be some linear algebraic reason why those axis positions
>> make sense, but I’m not aware of it...
>>
>> My guess is that the historical motivation was to allow grayscale `(H,
>> W)` images to be converted into `(H, W, 1)` images so that they can be
>> broadcast against `(H, W, 3)` RGB images.
>>
>
> Correct. If you do introduce atleast_nd(), I'm not sure why you'd
> deprecate and remove the one existing function that *isn't* made redundant
> thereby.
>

`atleast_nd` handles the promotion of 2D to 3D correctly. The `pos`
argument lets you tell it where to put the new axes. What's unintuitive to
me is that the 1D case gets promoted from shape `(x,)` to shape `(1, x,
1)`. It takes two calls to `atleast_nd` to replicate that behavior.
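
For example, with the existing function:

np.atleast_3d(np.arange(4)).shape  # (1, 4, 1)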

One modification to `atleast_nd` I've thought about is making `pos` refer
to the position of the existing axes in the new array rather than the
position of the new axes, but that's likely not a useful way to go about it.

- Joe


> --
> Robert Kern
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ENH: Proposal to add atleast_nd function

2021-02-16 Thread Joseph Fox-Rabinovitz
I'm getting a generally lukewarm, though not negative, response. Should we put it to
a vote?

- Joe

On Fri, Feb 12, 2021, 16:06 Robert Kern  wrote:

> On Fri, Feb 12, 2021 at 3:42 PM Ralf Gommers 
> wrote:
>
>>
>> On Fri, Feb 12, 2021 at 9:21 PM Robert Kern 
>> wrote:
>>
>>> On Fri, Feb 12, 2021 at 1:47 PM Ralf Gommers 
>>> wrote:
>>>

 On Fri, Feb 12, 2021 at 7:25 PM Sebastian Berg <
 sebast...@sipsolutions.net> wrote:

>
> Right, my initial feeling is that without such context `atleast_3d` is
> pretty surprising.  So I wonder if we can design `atleast_nd` in a way
> that it is explicit about this context.
>

 Agreed. I think such a use case is probably too specific to design a
 single function for, at least in such a hardcoded way.

>>>
>>> That might be an argument for not designing a new one (or at least not
>>> giving it such a name). Not sure it's a good argument for removing a
>>> long-standing one.
>>>
>>
>> I agree. I'm not sure deprecating is best. But introducing new
>> functionality where `nd(pos=3) != 3d` is also not great.
>>
>> At the very least, atleast_3d should be better documented. It also is
>> telling that Juan (a long-time scikit-image dev) doesn't like atleast_3d
>> and there's very little usage of it in scikit-image.
>>
>
> I'm fairly neutral on atleast_nd(). I think that for n=1 and n=2, you can
> derive The One Way to Do It from broadcasting semantics, but for n>=3, I'm
> not sure there's much value in trying to systematize it to a single
> convention. I think that once you get up to those dimensions, you start to
> want to have domain-specific semantics. I do agree that, in retrospect,
> atleast_3d() probably should have been named more specifically. It was of a
> piece with other conveniences like dstack() that did special things to
> support channel-last images (and implicitly treat 3D arrays as such). For
> example, in DL frameworks that assemble channeled images into minibatches
> (with different conventions like BHWC and BCHW), you'd want the n=4
> behavior to do different things. I _think_ you'd just want to do those with
> different functions rather than a complicated set of arguments to one function.
>
> --
> Robert Kern
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Expanding the scope of numpy.unpackbits and numpy.packbits to include more than uint8 type

2021-03-29 Thread Joseph Fox-Rabinovitz
You can view any array as uint8.
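
For example, a sketch with a hypothetical 10-bit-per-channel image stored as
uint16 (note that the resulting byte, and hence bit, layout depends on the
machine's byte order):

import numpy as np

img = np.random.randint(0, 1024, size=(4, 4, 3), dtype=np.uint16)
as_bytes = img.view(np.uint8)            # same memory, shape (4, 4, 6)
bits = np.unpackbits(as_bytes, axis=-1)  # works because the view is uint8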


On Mon, Mar 29, 2021, 14:27 Rashiq Azhan  wrote:

> I would like this feature to be added since I think it can be very useful
> when there is a need to process data that does not fit in uint8.
> One of my personal requirements is modifying 10-bit-per-channel
> images held in a NumPy array, but I cannot do that using the specified
> functions. They are an elegant solution and work well with the other
> NumPy functions as long as the data is uint8.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Re: What happened to the numpy.random documentation?

2021-10-14 Thread Joseph Fox-Rabinovitz
I second that: reinstating such a list would be extremely useful. My issue
has been with the polynomial package, but the end result is the same.

- Joe

On Thu, Oct 14, 2021, 12:45 Melissa Mendonça  wrote:

> Hi Paul,
>
> Do you think having a page with the flat list of routines back, in
> addition to the explanations, would solve this?
>
> - Melissa
>
> On Thu, Oct 14, 2021 at 1:34 PM Paul M.  wrote:
>
>> Hi All,
>>
>> The documentation of Numpy's submodules used to have a fairly standard
>> structure as shown here in the 1.16 documentation:
>>
>>   https://docs.scipy.org/doc/numpy-1.16.1/reference/routines.random.html
>>
>> Now the same page in the API documentation looks like this:
>>
>>   https://numpy.org/doc/stable/reference/random/index.html
>>
>> While I appreciate the expository text in the new documentation about how
>> the generators work, this new version is much less useful as a reference to
>> the API.  It seems like it might fit better in the user manual rather than
>> the API reference.
>>
>> From my perspective it seems like the new version of the documentation is
>> harder to navigate in terms of finding information quickly (more scrolling,
>> harder to get a bird's eye view of functions in various submodules, etc).
>>
>> Has anyone else had a similar reaction to the changes? I teach a couple
>> of courses in scientific computing and bioinformatics and my students seem
>> to also struggle to get a sense of what the different modules offer based
>> on the new version of the documentation. For now, I'm referring them to the
>> old (1.70) reference manuals as a better way to get acquainted with the
>> libraries.
>>
>> Cheers,
>> Paul Magwene
>> ___
>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>> To unsubscribe send an email to numpy-discussion-le...@python.org
>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> Member address: meliss...@gmail.com
>>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: jfoxrabinov...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Proposal: Automatic estimation of number of histogram bins for weighted data

2021-12-23 Thread Joseph Fox-Rabinovitz
For what it's worth, I've looked into this a long time ago. The missing
ingredient has always been weighted quantiles. If I'm not mistaken, the
interface already exists, but raises an error. I've had it on my back
burner to provide an O(n) C implementation of weighted introselect, but
never quite got around to it. I think there has been work to add an
O(n log n) implementation recently.

- Joe

On Thu, Dec 23, 2021 at 1:19 PM Jonathan Crall  wrote:

> While it does feel like this might be more scipy-ish than numpy-ish, numpy
> has an existing histogram method, with existing heuristics for choosing a
> number of bins automatically, with existing support for weights. What it is
> lacking is support for weights and a heuristic jointly. This proposal is
> not a massive new feature for numpy. It is just plugging a hole that exists
> in the cross product of possible argument combinations for np.histogram.
>
> Thank you for the pointer about interpretation of weights. That was
> something I felt was going to be a nuance of this, but I didn't have the
> words to describe it.
>
> Within pure numpy, I think it should be possible to compute multiple
> histograms and then aggregate them. That seems to lend itself towards
> frequency weights, but it seems to me that probability weights would use
> the same procedure to estimate bandwidth.
>
>
> https://stats.stackexchange.com/questions/354689/are-frequency-weights-and-sampling-weights-in-practice-the-same-thing
>
> And ultimately, this is just an estimator used as a convenience for
> programmers. Most real applications will need to define their bins wrt
> their problem, but I think if it makes sense for numpy to provide a
> heuristic baseline for un-weighted data, then it is natural to assume it
> would do so for weighted data as well.
>
>
>
>
> On Mon, Dec 13, 2021 at 4:03 AM Kevin Sheppard 
> wrote:
>
>> To me, this feels like it might be a better fit for SciPy or possibly
>> statsmodels (but maybe not since neither have histogram functions
>> anymore). The challenge with weighted estimators is how the weights should
>> be interpreted. Stata covers the most important cases of weights
>> https://www.reed.edu/psychology/stata/gs/tutorials/weights.html.  Would
>> these be frequency weights?  Stata supports only frequency weights
>> https://www.stata.com/manuals/u11.pdf#u11.1.6weight.
>>
>> Kevin
>>
>>
>> On Sun, Dec 12, 2021 at 9:45 AM Jonathan Crall 
>> wrote:
>>
>>> Hi all, this is my first post on this mailing list.
>>>
>>> I'm writing to propose a method for extending the histogram bandwidth
>>> estimators to work with weighted data. I originally submitted this proposal
>>> to seaborn: https://github.com/mwaskom/seaborn/issues/2710 and mwaskom
>>> suggested I take it here.
>>>
>>> Currently the unweighted auto heuristic is a combination of
>>> the Freedman-Diaconis and Sturges estimator. For reference, these rules are
>>> as follows:
>>>
> Sturges: take the peak-to-peak ptp (i.e. x.max() - x.min()) and the number
> of data points total=x.size. Then divide ptp by the log of one plus the
> number of data points.
>>>
>>> ptp / log2(total + 2)
>>>
> Freedman-Diaconis: find the interquartile range of the data,
> iqr = np.subtract(*np.percentile(x, [75, 25])), and the number of data
> points total=x.size, then apply the formula:
>>>
>>> 2.0 * iqr * total ** (-1.0 / 3.0).
>>>
>>> Taking a look at these it seems (please correct me if I'm missing
>>> something that makes this not work) that there is a simple extension to
>>> weighted data. If we can find a weighted replacement for p2p, total, and
>>> iqr, the formulas should work exactly the same in the weighted case.
>>>
>>> The ptp case seems easy. Even if the data points are weighted, that
>>> doesn't change the min and max. Nothing changes here.
>>>
>>> For total, instead of taking the size of the array (which implicitly
>>> assumes each data point has a weight of 1), just sum the weight to get
>>> total=weights.sum().
>>>
>>> I believe the IQR is also computable in the weighted case.
>>>
>>> import numpy as np
>>> n = 10
>>> rng = np.random.RandomState(12554)
>>>
>>>
>>> x = rng.rand(n)
>>> w = rng.rand(n)
>>>
>>>
>>> sorted_idxs = x.argsort()
>>> x_sort = x[sorted_idxs]
>>> w_sort = w[sorted_idxs]
>>>
>>>
>>> cumtotal = w_sort.cumsum()
>>> quantiles = cumtotal / cumtotal[-1]
>>> idx2, idx1 = np.searchsorted(quantiles, [0.75, 0.25])
>>> iqr_weighted = x_sort[idx2] - x_sort[idx1]
>>> print('iqr_weighted = {!r}'.format(iqr_weighted))
>>>
>>>
>>> # test this is roughly the same as the "unweighted case"
>>> # (won't be exactly the same because this method does not do
>>> # interpolation)
>>> w = np.ones_like(x)
>>>
>>>
>>> w_sort = w[sorted_idxs]
>>> cumtotal = w_sort.cumsum()
>>> quantiles = cumtotal / cumtotal[-1]
>>> idx2, idx1 = np.searchsorted(quantiles, [0.75, 0.25])
>>> iqr_weighted = x_sort[idx2] - x_sort[idx1]
>>> iqr_unweighted_repo = x_sort[idx2] - x_sort[idx1]
>>> print('iqr_unweighted_repo = {!r}'.for

[Numpy-discussion] Re: Proposal: Automatic estimation of number of histogram bins for weighted data

2021-12-24 Thread Joseph Fox-Rabinovitz
This seems like the catch-all if you're unsure. In general, the purely
technical discussion stays with the PR.

On Fri, Dec 24, 2021, 20:06 Jonathan Crall  wrote:

> Yes, #9211 <https://github.com/numpy/numpy/pull/9211> is the open PR for
> weighted quantiles. Is this something I should make an issue for on the
> numpy github? Or is the correct place to discuss it on this mailing list?
> I'd like to link to this conversation in two other places on github, but
> that's difficult when discussion is on the mailing list. But if it's more
> appropriate to talk here, let me know.
>
> On Thu, Dec 23, 2021 at 2:29 PM Joseph Fox-Rabinovitz <
> jfoxrabinov...@gmail.com> wrote:
>
>> For what it's worth, I've looked into this a long time ago. The missing
>> ingredient has always been weighted quantiles. If I'm not mistaken, the
>> interface already exists, but raises an error. I've had it on my back
>> burner to provide an O(n) C implementation of weighted introselect, but
>> never quite got around to it. I think there has been work to add a O(n log
>> n) implementation recently.
>>
>> - Joe
>>
>> On Thu, Dec 23, 2021 at 1:19 PM Jonathan Crall 
>> wrote:
>>
>>> While it does feel like this might be more scipy-ish than numpy-ish,
>>> numpy has an existing histogram method, with existing heuristics for
>>> choosing a number of bins automatically, with existing support for weights.
>>> What it is lacking is support for weights and a heuristic jointly. This
>>> proposal is not a massive new feature for numpy. It is just plugging a hole
>>> that exists in the cross product of possible argument combinations for
>>> np.histogram.
>>>
>>> Thank you for the pointer about interpretation of weights. That was
>>> something I felt was going to be a nuance of this, but I didn't have the
>>> words to describe it.
>>>
>>> Within pure numpy, I think it should be possible to compute multiple
>>> histograms and then aggregate them. That seems to lend itself towards
>>> frequency weights, but it seems to me that probability weights would use
>>> the same procedure to estimate bandwidth.
>>>
>>>
>>> https://stats.stackexchange.com/questions/354689/are-frequency-weights-and-sampling-weights-in-practice-the-same-thing
>>>
>>> And ultimately, this is just an estimator used as a convenience for
>>> programmers. Most real applications will need to define their bins wrt
>>> their problem, but I think if it makes sense for numpy to provide a
>>> heuristic baseline for un-weighted data, then it is natural to assume it
>>> would do so for weighted data as well.
>>>
>>>
>>>
>>>
>>> On Mon, Dec 13, 2021 at 4:03 AM Kevin Sheppard <
>>> kevin.k.shepp...@gmail.com> wrote:
>>>
>>>> To me, this feels like it might be a better fit for SciPy or possibly
>>>> statsmodels (but maybe not since neither have histogram functions
>>>> anymore). The challenge with weighted estimators is how the weights should
>>>> be interpreted. Stata covers the most important cases of weights
>>>> https://www.reed.edu/psychology/stata/gs/tutorials/weights.html.
>>>> Would these be frequency weights?  Stata supports only frequency weights
>>>> https://www.stata.com/manuals/u11.pdf#u11.1.6weight.
>>>>
>>>> Kevin
>>>>
>>>>
>>>> On Sun, Dec 12, 2021 at 9:45 AM Jonathan Crall 
>>>> wrote:
>>>>
>>>>> Hi all, this is my first post on this mailing list.
>>>>>
>>>>> I'm writing to propose a method for extending the histogram bandwidth
>>>>> estimators to work with weighted data. I originally submitted this 
>>>>> proposal
>>>>> to seaborn: https://github.com/mwaskom/seaborn/issues/2710 and
>>>>> mwaskom suggested I take it here.
>>>>>
>>>>> Currently the unweighted auto heuristic is a combination of
>>>>> the Freedman-Diaconis and Sturges estimator. For reference, these rules 
>>>>> are
>>>>> as follows:
>>>>>
>>>>> Sturges: take the peak-to-peak ptp (i.e. x.max() - x.min()) and the
>>>>> number of data points total=x.size. Then divide ptp by the log of one plus
>>>>> the number of data points.
>>>>>
>>>>> ptp / log2(total + 2)
>>>>>
>>>>> Freedman-Diaconis: Find the interqu

[Numpy-discussion] Proposal for new function to determine if a float contains an integer

2021-12-30 Thread Joseph Fox-Rabinovitz
Hi,

I wrote a reference implementation for a C ufunc, `isint`, which returns
True for integers and False for non-integers, found here:
https://github.com/madphysicist/isint_ufunc. The idea came from a Stack
Overflow question of mine, which has gotten a fair number of views and even
some upvotes: https://stackoverflow.com/q/35042128/2988730. The current
"recommended" solution is to use ``((x % 1) == 0)``. This is slower and
more cumbersome because of the math operations and the temporary storage.
My version returns a single array of booleans with no intermediaries, and
is between 5 and 40 times faster, depending on the type and size of the
input.
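
For reference, this is the modulo idiom being compared against:

import numpy as np

x = np.array([1.0, 2.5, -3.0, np.nan, np.inf])
(x % 1) == 0  # array([ True, False,  True, False, False]), with an
              # invalid-value warning for the nan/inf entries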

If you are interested in taking a look, there is a suite of tests and a
small benchmarking script that compares the ufunc against the modulo
expression. The entire thing currently works with bit twiddling on an
appropriately converted integer representation of the number. It assumes a
standard IEEE754 representation for float16, float32, float64. The extended
80-bit float128 format gets some special treatment because of the explicit
integer bit. Complex numbers are currently integers only if they are real
and integral. Integer types (including bool) are always integers. Time and
text raise TypeErrors, since their integerness is meaningless.

If a consensus forms that this is something appropriate for numpy, I will
need some pointers on how to package up C code properly. This was an
opportunity for me to learn to write a basic ufunc. I am still a bit
confused about where code like this would go, and how to harness numpy's
code generation. I put comments in my .c and .h file showing how I would
expect the generators to look, but I'm not sure where to plug something
like that into numpy. It would also be nice to test on architectures whose
long double is a proper float128 quad-precision number rather than an
80-bit extended format.

Please let me know your thoughts.

Regards,

- Joe
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Proposal for new function to determine if a float contains an integer

2021-12-30 Thread Joseph Fox-Rabinovitz
Is adding arbitrary optional parameters a thing with ufuncs? I could easily
add upper and lower bounds checks.

On Thu, Dec 30, 2021, 20:56 Brock Mendel  wrote:

> At least some of the commenters on that StackOverflow page need a slightly
> stronger check: not only is_integer(x), but also "np.iinfo(dtype).min <= x
> <= np.info(dtype).max" for some particular dtype.  i.e. "Can I losslessly
> set these values into the array I already have?"
>
>
>
> On Thu, Dec 30, 2021 at 4:34 PM Joseph Fox-Rabinovitz <
> jfoxrabinov...@gmail.com> wrote:
>
>> Hi,
>>
>> I wrote a reference implementation for a C ufunc, `isint`, which returns
>> True for integers and False for non-integers, found here:
>> https://github.com/madphysicist/isint_ufunc. The idea came from a Stack
>> Overflow question of mine, which has gotten a fair number of views and even
>> some upvotes: https://stackoverflow.com/q/35042128/2988730. The current
>> "recommended" solution is to use ``((x % 1) == 0)``. This is slower and
>> more cumbersome because of the math operations and the temporary storage.
>> My version returns a single array of booleans with no intermediaries, and
>> is between 5 and 40 times faster, depending on the type and size of the
>> input.
>>
>> If you are interested in taking a look, there is a suite of tests and a
>> small benchmarking script that compares the ufunc against the modulo
>> expression. The entire thing currently works with bit twiddling on an
>> appropriately converted integer representation of the number. It assumes a
>> standard IEEE754 representation for float16, float32, float64. The extended
>> 80-bit float128 format gets some special treatment because of the explicit
>> integer bit. Complex numbers are currently integers only if they are real
>> and integral. Integer types (including bool) are always integers. Time and
>> text raise TypeErrors, since their integerness is meaningless.
>>
>> If a consensus forms that this is something appropriate for numpy, I will
>> need some pointers on how to package up C code properly. This was an
>> opportunity for me to learn to write a basic ufunc. I am still a bit
>> confused about where code like this would go, and how to harness numpy's
>> code generation. I put comments in my .c and .h file showing how I would
>> expect the generators to look, but I'm not sure where to plug something
>> like that into numpy. It would also be nice to test on architectures whose
>> long double is a proper float128 quad-precision number rather than an
>> 80-bit extended format.
>>
>> Please let me know your thoughts.
>>
>> Regards,
>>
>> - Joe
>> ___
>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>> To unsubscribe send an email to numpy-discussion-le...@python.org
>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> Member address: jbrockmen...@gmail.com
>>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: jfoxrabinov...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Proposal for new function to determine if a float contains an integer

2021-12-31 Thread Joseph Fox-Rabinovitz
On Fri, Dec 31, 2021 at 5:46 AM Andras Deak  wrote:

> On Fri, Dec 31, 2021 at 1:36 AM Joseph Fox-Rabinovitz <
> jfoxrabinov...@gmail.com> wrote:
>
>> Hi,
>>
>> I wrote a reference implementation for a C ufunc, `isint`, which returns
>> True for integers and False for non-integers, found here:
>> https://github.com/madphysicist/isint_ufunc. 
>>
>
> Shouldn't we keep the name of the stdlib float method?
>
> >>> (3.0).is_integer()
> True
>
> See https://docs.python.org/3/library/stdtypes.html#float.is_integer
>
>
This sounds obvious in hindsight. I renamed it to is_integer, including the
repo itself. The new link is here:
https://github.com/madphysicist/is_integer_ufunc



> András
>
>
>
>> The idea came from a Stack Overflow question of mine, which has gotten a
>> fair number of views and even some upvotes:
>> https://stackoverflow.com/q/35042128/2988730. The current "recommended"
>> solution is to use ``((x % 1) == 0)``. This is slower and more cumbersome
>> because of the math operations and the temporary storage. My version
>> returns a single array of booleans with no intermediaries, and is between 5
>> and 40 times faster, depending on the type and size of the input.
>>
>> If you are interested in taking a look, there is a suite of tests and a
>> small benchmarking script that compares the ufunc against the modulo
>> expression. The entire thing currently works with bit twiddling on an
>> appropriately converted integer representation of the number. It assumes a
>> standard IEEE754 representation for float16, float32, float64. The extended
>> 80-bit float128 format gets some special treatment because of the explicit
>> integer bit. Complex numbers are currently integers only if they are real
>> and integral. Integer types (including bool) are always integers. Time and
>> text raise TypeErrors, since their integerness is meaningless.
>>
>> If a consensus forms that this is something appropriate for numpy, I will
>> need some pointers on how to package up C code properly. This was an
>> opportunity for me to learn to write a basic ufunc. I am still a bit
>> confused about where code like this would go, and how to harness numpy's
>> code generation. I put comments in my .c and .h file showing how I would
>> expect the generators to look, but I'm not sure where to plug something
>> like that into numpy. It would also be nice to test on architectures whose
>> long double is a proper float128 quad-precision number rather than an
>> 80-bit extended format.
>>
>> Please let me know your thoughts.
>>
>> Regards,
>>
>> - Joe
>> ___
>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>> To unsubscribe send an email to numpy-discussion-le...@python.org
>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> Member address: deak.and...@gmail.com
>>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: jfoxrabinov...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Proposal for new function to determine if a float contains an integer

2022-01-01 Thread Joseph Fox-Rabinovitz
Stefano,

That is an excellent point. Just to make sure I understand, would an
interface like `is_integer(a, int_dtype=None)` be satisfactory? That way,
there are no bounds by default (call it python integer bounds), but the
user can specify a limited type at will. An alternative would be something
like `is_integer(a, bits=None, unsigned=False)`. This would have the
advantage of testing against hypothetical types, which might be useful
sometimes, or just annoying. I could always allow a two-element tuple as an
argument to the first version.

While I completely agree with the idea behind adding this test, one big
question remains: can I add arbitrary arguments to a ufunc?

- Joe

On Sat, Jan 1, 2022 at 5:41 AM Stefano Miccoli 
wrote:

> I would rather suggest .is_integer(integer_dtype) signature because
> knowing that 1e300 is an integer is not very useful in the numpy world,
> since this integer number is not representable as a numpy.integer dtype.
>
> Note that in python
>
> assert not f.is_integer() or int(f) == f
>
> never fails because integers have unlimited precision, but this would
> not map into
>
> assert ( ~f_arr.is_integer() | (np.int64(f_arr) == f_arr) ).all()
>
> because of possible OverflowErrors.
>
> Stefano
>
> On 31 Dec 2021, at 04:46, numpy-discussion-requ...@python.org wrote:
>
> Is adding arbitrary optional parameters a thing with ufuncs? I could
> easily add upper and lower bounds checks.
>
> On Thu, Dec 30, 2021, 20:56 Brock Mendel  wrote:
>
>> At least some of the commenters on that StackOverflow page need a
>> slightly stronger check: not only is_integer(x), but also
>> "np.iinfo(dtype).min <= x <= np.info(dtype).max" for some particular
>> dtype.  i.e. "Can I losslessly set these values into the array I already
>> have?"
>>
>>
>>
>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: jfoxrabinov...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Proposal to add method for string slicing to np.char

2022-01-01 Thread Joseph Fox-Rabinovitz
I made a PR for another new method `np.char.slice_` here:
https://github.com/numpy/numpy/pull/20694

Here is an excerpt of the PR message:

There are numerous examples of string slicing being a frequently requested
feature:

- https://stackoverflow.com/q/70547027/2988730
- https://stackoverflow.com/q/39042214/2988730
- https://stackoverflow.com/q/40976714/2988730
- https://stackoverflow.com/q/64981711/2988730
- https://stackoverflow.com/q/31387047/2988730
- https://stackoverflow.com/q/69856133/2988730
- ... I stopped searching around here

Given the existence of the `char` module, there is no reason not to include
a basic slicing operation that is cheaper than making views and copies of
the strings, or than switching to pandas for this one feature. This PR
introduces such a function. It's written entirely in Python, and does its
absolute best not to make a copy of any data.

The original inspiration for this is my answer to the first question in the
list above. I've added a couple of features since then, like the ability to
have a meaningful non-unit step and the ability to set the length of
non-unit-step chunks.
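
For illustration, the basic usage I have in mind (a hypothetical call; the
exact signature is whatever lands in the PR):

import numpy as np

a = np.array(['hello', 'world!'])
np.char.slice_(a, 1, 4)  # per-element s[1:4] -> ['ell', 'orl']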

Please let me know your thoughts about the value of something like this.

Regards,

- Joe
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Proposal for new function to determine if a float contains an integer

2022-01-02 Thread Joseph Fox-Rabinovitz
Is there a guide on how to package non-ufunc functions with multiple
loops, something like sort? It looks like there is no way of adding
additional arguments to a ufunc as of yet.

On a related note, would it be more useful to have a function that returns
the number of bits required to store a number, or -1 if it has a fractional
part? Then you could just test something like ``((k := integer_bits(a)) > 0)
& (k < 64)``.

- Joe


On Sat, Jan 1, 2022 at 5:55 AM Joseph Fox-Rabinovitz <
jfoxrabinov...@gmail.com> wrote:

> Stefano,
>
> That is an excellent point. Just to make sure I understand, would an
> interface like `is_integer(a, int_dtype=None)` be satisfactory? That way,
> there are no bounds by default (call it python integer bounds), but the
> user can specify a limited type at will. An alternative would be something
> like `is_integer(a, bits=None, unsigned=False)`. This would have the
> advantage of testing against hypothetical types, which might be useful
> sometimes, or just annoying. I could always allow a two-element tuple in as
> an argument to the first version.
>
> While I completely agree with the idea behind adding this test, one big
> question remains: can I add arbitrary arguments to a ufunc?
>
> - Joe
>
> On Sat, Jan 1, 2022 at 5:41 AM Stefano Miccoli 
> wrote:
>
>> I would rather suggest .is_integer(integer_dtype) signature because
>> knowing that 1e300 is an integer is not very useful in the numpy world,
>> since this integer number is not representable as a numpy.integer dtype.
>>
>> Note that in python
>>
>> assert not f.is_integer() or int(f) == f
>>
>> never fails because integers have unlimited precision, but this would
>> not map into
>>
>> assert ( ~f_arr.is_integer() | (np.int64(f_arr) == f_arr) ).all()
>>
>> because of possible OverflowErrors.
>>
>> Stefano
>>
>> On 31 Dec 2021, at 04:46, numpy-discussion-requ...@python.org wrote:
>>
>> Is adding arbitrary optional parameters a thing with ufuncs? I could
>> easily add upper and lower bounds checks.
>>
>> On Thu, Dec 30, 2021, 20:56 Brock Mendel  wrote:
>>
>>> At least some of the commenters on that StackOverflow page need a
>>> slightly stronger check: not only is_integer(x), but also
>>> "np.iinfo(dtype).min <= x <= np.info(dtype).max" for some particular
>>> dtype.  i.e. "Can I losslessly set these values into the array I already
>>> have?"
>>>
>>>
>>>
>>
>> ___
>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>> To unsubscribe send an email to numpy-discussion-le...@python.org
>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> Member address: jfoxrabinov...@gmail.com
>>
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Feature query: fetch top/bottom k from array

2022-02-22 Thread Joseph Fox-Rabinovitz
Joe,

Could you show an example that you find inelegant and elaborate on how you
intend to improve it? It's hard to discuss without more specific
information.
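
For reference, the usual argpartition idiom for the top k of a 1-D array
looks like this (presumably one of the approaches you have in mind):

import numpy as np

a = np.random.rand(100)
k = 5
idx = np.argpartition(a, -k)[-k:]    # indices of the k largest, in no order
idx = idx[np.argsort(a[idx])[::-1]]  # optionally sort them descending
top = a[idx]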

- Joe

On Tue, Feb 22, 2022, 07:23 Joseph Bolton 
wrote:

> Morning,
>
> My apologies if this deviates from the vision of numpy:
>
> I find myself often requiring the indices and/or values of the top (or
> bottom) k items in a numpy array.
>
> I am aware of solutions involving partition/argpartition but these are
> inelegant.
>
> I am thinking of 1-dimensional arrays, but this concept extends to an
> arbitrary number of dimensions.
>
> Is this a feature that would benefit the numpy package? I am happy to code
> it up.
>
> Thanks for your time!
>
> Best regards
> Joe
>
>
>
>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: jfoxrabinov...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: ndarray shape permutation

2022-05-17 Thread Joseph Fox-Rabinovitz
You could easily write an extension to ndarray that maps axis names to
indices and vice versa.
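
For example, a minimal sketch (the class name and `permute_axes` method are
illustrative, not an existing NumPy API):

import numpy as np

class NamedAxisArray(np.ndarray):
    """ndarray subclass that remembers a name for each axis."""
    def __new__(cls, input_array, axes=None):
        obj = np.asarray(input_array).view(cls)
        obj.axes = tuple(axes) if axes is not None else None
        return obj

    def __array_finalize__(self, obj):
        self.axes = getattr(obj, 'axes', None)

    def permute_axes(self, new_axes):
        # Map the requested axis names to current positions and transpose.
        order = [self.axes.index(name) for name in new_axes]
        result = self.transpose(order)
        result.axes = tuple(new_axes)
        return result

vol = NamedAxisArray(np.random.rand(3, 4, 5), axes=('X', 'Y', 'Z'))
new_vol = vol.permute_axes(('Z', 'Y', 'X'))
new_vol.shape  # (5, 4, 3)
old_vol = new_vol.permute_axes(('X', 'Y', 'Z'))  # reverses the permutation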

Joe

On Tue, May 17, 2022, 21:32 Paul Korir  wrote:

> Thanks for your replies.
>
> In retrospect, I realise that using the shape will not be helpful for a
> cubic array i.e. the permutations of (10, 10, 10) are all (10, 10, 10)!
> However, the problem remains. Let me try to explain.
>
> Short version
> The problem boils down to the meaning of axis indices as arguments to
> swapaxes and transpose (and any related functions). Swapping axes or
> transposing an array gives new meanings to the indices. For example,
> suppose I have a volume of shape C, R, S. Then 0 will refer to C, 1 will
> refer to R and 2 will refer to S. After I transpose it, say using (1, 2, 0)
> so that the shape becomes R, S, C then now 0 will refer to R, 1 will refer
> to S and 2 will refer to C. I can no longer reverse the transposition or
> transpose it predictably to achieve a certain shape, which is an important
> operation in some applications where the meaning of the axes is significant.
>
> Long version
> Suppose I have a volume of shape (C, R, S) and I have a corresponding
> assignment of physical axes so that C=X, R=Y and S=Z. This is equivalent to
> placing the volume with C along the X axis, R along Y axis and S along the
> Z axis. Now, suppose I would like to permute the axes by only making
> reference to the axis names: what is the shape corresponding to the
> orientation (Z, Y, X)?
>
> This is a simple example because we only swap two axes and the resulting
> shape is the same as performing the same swap in the shape: (S, R, C). If
> we knew the indices of the axis names then we can infer these and pass them
> to swapaxes or transpose:
>
> vol = numpy.random.rand(C, R, S) # of shape (C, R, S) -> (X, Y, Z)
> # now (Z, Y, X)
> new_vol = numpy.swapaxes(vol, 0, 2)
> new_vol.shape # (S, R, C)
>
> The same applies to a double swap e.g. (Y, Z, X), though it is less
> straightforward using swapaxes. swapaxes only takes two indices (obviously)
> so we would need to call it twice reflecting the two swaps required. So we
> have to somehow figure which axes to swap successively: ((0, 2) then (0,
> 1)). We can do this in one step with numpy.transpose simply using indices
> (1, 2, 0).
>
> However, (and this is the big 'however'), how would we reverse this? The
> array has no memory of the original axes and 0, 1, and 2 now refer to the
> current axes. This is where the axes names (e.g. X, Y and Z) would come in
> handy.
>
> Axis names will allow permutations to happen predictably since the array
> will 'remember' what the original references were.
>
> Here is what I propose: an addition to the numpy.ndarray API that carries
> some axis identity, e.g.
>
> vol = numpy.random.rand(C, R, S)
> vol.shape # C, R, S
> vol.axes = ('X', 'Y', 'Z') # C=X, R=Y, S=Z
> new_vol = vol.permute_axes(('Z', 'Y', 'X'))
> # either
> new_vol.axes # ('X', 'Y', 'Z') # preserve orientation but change shape
> new_vol.shape # S, R, C
> # or
> new_vol.axes # ('Z', 'Y', 'X') # preserve shape but change orientation
> new_vol.shape # C, R, S
> # we can now reverse the permutation
> old_vol = new_vol.permute_axes(('X', 'Y', 'Z'))
> numpy.array_equal(vol, old_vol) # True
>
> I've checked the numpy API documentation and there is no attribute .axes
> present so this is the best candidate. Additionally, this will require the
> .permute_axes() method/function.
>
> Thanks for your consideration.
>
> Paul
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: jfoxrabinov...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com