Re: [Numpy-discussion] Numpy Memory Error with corrcoef

2012-03-27 Thread eat
Hi,

On Tue, Mar 27, 2012 at 12:12 PM, Nicole Stoffels <
nicole.stoff...@forwind.de> wrote:

> Dear all,
>
> I get the following memory error while running my program:
>
> Traceback (most recent call last):
>   File "/home/nistl/Software/Netzbetreiber/FLOW/src/MemoryError_Debug.py",
> line 9, in 
> correlation = corrcoef(data_records)
>   File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line
> 1992, in corrcoef
> c = cov(x, y, rowvar, bias)
>   File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line
> 1973, in cov
> return (dot(X, X.T.conj()) / fact).squeeze()
> MemoryError
>
> Here is an easy example of how to reproduce the error:
>
> #!/usr/bin/env python2.7
>
> from pybinsel import open
> from numpy import *
>
> if __name__ == '__main__':
>
> data_records = random.random((459375, 24))
> correlation = corrcoef(data_records)
>
> My real data has the same dimensions. Is this a size problem of the array
> or did I simply make a mistake in the application of corrcoef?
>
> I hope that you can help me! Thanks!
>
As others have explained, this approach yields an enormous matrix: corrcoef
of 459375 observations is a 459375x459375 float64 array, i.e. some 1.7 TB.
However, if I have understood your problem correctly, you could implement a
helper class to iterate over all of your observations. Something along the
following lines (although it may take hours with your data size) to iterate
over all correlations:

"""A helper class for correlations between observations."""

import numpy as np


 class Correlations(object):

def __init__(self, data):

self.m= data.shape[0]

# compatible with corrcoef

self.scale= self.m- 1

self.data= data- np.mean(data, 1)[:, None]

# but you may actually need to scale and translate data

# more application speficic manner

self.var= (self.data** 2.).sum(1)/ self.scale


 def obs_kth(self, k):

c= np.dot(self.data, self.data[k])/ self.scale

return c/ (self.var[k]* self.var)** .5


 def obs_iterate(self):

for k in xrange(self.m):

yield self.obs_kth(k)


 if __name__ == '__main__':

data= np.random.randn(5, 3)

print np.corrcoef(data).round(3)

print

c= Correlations(data)

print np.array([p for p in c.obs_iterate()]).round(3)
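
If iterating one observation at a time proves too slow, a blockwise variant
is a possible middle ground. A minimal sketch, assuming you only need to
stream over blocks of correlation rows and never hold the full matrix
(corr_blocks is a made-up name and block is just a tuning parameter):

def corr_blocks(data, block= 1000):
    # center the rows and precompute their norms
    d = data - data.mean(1)[:, None]
    s = np.sqrt((d ** 2.).sum(1))
    for k in xrange(0, d.shape[0], block):
        # correlations of rows k:k+block against all rows
        yield np.dot(d[k:k + block], d.T) / np.outer(s[k:k + block], s)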


My 2 cents,
-eat

>
> Best regards,
>
> Nicole Stoffels
>
> --
>
> Dipl.-Met. Nicole Stoffels
>
> Wind Power Forecasting and Simulation
>
> ForWind - Center for Wind Energy Research
> Institute of Physics
> Carl von Ossietzky University Oldenburg
>
> Ammerländer Heerstr. 136
> D-26129 Oldenburg
>
> Tel: +49(0)441 798 - 5079
> Fax: +49(0)441 798 - 5099
>
> Web  : www.ForWind.de
> Email: nicole.stoff...@forwind.de
>
>


Re: [Numpy-discussion] problem with vectorized difference equation

2012-04-07 Thread eat
Hi,

On Sat, Apr 7, 2012 at 1:27 AM, Sameer Grover wrote:

> On Saturday 07 April 2012 02:51 AM, Francesco Barale wrote:
> > Hello Sameer,
> >
> > Thank you very much for your reply. My goal was to try to speed up the
> loop
> > describing the accumulator. In the (excellent) book I was mentioning in
> my
> > initial post I could find one example that seemed to match what I was
> trying
> > to do. Basically, it is said that a loop of the following kind:
> >
> > n = size(u)-1
> > for i in xrange(1,n,1):
> >  u_new[i] = u[i-1] + u[i] + u[i+1]
> >
> > should be equivalent to:
> >
> > u[1:n] = u[0:n-1] + u[1:n] + u[i+1]
> This example is correct.
>
> What I was trying to point out was that the single expression y[1:n] =
> y[0:n-1] + u[1:n] will iterate over the array but will not accumulate.
> It will add y[0:n-1] to u[1:n] and assign the result to y[1:n].
>
> For example,
> y = [1,2,3,4]
> u = [5,6,7,8]
> Then y[0:n-1] = [1,2,3] and u[1:n]=[6,7,8]
>
> The statement y[1:n] = y[0:n-1] + u[1:n] implies
> y[1:n] = [6+1,7+2,8+3] = [7,9,11]
> yielding y = [1,7,9,11]
>
FWIW, if an accumulating assignment in a loop like this ever makes any sense
(which I doubt),

>
> whereas the code:
>
> for i in range(0,n-1):
>  y[i+1] = y[i] + u[i+1]
>
then it can be captured in a function, like
In []: def s0(y, u):
   ..:     yn= y.copy()
   ..:     for k in xrange(y.size- 1):
   ..:         yn[k+ 1]= yn[k]+ u[k+ 1]
   ..:     return yn
   ..:
and now this function can be easily transformed to utilize cumsum (observing
that y[k]= y[0]+ u[1]+ ...+ u[k]), like
In []: def s1(y, u):
   ..:     un= u.copy()
   ..:     un[0]= y[0]
   ..:     return cumsum(un)
   ..:
thus
In []: y, u= rand(1e5), rand(1e5)
In []: allclose(s0(y, u), s1(y, u))
Out[]: True
and definitely this transformation will outperform a plain python loop
In []: timeit s0(y, u)
10 loops, best of 3: 122 ms per loop
In []: timeit s1(y, u)
100 loops, best of 3: 2.16 ms per loop
In []: 122/ 2.16
Out[]: 56.48148148148148
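
As an aside (a sketch, assuming scipy is available): if the recurrence had a
coefficient, y[k+1]= a* y[k]+ u[k+1], plain cumsum would no longer apply,
but scipy.signal.lfilter covers that case (s2 is a made-up name; a= 1.0
reduces to the cumsum solution above):

import numpy as np
from scipy.signal import lfilter

def s2(y, u, a= 1.0):
    un = u.copy()
    un[0] = y[0]
    # y[k]= a* y[k-1]+ u[k], i.e. an IIR filter with denominator [1, -a]
    return lfilter([1.0], [1.0, -a], un)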


My 2 cents,
-eat

>
> will accumulate and give y = [1,7,14,22]
>
> Sameer
> > Am I missing something?
> >
> > Regards,
> > Francesco
> >
> >
> > Sameer Grover wrote:
> >> On Saturday 07 April 2012 12:14 AM, francesco82 wrote:
> >>> Hello everyone,
> >>>
> >>> After reading the very good post
> >>>
> http://technicaldiscovery.blogspot.com/2011/06/speeding-up-python-numpy-cython-and.html
> >>> and the book by H. P. Langtangen 'Python scripting for computational
> >>> science' I was trying to speed up the execution of a loop on numpy
> arrays
> >>> being used to describe a simple difference equation.
> >>>
> >>> The actual code I am working on involves some more complicated
> equations,
> >>> but I am having the same exact behavior as described below. To test the
> >>> improvement in speed I wrote the following in vect_try.py:
> >>>
> >>> #!/usr/bin/python
> >>> import numpy as np
> >>> import matplotlib.pyplot as plt
> >>>
> >>> dt = 0.02 #time step
> >>> time = np.arange(0,2,dt) #time array
> >>> u = np.sin(2*np.pi*time) #input signal array
> >>>
> >>> def vect_int(u,y): #vectorized function
> >>>   n = u.size
> >>>   y[1:n] = y[0:n-1] + u[1:n]
> >>>   return y
> >>>
> >>> def sc_int(u,y): #scalar function
> >>>   y = y + u
> >>>   return y
> >>>
> >>> def calc_vect(u, func=vect_int):
> >>>   out = np.zeros(u.size)
> >>>   for i in xrange(u.size):
> >>>   out = func(u,out)
> >>>   return out
> >>>
> >>> def calc_sc(u, func=sc_int):
> >>>   out = np.zeros(u.size)
> >>>   for i in xrange(u.size-1):
> >>>   out[i+1] = sc_int(u[i+1],out[i])
> >>>   return out
> >>>
> >>> To verify the execution time I've used the timeit function in Ipython:
> >>>
> >>> import vect_try as vt
> >>> timeit vt.calc_vect(vt.u) -->   1000 loops, best of 3: 494 us per loop
> >>> timeit vt.calc_sc(vt.u) -->1 loops, best of 3: 92.8 us per loop
> >>>
> >>> As you can see the scalar implementation looping one item at the time
> >>> (calc_sc) is 494/92.8~5.3 times faster than the vectorized one
> >>> (calc_vect).
> >>>
> >>> My problem consists in the fact that I need to iterate the execution of
> >>> ca

Re: [Numpy-discussion] (no subject)

2012-04-22 Thread eat
Hi,

On Fri, Apr 20, 2012 at 9:15 PM, Andre Martel wrote:

> What would be the best way to remove the maximum from a cube and
> "collapse" the remaining elements along the z-axis ?
> For example, I want to reduce Cube to NewCube:
>
> >>> Cube
> array([[[  13,   2,   3,  42],
> [  5, 100,   7,   8],
> [  9,   1,  11,  12]],
>
>[[ 25,   4,  15,   1],
> [ 17,  30,   9,  20],
> [ 21,   2,  23,  24]],
>
>[[ 1,   2,  27,  28],
> [ 29,  18,  31,  32],
> [ -1,   3,  35,   4]]])
>
> NewCube
>
> array([[[  13,   2,   3,  1],
> [  5, 30,   7,   8],
> [  9,   1,  11,  12]],
>
>[[ 1,   2,  15,  28],
> [ 17,  18,  9,  20],
> [ -1,   2,  23,   4]]])
>
> I tried with argmax() and then roll() and delete() but these
> all work on 1-D arrays only. Thanks.
>
Perhaps it would be more straightforward to process via 2D-arrays, like:
In []: C
Out[]:
array([[[ 13,   2,   3,  42],
[  5, 100,   7,   8],
[  9,   1,  11,  12]],
   [[ 25,   4,  15,   1],
[ 17,  30,   9,  20],
[ 21,   2,  23,  24]],
   [[  1,   2,  27,  28],
[ 29,  18,  31,  32],
[ -1,   3,  35,   4]]])
In []: C_in= C.reshape(3, -1).copy()
In []: ndx= C_in.argmax(0)
In []: C_out= C_in[:2, :]
In []: C_out[:, ndx== 0]= C_in[1:, ndx== 0]
In []: C_out[1, ndx== 1]= C_in[2, ndx== 1]
In []: C_out.reshape(2, 3, 4)
Out[]:
array([[[13,  2,  3,  1],
[ 5, 30,  7,  8],
[ 9,  1, 11, 12]],
   [[ 1,  2, 15, 28],
[17, 18,  9, 20],
[-1,  2, 23,  4]]])
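
A more general sketch, in case the cube shape varies (collapse_max is a
made-up helper; it assumes a (z, m, n) cube and removes one occurrence of
the per-position maximum along the z-axis, keeping the remaining order):

import numpy as np

def collapse_max(C):
    z, m, n = C.shape
    F = C.reshape(z, -1)
    keep = np.ones(F.shape, bool)
    # drop the (first) maximum of every column
    keep[F.argmax(0), np.arange(F.shape[1])] = False
    return F.T[keep.T].reshape(m, n, z - 1).transpose(2, 0, 1)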

My 2 cents,
-eat

>


Re: [Numpy-discussion] why not zerodivision error?

2012-05-20 Thread eat
Hi,

On Sun, May 20, 2012 at 10:21 AM, Chao YUE  wrote:

> Dear all,
>
> could anybody give one sentence about this? Why in the loop don't I get a
> zerodivision error, but when I explicitly do this, I get a zerodivision
> error? thanks.
>
> In [7]: for i in np.arange(-10,10):
> print 1./i
>...:
> -0.1
> -0.111111111111
> -0.125
> -0.142857142857
> -0.166666666667
> -0.2
> -0.25
> -0.333333333333
> -0.5
> -1.0
> inf
> 1.0
> 0.5
> 0.333333333333
> 0.25
> 0.2
> 0.166666666667
> 0.142857142857
> 0.125
> 0.111111111111
>
> In [8]: 1/0.
> ---
> ZeroDivisionError Traceback (most recent call last)
> /mnt/f/data/DROUGTH/ in ()
> > 1 1/0.
>
> ZeroDivisionError: float division by zero
>
> In [9]: 1./0.
> ---
> ZeroDivisionError Traceback (most recent call last)
> /mnt/f/data/DROUGTH/ in ()
> > 1 1./0.
>
> ZeroDivisionError: float division by zero
>
> In [10]: 1./0
> ---
> ZeroDivisionError Traceback (most recent call last)
> /mnt/f/data/DROUGTH/ in ()
> > 1 1./0
>
> ZeroDivisionError: float division by zero

You may like to read more here:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.seterr.html#numpy.seterr

So, for your specific example:
In []: a= arange(-10, 10)
In []: 1./ a
Out[]:
array([-0.1       , -0.11111111, -0.125     , -0.14285714, -0.16666667,
       -0.2       , -0.25      , -0.33333333, -0.5       , -1.        ,
               inf,  1.        ,  0.5       ,  0.33333333,  0.25      ,
        0.2       ,  0.16666667,  0.14285714,  0.125     ,  0.11111111])
In []: seterr(divide= 'raise')
In []: 1./ a
----
Traceback (most recent call last):
  File "", line 1, in 
FloatingPointError: divide by zero encountered in divide
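
Alternatively, the same control is available in a scoped form via the
errstate context manager (a minimal sketch):

import numpy as np

a = np.arange(-10, 10)
with np.errstate(divide='ignore'):
    print 1. / a   # inf at the zero entry, no warning and no raise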


My 2 cents,
-eat

>
>
>
> Chao
>
> --
>
> ***
> Chao YUE
> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> UMR 1572 CEA-CNRS-UVSQ
> Batiment 712 - Pe 119
> 91191 GIF Sur YVETTE Cedex
> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
>
> 
>
>


Re: [Numpy-discussion] fast access and normalizing of ndarray slices

2012-06-04 Thread eat
Hi,

On Mon, Jun 4, 2012 at 12:44 AM, srean  wrote:

> Hi Wolfgang,
>
>  I think you are looking for reduceat( ), in particular add.reduceat()
>

Indeed, the OP could utilize np.add.reduceat(...), like:

# tst.py

import numpy as np


def reduce(data, lengths):
    # interleave the segment starts and ends: [s0, e0, s1, e1, ...]
    ind, ends = np.r_[lengths, lengths], lengths.cumsum()
    ind[::2], ind[1::2] = ends - lengths, ends
    # pad data with one extra 0 so the final end index stays in bounds,
    # then keep every other partial sum
    return np.add.reduceat(np.r_[data, 0], ind)[::2]


def normalize(data, lengths):
    return data / np.repeat(reduce(data, lengths), lengths)


def gen(par):
    # random segment lengths
    lengths = np.random.randint(*par)
    return np.random.randn(lengths.sum()), lengths


if __name__ == '__main__':
    data = np.array([1, 2, 1, 2, 3, 4, 1, 2, 3], dtype=float)
    lengths = np.array([2, 4, 3])
    print reduce(data, lengths)
    print normalize(data, lengths).round(2)


Resulting:
In []: %run tst
[  3.  10.   6.]
[ 0.33  0.67  0.1   0.2   0.3   0.4   0.17  0.33  0.5 ]
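
To see how the index construction works, here it is traced for the example
(a walk-through only, no extra functionality):

# lengths = [2, 4, 3]
# ends    = cumsum(lengths) = [2, 6, 9]
# starts  = ends - lengths  = [0, 2, 6]
# ind     = [0, 2,  2, 6,  6, 9]      # flattened (start, end) pairs
# np.add.reduceat puts the sums over data[0:2], data[2:6], data[6:9] in
# the even slots; the odd slots are junk and [::2] discards them.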

Fast enough:
In []: data, lengths= gen([5, 15, 5e4])
In []: data.size
Out[]: 476028
In []: %timeit normalize(data, lengths)
10 loops, best of 3: 29.4 ms per loop


My 2 cents,
-eat

>
> -- srean
>
> On Thu, May 31, 2012 at 12:36 AM, Wolfgang Kerzendorf
>  wrote:
> > Dear all,
> >
> > I have an ndarray which consists of many arrays stacked behind each
> other (only conceptually, in truth it's a normal 1d float64 array).
> > I have a second array which tells me the start of the individual data
> sets in the 1d float64 array and another one which tells me the length.
> > Example:
> >
> > data_array = (conceptually) [[1,2], [1,2,3,4], [1,2,3]] = in reality
> [1,2,1,2,3,4,1,2,3, dtype=float64]
> > start_pointer = [0, 2, 6]
> > length_data = [2, 4, 3]
> >
> > I now want to normalize each of the individual data sets. I wrote a
> simple for loop over the start_pointer and length data grabbed the data and
> normalized it and wrote it back to the big array. That's slow. Is there an
> elegant numpy way to do that? Do I have to go the cython way?


Re: [Numpy-discussion] convert any non square matrix in to square matrix using numpy

2012-06-18 Thread eat
Hi,

On Mon, Jun 18, 2012 at 6:55 PM, bob tnur  wrote:

> Hi,
> how can I convert (by adding zeros) any non-square numpy matrix into a
> square matrix using numpy? Then how to find the minimum number in each row,
> excluding the zeros added (for making the matrix square)? ;)
>
Perhaps something like this:
In []: def make_square(A):
   ..:     s= A.shape
   ..:     if s[0]< s[1]:
   ..:         return r_[A, zeros((s[1]- s[0], s[1]), dtype= A.dtype)]
   ..:     return c_[A, zeros((s[0], s[0]- s[1]), dtype= A.dtype)]
   ..:
In []: A= rand(4, 2)
In []: make_square(A)
Out[]:
array([[ 0.76109774,  0.42980812,  0.,  0.],
   [ 0.11810978,  0.59622975,  0.,  0.],
   [ 0.54991376,  0.29315485,  0.,  0.],
   [ 0.78182313,  0.3828001 ,  0.,  0.]])
In []: make_square(A.T)
Out[]:
array([[ 0.76109774,  0.11810978,  0.54991376,  0.78182313],
   [ 0.42980812,  0.59622975,  0.29315485,  0.3828001 ],
   [ 0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.]])

will help you.
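
For the second part of the question (row minima that ignore the added
zeros), a hypothetical sketch using a masked array (row_min_ignoring_pad is
a made-up name; rows consisting purely of padding come out fully masked):

import numpy as np
import numpy.ma as ma

def row_min_ignoring_pad(B, orig_shape):
    m, n = orig_shape
    mask = np.ones(B.shape, bool)
    mask[:m, :n] = False          # False marks the original entries
    return ma.array(B, mask= mask).min(1)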

My 2 cents,
-eat

>


Re: [Numpy-discussion] Good way to develop numpy as popular choice!

2012-06-21 Thread eat
Heh,

On Thu, Jun 21, 2012 at 6:03 PM, Robert Kern  wrote:

> On Thu, Jun 21, 2012 at 3:59 PM, bob tnur  wrote:
> > Hi all numpy fans ;)
> > This question is already posted on stackoverflow by some people; I am just
> > thinking that numpy python will do this with a trick ;) I guess numpy will be
> > everyone's choice as its popularity increases. The question is herein:
> >
> http://stackoverflow.com/questions/10074270/how-can-i-find-the-minimum-number-of-lines-needed-to-cover-all-the-zeros-in-a-2
>
> My "numpy solution" for this is just
>
>  $ pip install munkres
>
munkres seems to be a pure python implementation ;-).

FWIW, there exist pure python implementations that outperform the munkres
implementation by a factor of more than 200 already with a 100x100 random
cost matrix, based on the shortest path variant of the Hungarian algorithm
(more details of the algorithms can be found, for example, at
http://www.assignmentproblems.com/).

Given how the assignment algorithms are typically described, it may actually
be quite a tedious job to create more performant implementations utilizing
numpy arrays instead of lists of lists.


My 2 cents,
-eat

>
> http://pypi.python.org/pypi/munkres
>
> --
> Robert Kern


Re: [Numpy-discussion] Good way to develop numpy as popular choice!

2012-06-22 Thread eat
Hi,

On Fri, Jun 22, 2012 at 6:05 PM, Benjamin Root  wrote:

>
>
> On Fri, Jun 22, 2012 at 10:25 AM, Travis Oliphant wrote:
>
>> Accessing individual elements of NumPy arrays is slower than accessing
>> individual elements of lists --- around 2.5x-3x slower. NumPy has to do
>> more work to figure out what kind of indexing you are trying to do because
>> of its flexibility.   It also has to create the Python object to return.
>>  In contrast, the list approach already has the Python objects created and
>> you are just returning pointers to them and there is much less flexibility
>> in the kinds of indexing you can do.
>>
>> Simple timings show that a.item(i,j) is about 2x slower than list element
>> access (but faster than a[i,j] which is about 2.5x to 3x slower).   The
>> slowness of a.item is due to the need to create the Python object to return
>> (there are just raw bytes there) so it gives some idea of the relative cost
>> of each part of the slowness of a[i,j].
>>
>> Also, math on the array scalars returned from NumPy will be slower than
>> math on integers and floats --- because NumPy re-uses the ufunc machinery
>> which is not optimized at all for scalars.
>>
>> The take-away is that NumPy is built for doing vectorized operations on
>> bytes of data.   It is not optimized for doing element-by-element
>> individual access.The right approach there is to just use lists (or use
>> a version specialized for the kind of data in the lists that removes the
>> boxing and unboxing).
>>
>> Here are my timings using IPython for NumPy indexing:
>>
>> 1-D:
>>
>> In[2]: a = arange(100)
>>
>> In [3]: %timeit [a.item(i) for i in xrange(100)]
>> 1 loops, best of 3: 25.6 us per loop
>>
>> In [4]: %timeit [a[i] for i in xrange(100)]
>> 1 loops, best of 3: 31.8 us per loop
>>
>> In [5]: al = a.tolist()
>>
>> In [6]: %timeit [al[i] for i in xrange(100)]
>> 10 loops, best of 3: 10.6 us per loop
>>
>>
>>
>> 2-D:
>>
>> In [7]: a = arange(100).reshape(10,10)
>>
>> In [8]: al = a.tolist()
>>
>> In [9]: %timeit [al[i][j] for i in xrange(10) for j in xrange(10)]
>> 1 loops, best of 3: 18.6 us per loop
>>
>> In [10]: %timeit [a[i,j] for i in xrange(10) for j in xrange(10)]
>> 1 loops, best of 3: 44.4 us per loop
>>
>> In [11]: %timeit [a.item(i,j) for i in xrange(10) for j in xrange(10)]
>> 1 loops, best of 3: 34.2 us per loop
>>
>>
>>
>> -Travis
>>
>>
> However, what is the timing/memory cost of converting a large numpy array
> that already exists into python list of lists?  If all my processing before
> the munkres step is using NumPy, converting it into python lists has a
> cost.  Also, your timings indicate only ~2x slowdown, while the timing
> tests done by eat show an order-of-magnitude difference.  I suspect there
> is great room for improvement before even starting to worry about the array
> access issues.
>
Creating a list of lists from an array is quite fast, like
In []: A= rand(500, 500)
In []: %timeit A.tolist()
100 loops, best of 3: 10.8 ms per loop

Regards,
-eat

>
> Cheers!
> Ben Root
>
>


Re: [Numpy-discussion] Problems understanding histogram2d

2012-07-20 Thread eat
Hi,

On Fri, Jul 20, 2012 at 6:42 PM, yogesh karpate wrote:

> I think since it's a joint histogram, you need to have equal no. of data
> points and bins
> in both x and y.

It makes sense that the numbers of elements of the data points (x, y) are
equal. Perhaps the documentation, e.g.
http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.histogram2d.html,
could make this aspect clearer.

Especially confusing is the signature *x* : array_like, shape(N,) and
*y* : array_like, shape(M,), which may indicate that N != M could be a
feasible case. It would be better to simply state that x and y must be
one-dimensional and of equal length.
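
For reference, a minimal working call for a setup like the OP's (a sketch in
which x and y are paired samples of equal length, while the bin counts per
axis may differ):

import numpy as np

x = np.random.uniform(-180., 180., 100000)
y = np.random.uniform(-90., 90., 100000)
H, xedges, yedges = np.histogram2d(x, y, bins=(360, 180))
print H.shape   # (360, 180)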


My 2 cents,
-eat

>
>
> On Fri, Jul 20, 2012 at 5:11 PM, Andreas Hilboll  wrote:
>
>> Hi,
>>
>> I have a problem using histogram2d:
>>
>>from numpy import linspace, histogram2d
>>bins_x = linspace(-180., 180., 360)
>>bins_y = linspace(-90., 90., 180)
>>data_x = linspace(-179.96875, 179.96875, 5760)
>>data_y = linspace(-89.96875, 89.96875, 2880)
>>histogram2d(data_x, data_y, (bins_x, bins_y))
>>
>>AttributeError: The dimension of bins must be equal to the dimension of
>> the sample x.
>>
>> I would expect histogram2d to return a 2d array of shape (360,180), which
>> is full of 256s. What am I missing here?
>>
>> Cheers,
>> Andreas.
>>
>
>
>
> --
> Regards
> Yogesh Karpate
>
>


Re: [Numpy-discussion] np.unique for one bi-dimensional array

2012-07-24 Thread eat
Hi,

On Wed, Jul 25, 2012 at 1:09 AM, Giuseppe Amatulli <
giuseppe.amatu...@gmail.com> wrote:

> Hi,
>
> would like to identify unique pairs of numbers in two arrays or in one
> bi-dimensional array, and count the observations
>
> a_clean=array([4,4,5,4,4,4])
> b_clean=array([3,5,4,4,3,4])
>
> and obtain
> (4,3,2)
> (4,5,1)
> (5,4,1)
> (4,4,2)
>
> I solved with tow loops but off course there will be a fast solution
> Any idea?
> what about using np.unique for one bi-dimensional array?
>
According to the docs,
http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html,
np.unique will flatten the input to a 1D array.

However, perhaps something like the following lines will help you:
In []: lot= zip(a_clean, b_clean)
In []: lot
Out[]: [(4, 3), (4, 5), (5, 4), (4, 4), (4, 3), (4, 4)]
In []: [[x, lot.count(x)] for x in set(lot)]
Out[]: [[(4, 5), 1], [(5, 4), 1], [(4, 4), 2], [(4, 3), 2]]
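
A numpy-only alternative is to encode each pair as a single integer and
count with bincount (a sketch, assuming non-negative entries):

import numpy as np

a_clean = np.array([4, 4, 5, 4, 4, 4])
b_clean = np.array([3, 5, 4, 4, 3, 4])
base = b_clean.max() + 1
code = a_clean * base + b_clean
uniq, inv = np.unique(code, return_inverse= True)
for u, c in zip(uniq, np.bincount(inv)):
    print (u // base, u % base), c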


My 2 cents,
-eat

>
> In bash I usually use the uniq command
>
> thanks in advance
> Giuseppe
>
> --
> Giuseppe Amatulli
> Web: www.spatial-ecology.net


Re: [Numpy-discussion] array slicing questions

2012-07-30 Thread eat
Hi,

A partial answer to your questions:

On Mon, Jul 30, 2012 at 10:33 PM, Vlastimil Brom
wrote:

> Hi all,
> I'd like to ask for some hints or advice regarding the usage of
> numpy.array and especially  slicing.
>
> I only recently tried numpy and was impressed by the speedup in some
> parts of the code, hence I suspect, that I might miss some other
> oportunities in this area.
>
> I currently use the following code for a simple visualisation of the
> search matches within the text, the arrays are generally much larger
> than the sample - the texts size is generally hundreds of kilobytes up
> to a few MB - with an index position for each character.
> First there is a list of spans(obtained form the regex match objects),
> the respective character indices in between these slices should be set
> to 1:
>
> >>> import numpy
> >>> characters_matches = numpy.zeros(10)
> >>> matches_spans = numpy.array([[2,4], [5,9]])
> >>> for start, stop in matches_spans:
> ... characters_matches[start:stop] = 1
> ...
> >>> characters_matches
> array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])
>
> Is there maybe a way tu achieve this in a numpy-only way - without the
> python loop?
> (I got the impression, the powerful slicing capabilities could make it
> possible, bud haven't found this kind of solution.)
>
>
> In the next piece of code all the character positions are evaluated
> with their "neighbourhood" and a kind of running proportions of the
> matched text parts are computed (the checks_distance could be
> generally up to the order of the half the text length, usually less :
>
> >>>
> >>> check_distance = 1
> >>> floating_checks_proportions = []
> >>> for i in numpy.arange(len(characters_matches)):
> ... lo = i - check_distance
> ... if lo < 0:
> ... lo = None
> ... hi = i + check_distance + 1
> ... checked_sublist = characters_matches[lo:hi]
> ... proportion = (checked_sublist.sum() / (check_distance * 2 + 1.0))
> ... floating_checks_proportions.append(proportion)
> ...
> >>> floating_checks_proportions
> [0.0, 0.3333333333333333, 0.6666666666666666, 0.6666666666666666,
> 0.6666666666666666, 0.6666666666666666, 1.0, 1.0,
> 0.6666666666666666, 0.3333333333333333]
> >>>
>
Define a function for proportions:

from numpy import r_
from numpy.lib.stride_tricks import as_strided as ast

def proportions(matches, distance= 1):
    cd, cd2p1, s= distance, 2* distance+ 1, matches.strides[0]
    # pad
    m= r_[[0.]* cd, matches, [0.]* cd]
    # create a suitable view
    m= ast(m, shape= (m.shape[0], cd2p1), strides= (s, s))
    # average
    return m[:-2* cd].sum(1)/ cd2p1

and use it like:
In []: matches
Out[]: array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])

In []: proportions(matches).round(2)
Out[]: array([ 0.  ,  0.33,  0.67,  0.67,  0.67,  0.67,  1.  ,  1.  ,
 0.67,  0.33])
In []: proportions(matches, 5).round(2)
Out[]: array([ 0.27,  0.36,  0.45,  0.55,  0.55,  0.55,  0.55,  0.55,
 0.45,  0.36])
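
As for the first question (filling the spans without a Python loop), a
sketch using a difference array and cumsum (assuming distinct span starts;
overlapping spans are handled by the final > 0 test):

import numpy as np

n = 10
spans = np.array([[2, 4], [5, 9]])
delta = np.zeros(n + 1)
delta[spans[:, 0]] += 1     # a span opens here
delta[spans[:, 1]] -= 1     # a span closes here
characters_matches = (np.cumsum(delta)[:n] > 0).astype(float)
# -> array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])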

>
> I'd like to ask about the possible better approaches, as it doesn't
> look very elegant to me, and I obviously don't know the implications
> or possible drawbacks of numpy arrays in some scenarios.
>
> the pattern
> for i in range(len(...)): is usually considered inadequate in python,
> but what should be used in this case as the indices are primarily
> needed?
> is something to be gained or lost using (x)range or np.arange as the
> python loop is (probably?) inevitable anyway?
>
Here np.arange(.) will create a new array, potentially wasting memory if
it's not otherwise used. IMO there's nothing wrong with looping over
xrange(.) (if you really need to loop ;).

> Is there some mor elegant way to check for the "underflowing" lower
> bound "lo" to replace with None?
>
> Is it significant, which container is used to collect the results of
> the computation in the python loop - i.e. python list or a numpy
> array?
> (Could possibly matplotlib cooperate better with either container?)
>
> And of course, are there maybe other things, which should be made
> better/differently?
>
> (using Numpy 1.6.2, python 2.7.3, win XP)
>

My 2 cents,
-eat

> Thanks in advance for any hints or suggestions,
>regards,
>   Vlastimil Brom


Re: [Numpy-discussion] array slicing questions

2012-07-31 Thread eat
Hi,

On Tue, Jul 31, 2012 at 10:23 AM, Vlastimil Brom
wrote:

> 2012/7/30 eat :
> > Hi,
> >
> > A partial answer to your questions:
> >
> > On Mon, Jul 30, 2012 at 10:33 PM, Vlastimil Brom <
> vlastimil.b...@gmail.com>
> > wrote:
> >>
> >> Hi all,
> >> I'd like to ask for some hints or advice regarding the usage of
> >> numpy.array and especially  slicing.
> >>
> >> I only recently tried numpy and was impressed by the speedup in some
> >> parts of the code, hence I suspect, that I might miss some other
> >> oportunities in this area.
> >>
> >> I currently use the following code for a simple visualisation of the
> >> search matches within the text, the arrays are generally much larger
> >> than the sample - the texts size is generally hundreds of kilobytes up
> >> to a few MB - with an index position for each character.
> >> First there is a list of spans(obtained form the regex match objects),
> >> the respective character indices in between these slices should be set
> >> to 1:
> >>
> >> >>> import numpy
> >> >>> characters_matches = numpy.zeros(10)
> >> >>> matches_spans = numpy.array([[2,4], [5,9]])
> >> >>> for start, stop in matches_spans:
> >> ... characters_matches[start:stop] = 1
> >> ...
> >> >>> characters_matches
> >> array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])
> >>
> >> Is there maybe a way tu achieve this in a numpy-only way - without the
> >> python loop?
> >> (I got the impression, the powerful slicing capabilities could make it
> >> possible, bud haven't found this kind of solution.)
> >>
> >>
> >> In the next piece of code all the character positions are evaluated
> >> with their "neighbourhood" and a kind of running proportions of the
> >> matched text parts are computed (the checks_distance could be
> >> generally up to the order of the half the text length, usually less :
> >>
> >> >>>
> >> >>> check_distance = 1
> >> >>> floating_checks_proportions = []
> >> >>> for i in numpy.arange(len(characters_matches)):
> >> ... lo = i - check_distance
> >> ... if lo < 0:
> >> ... lo = None
> >> ... hi = i + check_distance + 1
> >> ... checked_sublist = characters_matches[lo:hi]
> >> ... proportion = (checked_sublist.sum() / (check_distance * 2 +
> 1.0))
> >> ... floating_checks_proportions.append(proportion)
> >> ...
> >> >>> floating_checks_proportions
> >> [0.0, 0.1, 0.3, 0.3,
> >> 0.3, 0.3, 1.0, 1.0,
> >> 0.3, 0.1]
> >> >>>
> >
> > Define a function for proportions:
> >
> > from numpy import r_
> >
> > from numpy.lib.stride_tricks import as_strided as ast
> >
> > def proportions(matches, distance= 1):
> >
> > cd, cd2p1, s= distance, 2* distance+ 1, matches.strides[0]
> >
> > # pad
> >
> > m= r_[[0.]* cd, matches, [0.]* cd]
> >
> > # create a suitable view
> >
> > m= ast(m, shape= (m.shape[0], cd2p1), strides= (s, s))
> >
> > # average
> >
> > return m[:-2* cd].sum(1)/ cd2p1
> > and use it like:
> > In []: matches
> > Out[]: array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])
> >
> > In []: proportions(matches).round(2)
> > Out[]: array([ 0.  ,  0.33,  0.67,  0.67,  0.67,  0.67,  1.  ,  1.  ,
>  0.67,
> > 0.33])
> > In []: proportions(matches, 5).round(2)
> > Out[]: array([ 0.27,  0.36,  0.45,  0.55,  0.55,  0.55,  0.55,  0.55,
>  0.45,
> > 0.36])
> >>
> >>
> >> I'd like to ask about the possible better approaches, as it doesn't
> >> look very elegant to me, and I obviously don't know the implications
> >> or possible drawbacks of numpy arrays in some scenarios.
> >>
> >> the pattern
> >> for i in range(len(...)): is usually considered inadequate in python,
> >> but what should be used in this case as the indices are primarily
> >> needed?
> >> is something to be gained or lost using (x)range or np.arange as the
> >> python loop is (probably?) inevitable anyway?
> >
> > Here np.arange(.) will create a new array and potentially 

Re: [Numpy-discussion] array slicing questions

2012-07-31 Thread eat
Hi,

On Tue, Jul 31, 2012 at 5:01 PM, Vlastimil Brom wrote:

> 2012/7/31 eat :
> > Hi,
> >
> > On Tue, Jul 31, 2012 at 10:23 AM, Vlastimil Brom <
> vlastimil.b...@gmail.com>
> > wrote:
> >>
> >> 2012/7/30 eat :
> >> > Hi,
> >> >
> >> > A partial answer to your questions:
> >> >
> >> > On Mon, Jul 30, 2012 at 10:33 PM, Vlastimil Brom
> >> > 
> >> > wrote:
> >> >>
> >> >> Hi all,
> >> >> I'd like to ask for some hints or advice regarding the usage of
> >> >> numpy.array and especially  slicing.
> >> >>
> >> >> I only recently tried numpy and was impressed by the speedup in some
> >> >> parts of the code, hence I suspect, that I might miss some other
> >> >> oportunities in this area.
> >> >>
> >> >> I currently use the following code for a simple visualisation of the
> >> >> search matches within the text, the arrays are generally much larger
> >> >> than the sample - the texts size is generally hundreds of kilobytes
> up
> >> >> to a few MB - with an index position for each character.
> >> >> First there is a list of spans(obtained form the regex match
> objects),
> >> >> the respective character indices in between these slices should be
> set
> >> >> to 1:
> >> >>
> >> >> >>> import numpy
> >> >> >>> characters_matches = numpy.zeros(10)
> >> >> >>> matches_spans = numpy.array([[2,4], [5,9]])
> >> >> >>> for start, stop in matches_spans:
> >> >> ... characters_matches[start:stop] = 1
> >> >> ...
> >> >> >>> characters_matches
> >> >> array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])
> >> >>
> >> >> Is there maybe a way tu achieve this in a numpy-only way - without
> the
> >> >> python loop?
> >> >> (I got the impression, the powerful slicing capabilities could make
> it
> >> >> possible, bud haven't found this kind of solution.)
> >> >>
> >> >>
> >> >> In the next piece of code all the character positions are evaluated
> >> >> with their "neighbourhood" and a kind of running proportions of the
> >> >> matched text parts are computed (the checks_distance could be
> >> >> generally up to the order of the half the text length, usually less :
> >> >>
> >> >> >>>
> >> >> >>> check_distance = 1
> >> >> >>> floating_checks_proportions = []
> >> >> >>> for i in numpy.arange(len(characters_matches)):
> >> >> ... lo = i - check_distance
> >> >> ... if lo < 0:
> >> >> ... lo = None
> >> >> ... hi = i + check_distance + 1
> >> >> ... checked_sublist = characters_matches[lo:hi]
> >> >> ... proportion = (checked_sublist.sum() / (check_distance * 2 +
> >> >> 1.0))
> >> >> ... floating_checks_proportions.append(proportion)
> >> >> ...
> >> >> >>> floating_checks_proportions
> >> >> [0.0, 0.1, 0.3, 0.3,
> >> >> 0.3, 0.3, 1.0, 1.0,
> >> >> 0.3, 0.1]
> >> >> >>>
> >> >
> >> > Define a function for proportions:
> >> >
> >> > from numpy import r_
> >> >
> >> > from numpy.lib.stride_tricks import as_strided as ast
> >> >
> >> > def proportions(matches, distance= 1):
> >> >
> >> > cd, cd2p1, s= distance, 2* distance+ 1, matches.strides[0]
> >> >
> >> > # pad
> >> >
> >> > m= r_[[0.]* cd, matches, [0.]* cd]
> >> >
> >> > # create a suitable view
> >> >
> >> > m= ast(m, shape= (m.shape[0], cd2p1), strides= (s, s))
> >> >
> >> > # average
> >> >
> >> > return m[:-2* cd].sum(1)/ cd2p1
> >> > and use it like:
> >> > In []: matches
> >> > Out[]: array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])
> >> >
> >> > In []: proportions(matches).round(2)
>

Re: [Numpy-discussion] array slicing questions

2012-07-31 Thread eat
Hi,

On Tue, Jul 31, 2012 at 6:43 PM, Nathaniel Smith  wrote:

> On Tue, Jul 31, 2012 at 2:23 PM, eat  wrote:
> > Apparently ast(.) does not return a view of the original matches rather a
> > copy of size (n* (2* distance+ 1)), thus you may run out of memory.
>
> The problem isn't memory, it's that on 32-bit Python,
> np.prod(arr.shape) must be <2**32 (or maybe 2**31 -- something like
> that).

I think this is what the traceback is indicating.

> Normally you wouldn't be creating such arrays anyway because
> they would be too big to fit into memory, so this problem isn't
> observed, but when you're using stride_tricks then it's very easy to
> create arrays that use only a small amount of memory but that have
> very large shapes.

But in this specific case the .nbytes attribute indicates that a huge amount
of memory is used. So I guess as_strided(.) is not returning a view.

> Solution: don't buy more memory, just use a 64-bit
> Python, where the limit is 2**64 (or 2**63 or whatever).
>
Regards,
-eat

>
> -n


Re: [Numpy-discussion] array slicing questions

2012-07-31 Thread eat
Hi,

On Tue, Jul 31, 2012 at 7:30 PM, Nathaniel Smith  wrote:

> On Tue, Jul 31, 2012 at 4:57 PM, eat  wrote:
> > Hi,
> >
> > On Tue, Jul 31, 2012 at 6:43 PM, Nathaniel Smith  wrote:
> >>
> >> On Tue, Jul 31, 2012 at 2:23 PM, eat  wrote:
> >> > Apparently ast(.) does not return a view of the original matches
> rather
> >> > a
> >> > copy of size (n* (2* distance+ 1)), thus you may run out of memory.
> >>
> >> The problem isn't memory, it's that on 32-bit Python,
> >> np.prod(arr.shape) must be <2**32 (or maybe 2**31 -- something like
> >> that).
> >
> > I think this is what the traceback is indicating.
> >>
> >> Normally you wouldn't be creating such arrays anyway because
> >> they would be too big to fit into memory, so this problem isn't
> >> observed, but when you're using stride_tricks then it's very easy to
> >> create arrays that use only a small amount of memory but that have
> >> very large shapes.
> >
> > But in this specific case .nbytes attribute indicates that a huge amount
> of
> > memory is used. So I guess stride_tricks(.) is not returning a view.
>
> No, .nbytes is lying to you -- it just returns np.prod(arr.shape) *
> arr.dtype.itemsize. It isn't smart enough to realize that you have
> wacky strides that cause the same memory region to be referenced by
> many different array elements.
>
Aha, very good to know.

Thanks,
-eat

>
> -n


Re: [Numpy-discussion] array slicing questions

2012-07-31 Thread eat
Hi,

On Tue, Jul 31, 2012 at 7:20 PM, Vlastimil Brom wrote:

> 2012/7/31 eat :
> > Hi,
> >
> > On Tue, Jul 31, 2012 at 5:01 PM, Vlastimil Brom <
> vlastimil.b...@gmail.com>
> > wrote:
> >>
> >> 2012/7/31 eat :
> >> > Hi,
> >> >
> >> > On Tue, Jul 31, 2012 at 10:23 AM, Vlastimil Brom
> >> > 
> >> > wrote:
> >> >>
> >> >> 2012/7/30 eat :
> >> >> > Hi,
> >> >> >
> >> >> > A partial answer to your questions:
> >> >> >
> >> >> > On Mon, Jul 30, 2012 at 10:33 PM, Vlastimil Brom
> >> >> > 
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi all,
> >> >> >> I'd like to ask for some hints or advice regarding the usage of
> >> >> >> numpy.array and especially  slicing.
> >> >> >>
> >> >> >> I only recently tried numpy and was impressed by the speedup in
> some
> >> >> >> parts of the code, hence I suspect, that I might miss some other
> >> >> >> oportunities in this area.
> >> >> >>
> >> >> >> I currently use the following code for a simple visualisation of
> the
> >> >> >> search matches within the text, the arrays are generally much
> larger
> >> >> >> than the sample - the texts size is generally hundreds of
> kilobytes
> >> >> >> up
> >> >> >> to a few MB - with an index position for each character.
> >> >> >> First there is a list of spans(obtained form the regex match
> >> >> >> objects),
> >> >> >> the respective character indices in between these slices should be
> >> >> >> set
> >> >> >> to 1:
> >> >> >>
> >> >> >> >>> import numpy
> >> >> >> >>> characters_matches = numpy.zeros(10)
> >> >> >> >>> matches_spans = numpy.array([[2,4], [5,9]])
> >> >> >> >>> for start, stop in matches_spans:
> >> >> >> ... characters_matches[start:stop] = 1
> >> >> >> ...
> >> >> >> >>> characters_matches
> >> >> >> array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])
> >> >> >>
> >> >> >> Is there maybe a way tu achieve this in a numpy-only way - without
> >> >> >> the
> >> >> >> python loop?
> >> >> >> (I got the impression, the powerful slicing capabilities could
> make
> >> >> >> it
> >> >> >> possible, bud haven't found this kind of solution.)
> >> >> >>
> >> >> >>
> >> >> >> In the next piece of code all the character positions are
> evaluated
> >> >> >> with their "neighbourhood" and a kind of running proportions of
> the
> >> >> >> matched text parts are computed (the checks_distance could be
> >> >> >> generally up to the order of the half the text length, usually
> less
> >> >> >> :
> >> >> >>
> >> >> >> >>>
> >> >> >> >>> check_distance = 1
> >> >> >> >>> floating_checks_proportions = []
> >> >> >> >>> for i in numpy.arange(len(characters_matches)):
> >> >> >> ... lo = i - check_distance
> >> >> >> ... if lo < 0:
> >> >> >> ... lo = None
> >> >> >> ... hi = i + check_distance + 1
> >> >> >> ... checked_sublist = characters_matches[lo:hi]
> >> >> >> ... proportion = (checked_sublist.sum() / (check_distance * 2
> +
> >> >> >> 1.0))
> >> >> >> ... floating_checks_proportions.append(proportion)
> >> >> >> ...
> >> >> >> >>> floating_checks_proportions
> >> >> >> [0.0, 0.1, 0.3,
> 0.3,
> >> >> >> 0.3, 0.3, 1.0, 1.0,
> >> >> >> 0.3, 0.1]
> >> >> >> >>>
> >> >> >
> >> >> > Def

Re: [Numpy-discussion] Reordering 2 dimensional array by column

2012-08-02 Thread eat
Hi,

On Thu, Aug 2, 2012 at 3:43 PM, Nicole Stoffels
wrote:

> Dear all,
>
> I have a two-dimensional array:
>
> a = array([[1,2,3],[0,2,1],[5,7,8]])
>
> I want to reorder it by the last column in descending order, so that I get:
>
> b =array([[5, 7, 8],[1, 2, 3],[0, 2, 1]])
>
Perhaps along the lines:
In []: a
Out[]:
array([[1, 2, 3],
   [0, 2, 1],
   [5, 7, 8]])
In []: ndx= a[:, 2].argsort()
In []: a[ndx[::-1], :]
Out[]:
array([[5, 7, 8],
   [1, 2, 3],
   [0, 2, 1]])
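
Or, compacted into a single expression (equivalent to the above):
In []: a[a[:, 2].argsort()[::-1]]
Out[]:
array([[5, 7, 8],
       [1, 2, 3],
       [0, 2, 1]])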


>
> What I did first is the following, which reorders the array in ascending
> order (I found that method in the internet):
>
> b = array(sorted(a, key=lambda new_entry: new_entry[2]))
> b = array([[0, 2, 1],[1, 2, 3],[5, 7, 8]])
>
> But I want it just the other way arround. So I did the following
> afterwards which results in an array only containing zeros:
> b_indices = b.argsort()
> b_matrix = b[b_indices[::-1]]
> new_b = b_matrix[len(b_matrix)-1]
>
> Is there an easy way to reorder it? Or is there at least a complicated
> way which produces the right output?
>
> I hope you can help me! Thanks!
>
My 2 cents,
-eat

>
> Best regards,
>
> Nicole
>


Re: [Numpy-discussion] 2 greatest values, in a 3-d array, along one axis

2012-08-03 Thread eat
Hi,

On Fri, Aug 3, 2012 at 7:02 PM, Angus McMorland  wrote:

> On 3 August 2012 11:18, Jim Vickroy  wrote:
>
>> Hello everyone,
>>
>> I'm trying to determine the 2 greatest values, in a 3-d array, along one
>> axis.
>>
>> Here is an approach:
>>
>> # --
>> # procedure to determine greatest 2 values for 3rd dimension of 3-d
>> array ...
>> import numpy, numpy.ma
>> xcnt, ycnt, zcnt   = 2,3,4 # actual case is (1024, 1024, 8)
>> p0 = numpy.empty ((xcnt,ycnt,zcnt))
>> for z in range (zcnt) : p0[:,:,z] = z*z
>> zaxis  = 2# max
>> values to be determined for 3rd axis
>> p0max  = numpy.max (p0, axis=zaxis)   # max
>> values for zaxis
>> maxindices = numpy.argmax (p0, axis=zaxis)#
>> indices of max values
>> p1 = p0.copy()# work
>> array to scan for 2nd highest values
>> j, i   = numpy.meshgrid (numpy.arange (ycnt), numpy.arange
>> (xcnt))
>> p1[i,j,maxindices] = numpy.NaN# flag
>> all max values
>> p1 = numpy.ma.masked_where (numpy.isnan (p1), p1) # hide
>> all max values
>> p1max  = numpy.max (p1, axis=zaxis)   # 2nd
>> highest values for zaxis
>> # additional code to analyze p0max and p1max goes here
>> # --
>>
>> I would appreciate feedback on a simpler approach -- e.g., one that does
>> not require masked arrays and or use of magic values like NaN.
>>
>> Thanks,
>> -- jv
>>
>
> Here's a way that only uses argsort and fancy indexing:
>
> >>>a = np.random.randint(10, size=(3,3,3))
> >>>print a
>
> [[[0 3 8]
>   [4 2 8]
>   [8 6 3]]
>
>  [[0 6 7]
>   [0 3 9]
>   [0 9 1]]
>
>  [[7 9 7]
>   [5 2 9]
>   [9 3 3]]]
>
> >>>am = a.argsort(axis=2)
> >>>maxs = a[np.arange(a.shape[0])[:,None], np.arange(a.shape[1])[None],
> am[:,:,-1]]
> >>>print maxs
>
> [[8 8 8]
>  [7 9 9]
>  [9 9 9]]
>
> >>>seconds = a[np.arange(a.shape[0])[:,None], np.arange(a.shape[1])[None],
> am[:,:,-2]]
> >>>print seconds
>
> [[3 4 6]
>  [6 3 1]
>  [7 5 3]]
>
> And to double check:
>
> >>>i, j = 0, 1
> >>>l = a[i, j,:]
> >>>print l
>
> [4 2 8]
>
>  >>>print np.max(a[i,j,:]), maxs[i,j]
>
> 8 8
>
> >>>print l[np.argsort(l)][-2], second[i,j]
>
> 4 4
>
> Good luck.
>
Here the np.indices function may help a little bit, like:
In []: a= randint(10, size= (3, 2, 4))
In []: a
Out[]:
array([[[1, 9, 6, 6],
[0, 3, 4, 2]],
   [[4, 2, 4, 4],
[5, 9, 4, 4]],
   [[6, 1, 4, 3],
[5, 4, 5, 5]]])

In []: ndx= indices(a.shape)
In []: # largest
In []: a[a.argsort(0), ndx[1], ndx[2]][-1]
Out[]:
array([[6, 9, 6, 6],
   [5, 9, 5, 5]])

In []: # second largest
In []: a[a.argsort(0), ndx[1], ndx[2]][-2]
Out[]:
array([[4, 2, 4, 4],
   [5, 4, 4, 4]])
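
If only the two largest values (and not their positions) are needed, a plain
sort along the axis is simpler still (a sketch for the OP's actual shape):

import numpy as np

a = np.random.randint(10, size= (1024, 1024, 8))
top2 = np.sort(a, axis= 2)[:, :, -2:]
# top2[..., 1] is the maximum, top2[..., 0] the second highest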


My 2 cents,
-eat

>
> Angus.
> --
> AJC McMorland
> Post-doctoral research fellow
> Neurobiology, University of Pittsburgh
>


Re: [Numpy-discussion] sum and prod

2012-09-08 Thread eat
Hi,

On Sun, Sep 9, 2012 at 12:56 AM, nicky van foreest wrote:

> Hi,
>
> I ran the following code:
>
> args = np.array([4,8])
> print np.sum( (arg > 0) for arg in args)
> print np.sum([(arg > 0) for arg in args])
> print np.prod( (arg > 0) for arg in args)
> print np.prod([(arg > 0) for arg in args])
>
Can't see why someone would write code like above, but anyway:
In []: args = np.array([4,8])
In []: print np.sum( (arg > 0) for arg in args)
2
In []: print np.sum([(arg > 0) for arg in args])
2
In []: print np.prod( (arg > 0) for arg in args)
<generator object <genexpr> at 0x062BDA08>
In []: print np.prod([(arg > 0) for arg in args])
1
In []: print np.prod( (arg > 0) for arg in args).next()
True
In []: sys.version
Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
In []: np.version.version
Out[]: '1.6.0'
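
As for the concise way to get the product over the truth values (a sketch):

import numpy as np

args = np.array([4, 8])
print np.all(args > 0)    # True, the logical product of the tests
print (args > 0).prod()   # 1, the same thing kept numeric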

My 2 cents,
-eat

>
> with this result:
>
> 2
> 1
> <generator object <genexpr> at 0x1c70410>
> 1
>
> Is the difference between prod and sum intentional? I would expect
> that  numpy.prod would also work on a generator, just like numpy.sum.
>
> BTW: the last line does what I need: the product over the truth values
> of all elements of args. Is there perhaps a nicer (conciser) way to
> achieve this?  Thanks.
>
> Nicky


Re: [Numpy-discussion] nan result from np.linalg.lstsq()

2012-10-29 Thread eat
Hi,

On Mon, Oct 29, 2012 at 11:01 AM, Larry Paltrow wrote:

> np.isnan(data) is True
> >>> False
>
Check with:
np.all(~np.isnan(x))

(Note that np.isnan(data) returns an array, so `np.isnan(data) is True` is
an identity test against the True singleton and is always False; reduce the
elementwise result with np.any/np.all instead.)

My 2 cents,
-eat

>
>
> On Mon, Oct 29, 2012 at 1:50 AM, Pauli Virtanen  wrote:
>
>> Larry Paltrow  gmail.com> writes:
>> > I'm having some trouble using the linalg.lstsq() function
>> > with certain data and trying to understand why. It's
>> > always returning nans when I feed it this numpy array of float64 data:
>> >
>> > data = df.close.values #coming from a pandas dataframe
>> >
>> > type(data)
>> > >>> numpy.ndarray
>>
>> Maybe you have NaNs in the array? (i.e. np.isnan(data) is True)
>>
>> --
>> Pauli Virtanen
>>
>>


Re: [Numpy-discussion] New numpy functions: filled, filled_like

2013-01-13 Thread eat
Hi,

On Sun, Jan 13, 2013 at 7:27 PM, Nathaniel Smith  wrote:

> Hi all,
>
> PR 2875 adds two new functions, that generalize zeros(), ones(),
> zeros_like(), ones_like(), by simply taking an arbitrary fill value:
>   https://github.com/numpy/numpy/pull/2875
> So
>   np.ones((10, 10))
> is the same as
>   np.filled((10, 10), 1)
>
> The implementations are trivial, but the API seems useful because it
> provides an idiomatic way of efficiently creating an array full of
> inf, or nan, or None, whatever funny value you need. All the
> alternatives are either inefficient (np.ones(...) * np.inf) or
> cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But
> there's a question of taste here; one could argue instead that these
> just add more clutter to the numpy namespace. So, before we merge,
> anyone want to chime in?
>
> (Bonus, extra bike-sheddy survey: do people prefer
>   np.filled((10, 10), np.nan)
>   np.filled_like(my_arr, np.nan)
>
+0
OTOH, it might also be handy to let val be an array as well, which is then
repeated to fill the array.

My 2 cents.
-eat

> or
>   np.filled(np.nan, (10, 10))
>   np.filled_like(np.nan, my_arr)
> ?)
>
> -n


Re: [Numpy-discussion] argsort

2013-01-15 Thread eat
Hi,

On Tue, Jan 15, 2013 at 1:50 PM, Mads Ipsen  wrote:

>  Hi,
>
> I simply can't understand this. I'm trying to use argsort to produce
> indices that can be used to sort an array:
>
>   from numpy import *
>
>   indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]])
>   args = argsort(indices, axis=0)
>   print indices[args]
>
> gives:
>
> [[[ 1 12]
>   [ 4  3]]
>
>  [[ 4  3]
>   [11  6]]
>
>  [[ 8  9]
>   [23  7]]
>
>  [[11  6]
>   [ 8  9]]
>
>  [[23  7]
>   [ 1 12]]]
>
> I thought this should produce a sorted version of the indices array.
>
> Any help is appreciated.
>
Perhaps these three different points of view will help you a little bit
more to move on:
In []: x
Out[]:
array([[ 4,  3],
   [ 1, 12],
   [23,  7],
   [11,  6],
   [ 8,  9]])
In []: ind= x.argsort(axis= 0)
In []: ind
Out[]:
array([[1, 0],
   [0, 3],
   [4, 2],
   [3, 4],
   [2, 1]])

In []: x[ind[:, 0]]
Out[]:
array([[ 1, 12],
   [ 4,  3],
   [ 8,  9],
   [11,  6],
   [23,  7]])

In []: x[ind[:, 1]]
Out[]:
array([[ 4,  3],
   [11,  6],
   [23,  7],
   [ 8,  9],
   [ 1, 12]])

In []: x[ind, [0, 1]]
Out[]:
array([[ 1,  3],
   [ 4,  6],
   [ 8,  7],
   [11,  9],
   [23, 12]])
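
The third view pairs each column's argsort indices with its column number
via broadcasting, i.e. it sorts every column independently; a quick check
that it matches np.sort:

import numpy as np

x = np.array([[4, 3], [1, 12], [23, 7], [11, 6], [8, 9]])
print np.all(x[x.argsort(0), [0, 1]] == np.sort(x, 0))   # True
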
-eat

>
> Best regards,
>
> Mads
>
>  --
> +-+
> | Mads Ipsen  |
> +--+--+
> | Gåsebæksvej 7, 4. tv |  |
> | DK-2500 Valby| phone:  +45-29716388 |
> | Denmark  | email:  mads.ip...@gmail.com |
> +--+--+
>
>
>


[Numpy-discussion] Shouldn't all in-place operations simply return self?

2013-01-16 Thread eat
Hi,

In a recent thread
http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was
proposed that .fill(.) should return self as an alternative for a trivial
two-liner.

I'm raising now the question: what if all in-place operations indeed could
return self? How bad this would be? A 'strong' counter argument may be
found at
http://mail.python.org/pipermail/python-dev/2003-October/038855.html.

But anyway, at least for me it would be much more straightforward to
implement simple mini DSLs
(http://en.wikipedia.org/wiki/Domain-specific_language) if in-place
operations returned self.
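
For instance, returning self would enable chains like the following
(hypothetical, of course: today both lines fail, precisely because fill(.)
and sort(.) return None):

a = np.zeros((3, 3)).fill(5.).cumsum(0)   # today: AttributeError, fill() returns None
b = x.sort(0).clip(0., 1.).sum()          # today: AttributeError on None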

What do you think?


-eat

P.S. FWIW, if this idea really gains momentum obviously I'm volunteering to
create a PR of it.


Re: [Numpy-discussion] Shouldn't all in-place operations simply return self?

2013-01-19 Thread eat
Hi,

On Fri, Jan 18, 2013 at 12:13 AM, Thouis (Ray) Jones wrote:

> On Thu, Jan 17, 2013 at 10:27 AM, Charles R Harris
>  wrote:
> >
> >
> > On Wed, Jan 16, 2013 at 5:11 PM, eat  wrote:
> >>
> >> Hi,
> >>
> >> In a recent thread
> >> http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was
> >> proposed that .fill(.) should return self as an alternative for a
> trivial
> >> two-liner.
> >>
> >> I'm raising now the question: what if all in-place operations indeed
> could
> >> return self? How bad this would be? A 'strong' counter argument may be
> found
> >> at http://mail.python.org/pipermail/python-dev/2003-October/038855.html
> .
> >>
> >> But anyway, at least for me. it would be much more straightforward to
> >> implement simple mini dsl's
> >> (http://en.wikipedia.org/wiki/Domain-specific_language) a much more
> >> straightforward manner.
> >>
> >> What do you think?
> >>
> >
> > I've read Guido about why he didn't like inplace operations returning
> self
> > and found him convincing for a while. And then I listened to other folks
> > express a preference for the freight train style and found them
> convincing
> > also. I think it comes down to a preference for one style over another
> and I
> > go back and forth myself. If I had to vote, I'd go for returning self,
> but
> > I'm not sure it's worth breaking python conventions to do so.
> >
> > Chuck
>
> I'm -1 on breaking with Python convention without very good reasons.
>
As an example, I personally find the following behavior highly counter-intuitive.
In []: p, P= rand(3, 1), rand(3, 5)

In []: ((p- P)** 2).sum(0).argsort()
Out[]: array([2, 4, 1, 3, 0])

In []: ((p- P)** 2).sum(0).sort().diff()

Traceback (most recent call last):
  File "", line 1, in 
AttributeError: 'NoneType' object has no attribute 'diff'


Regards,
-eat

>
> Ray


Re: [Numpy-discussion] drawing the line (was: Adding .abs() method to the array object)

2013-02-26 Thread eat
Huh,

On Tue, Feb 26, 2013 at 5:03 PM,  wrote:

> On Tue, Feb 26, 2013 at 9:39 AM, Benjamin Root  wrote:
> >
> >
> > On Mon, Feb 25, 2013 at 8:23 PM, Alan G Isaac 
> wrote:
> >>
> >> I'm hoping this discussion will return to the drawing the line question.
> >>
> >>
> http://stackoverflow.com/questions/8108688/in-python-when-should-i-use-a-function-instead-of-a-method
> >>
> >> Alan Isaac
> >
> >
> > Proposed line:
> >
> > Reduction methods only.
> >
> > Discuss...
>
> arr.dot ?
>
> the 99 most common functions for which chaining looks nice.
>
Care to elaborate more for us less initiated?

Regards,
-eat

>
> Josef
>
> >
> >
> > Ben Root
> >


Re: [Numpy-discussion] should get rid of the annoying numpy STDERR output

2011-03-24 Thread eat
Hi

2011/3/24 Dmitrey 

>  >>> from numpy import inf, array
> >>> inf*0
> nan
>
> (ok)
>
> >>> array(inf) * 0.0
> StdErr: Warning: invalid value encountered in multiply
> nan
>
> My cycled calculations yield this thousands of times, slowing computations
> and making the text output completely non-readable.
>
Would old= seterr(invalid= 'ignore') be sufficient for you?

Regards,
eat

>
> >>> from numpy import __version__
> >>> __version__
> '2.0.0.dev-1fe8136'
>
> D.
>


Re: [Numpy-discussion] should get rid of the annoying numpy STDERR output

2011-03-24 Thread eat
Hi

On Thu, Mar 24, 2011 at 4:17 PM, Skipper Seabold wrote:

> On Thu, Mar 24, 2011 at 9:45 AM, Ralf Gommers
>  wrote:
> > 2011/3/24 Dmitrey :
> >>
> >> Hi
> >>
> >> 2011/3/24 Dmitrey 
> >>>
> >>> >>> from numpy import inf, array
> >>> >>> inf*0
> >>> nan
> >>>
> >>> (ok)
> >>>
> >>> >>> array(inf) * 0.0
> >>> StdErr: Warning: invalid value encountered in multiply
> >>> nan
> >>>
> >>> My cycled calculations yields this thousands times slowing computations
> >>> and making text output completely non-readable.
> >>
> >> Would old= seterr(invalid= 'ignore') be sufficient for you?
> >>
> >> yes for me, but I'm not sure for all those users who use my soft. Maybe
> it
> >> will hide some bugs in their objective functions and nonlinear
> constraints
> >> in numerical optimization and nonlinear equation systems.
> >
> > If we do what you request in your message subject your users will have
> > the same issue.
> >
> > You can wrap (parts of) your code in something like:
> >  olderr = seterr(invalid= 'ignore')
> >  
> >  seterr(**olderr)
> >
>
> Also, as Robert pointed out to me before np.errstate is a
> context-manager for ignoring these warnings. I often wrap optimization
> code with it.
>
I didn't realize that it's a context manager, but that's really great!
(Perhaps the docs could reflect that.)

Regards,
eat

>
> In [3]: np.array(np.inf)*0.
> Warning: invalid value encountered in multiply
> Out[3]: nan
>
> In [4]: with np.errstate(all='ignore'):
>   ...: np.array(np.inf)*0.
>   ...:
>   ...:
> Out[4]: nan
>
> In [5]: np.array(np.inf)*0.
> Warning: invalid value encountered in multiply
> Out[5]: nan
>
> Skipper
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] random number genration

2011-03-29 Thread eat
Hi,

On Tue, Mar 29, 2011 at 12:00 PM, Alex Ter-Sarkissov wrote:

> If I want to generate a string of random bits with equal probability I run
>
> random.randint(0,2,size).
>
> What if I want a specific proportion of bits? In other words, each bit is 1
> with probability p<1/2 and 0 with probability q=1-p?
>
Would it be sufficient to:
In []: bs= ones(1e6, dtype= int)
In []: bs[randint(0, 1e6, 1e5)]= 0
In []: bs.sum()/ 1e6
Out[]: 0.904706

Regards,
eat

>
> thanks
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] random number genration

2011-03-29 Thread eat
On Tue, Mar 29, 2011 at 1:29 PM, eat  wrote:

> Hi,
>
> On Tue, Mar 29, 2011 at 12:00 PM, Alex Ter-Sarkissov 
> wrote:
>
>> If I want to generate a string of random bits with equal probability I run
>>
>>
>> random.randint(0,2,size).
>>
>> What if I want a specific proportion of bits? In other words, each bit is
>> 1 with probability p<1/2 and 0 with probability q=1-p?
>>
> Would it be sufficient to:
> In []: bs= ones(1e6, dtype= int)
> In []: bs[randint(0, 1e6, 1e5)]= 0
> In []: bs.sum()/ 1e6
> Out[]: 0.904706
>
Or:
In []: bs= ones(1e6, dtype= int)
In []: bs[rand(1e6)< 1./ 10]= 0
In []: bs.sum()/ 1e6
Out[]: 0.89983
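
FWIW, np.random.binomial packs the same idea into a one-liner — a sketch:

In []: bs= np.random.binomial(1, .9, 1000000)  # each bit is 1 with probability p= .9
In []: bs.mean()  # ~0.9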

>
> Regards,
> eat
>
>>
>> thanks
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.histogram on arrays.

2011-03-29 Thread eat
Hi,

On Tue, Mar 29, 2011 at 4:29 PM, Éric Depagne  wrote:

> Hi all.
>
> Sorry if this question has already been asked. I've searched the archive,
> but
> could not find anything related, so here is my question.
>
> I'm using np.histogram on a 4000x4000 array, each with 200 bins. I do that
> on
> both dimensions, meaning I compute 8000 histograms. It takes around 5
> seconds
> (which is of course quite fast).
>
> I was wondering why np.histogram does not accept an axis parameter so that
> it
> could work directly on the array without me having to write a loop.
>
> Or maybe did I miss some parameters using np.histogram.
>
FWIW, have you considered using
http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogramdd.html#numpy.histogramdd

Regards,
eat

>
> Thanks.
>
> Éric.
>
> Un clavier azerty en vaut deux
> --
> Éric Depagne  e...@depagne.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.histogram on arrays.

2011-03-29 Thread eat
Hi,

On Tue, Mar 29, 2011 at 5:13 PM, Éric Depagne  wrote:

> > FWIW, have you considered to use
> >
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogramdd.html#
> > numpy.histogramdd
> >
> > Regards,
> > eat
> >
> I tried, but I get a
> /usr/lib/pymodules/python2.6/numpy/lib/function_base.pyc in
> histogramdd(sample, bins, range, normed, weights)
>370 # Reshape is used so that overlarge arrays
>
>371 # will raise an error.
>
> --> 372 hist = zeros(nbin, float).reshape(-1)
>373
>374 # Compute the sample indices in the flattened histogram matrix.
>
>
> ValueError: sequence too large; must be smaller than 32
>
> so I suspect my array is too big for histogramdd
>
So it seems that you give your array directly to histogramdd (asking a 4000D
histogram!). Surely that's not what you are trying to achieve. Can you
elaborate more on your objectives? Perhaps some code (slow but working) to
demonstrate the point.

Regards,
eat

>
> Éric.
> --
> Un clavier azerty en vaut deux
> --
> Éric Depagne  e...@depagne.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.histogram on arrays.

2011-03-30 Thread eat
Hi,

On Wed, Mar 30, 2011 at 10:04 AM, Éric Depagne  wrote:

> Hi.
>
> Sorry for not having been clearer. I'll explain a little bit.
>
> I have 4k x 4k images that I want to analyse. I turn them into numpy arrays
> so
> I have 4k x 4k np.array.
>
> My analysis starts with determining the bias level. To do that, I compute
> for
> each line, and then for each row, an histogram.
> So I compute 8000 histograms.
>
> Here is the code I've used sofar:
>
>for i in range(self.data.shape[0]):
>   #Compute an histogram along the columns
>   # Gets counts and bounds
>self.countsC[i], self.boundsC[i] = np.histogram(data[i],
> bins=self.bins)
>for i in range(self.data.shape[1]):
># Do the same, along the rows.
>self.countsR[i], self.boundsR[i] = np.histogram(data[:,i],
> bins=self.bins)
>
> And data.shape is (4000,4000).
>
> If histogram  had an axis parameter, I could avoid the loop and I guess it
> would be faster.
>
Well I guess, for a slight performance improvement, you could create your
own streamlined histogrammer.

But, in order to better grasp your situation it would be beneficial to know
how the counts and bounds are used later on. Just wondering if this kind of
massive histogramming could somehow be avoided entirely.

Regards,
eat

>
> Éric.
> > So it seems that you give your array directly to histogramdd (asking a
> > 4000D histogram!). Surely that's not what you are trying to achieve. Can
> > you elaborate more on your objectives? Perhaps some code (slow but
> > working) to demonstrate the point.
> >
> > Regards,
> > eat
> >
>
> Un clavier azerty en vaut deux
> --
> Éric Depagne  e...@depagne.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.histogram on arrays.

2011-03-31 Thread eat
Hi,

On Wed, Mar 30, 2011 at 1:44 PM, Éric Depagne  wrote:

> >
> > Well I guess, for a slight performance improvement, you could create your
> > own streamlined histogrammer.
> >
> > But, in order to better grasp your situation it would be beneficial to
> know
> > how the counts and bounds are used later on. Just wondering if this kind
> > massive histogramming could be somehow avoided totally.
> Indeed.
> Here's what I do.
> My images come from CCD, and as such, the zero level in the image is not
> the
> true zero level, but is the true zero + the background noise of each
> pixels.
> By doing the histogram, I plan on detecting what is the most common value
> per
> row.
> Once I have the most common value, I can derive the interval where most of
> the
> values are (the index of the largest occurrence is easily obtained by
> sorting
> the counts, and I take a slice [index_max_count,index_max_count+1] in the
> second array given by the histogram).
> Then, I  take the mean value of this interval and I assume it is the value
> of
> the bias for my row.
>
> I do this procedure both on the row and columns as a sanity check.
> And I know this procedure will not work if on any row/column there is a lot
> of
> signal and very little bias. I'll fix that afterwards ;-)
>
Perhaps something along these lines will help you:
from numpy import histogram, linspace, r_

def hist2(a, nob):
    # Row-wise histograms with a single histogram() call: shift each row by
    # a distinct offset so the rows occupy disjoint value ranges, histogram
    # the flattened data against the shifted edges, then fold the counts
    # back to shape (nrows, nob).
    bins= linspace(a.min(), a.max(), nob+ 1)
    d= linspace(0, bins[-1]* a.shape[0], a.shape[0])[:, None]
    b= (a+ d).ravel()
    bbins= (bins[:-1]+ d).ravel()
    bbins= r_[bbins, bbins[-1]+ 1]
    counts, _= histogram(b, bbins)
    return counts.reshape(-1, nob), bins

It has two disadvantages: 1) it needs more memory and 2) the bins are
"global" (although that should be quite straightforward to enhance if needed).
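
A quick usage sketch for the 4k x 4k case discussed here:

import numpy as np
a= np.random.rand(4000, 4000)
counts, bins= hist2(a, 200)
counts.shape   # (4000, 200): one 200-bin histogram per row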

Regards,
eat

>
> Éric.
>
>
> >
> > Regards,
> > eat
> >
>
> Un clavier azerty en vaut deux
> --
> Éric Depagne  e...@depagne.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Need to eliminate a nested loop

2011-05-11 Thread eat
Hi,

On Wed, May 11, 2011 at 8:28 AM, Elfnor  wrote:

>
> Hi
>
> The following code produces the desired result but has a slow triple loop
> iterating over the matrix multiplication.
>
> I'm sure it can be eliminated with a neat indexing trick but I can't figure
> out how.
>
> Any suggestions please?
> -
> import numpy
> #define domain of function
> x = numpy.linspace(-5,5,64)
> y = numpy.linspace(-5,5,64)
> z = numpy.linspace(-5,5,64)
>
> #calculate f at each point in domain
> a = 5.0
> b = 3.0
> c = 2.0
> #ellipsoid
> E = numpy.array([[1/a**2,   0   ,   0  ,  0  ],
>[   0   ,1/b**2 ,   0  ,  0  ],
>[   0   ,   0   ,1/c**2,  0  ],
>[   0   ,   0   ,   0  , -1  ]])
>
> f = numpy.zeros((x.size,y.size,z.size))
>
> for i,xi in enumerate(x):
>for j,yj in enumerate(y):
>for k,zk in enumerate(z):
>X = numpy.array([xi,yj,zk,1])
>f[i,j,k] = numpy.dot(numpy.dot(X.T,E),X)
> ---
>
Something like this:
n= 64
u= np.linspace(-5, 5, n)[None, :]
u0= u.repeat(n** 2)[None, :]
u1= u.repeat(n)[None, :].repeat(n, axis= 0).reshape(1, -1)
u2= u.repeat(n** 2, axis= 0).reshape(1, -1)
U= np.r_[u0, u1, u2, np.ones((1, n** 3))]
f= (np.dot(E, U)* U).sum(0).reshape(n, n, n)
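
With E diagonal, as in this example, plain broadcasting over open grids is
perhaps even more direct — a sketch (a, b, c as defined in your script):

import numpy as np
x= np.linspace(-5, 5, 64)
X, Y, Z= np.ix_(x, x, x)  # broadcastable shapes (64,1,1), (1,64,1), (1,1,64)
f= X** 2/ a** 2+ Y** 2/ b** 2+ Z** 2/ c** 2- 1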


Regards,
eat

>
> Thanks Eleanor
> --
> View this message in context:
> http://old.nabble.com/Need-to-eliminate-a-nested-loop-tp31591457p31591457.html
> Sent from the Numpy-discussion mailing list archive at Nabble.com.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] finding elements that match any in a set

2011-05-29 Thread eat
On Sun, May 29, 2011 at 10:58 PM, Chris Barker wrote:

> > On 5/28/2011 3:40 PM, Robert wrote:
> >> (myarray in mylist) turns into mylist.__contains__(myarray).
> >> Only the list object is ever checked for this method. There is no
> >> paired method myarray.__rcontains__(mylist) so there is nothing that
> >> numpy can override to make this operation do anything different from
> >> what lists normally do,
>
> however, numpy arrays should be able to override "in" be defining their
> own.__contains__ method, so you could do:
>
> something in an_array
>
> and get a useful, vectorized result.
>
> So I thought I'd see what currently happens when I try that:
>
> In [24]: a
> Out[24]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>
> In [25]: 3 in a
> Out[25]: True
>
> So the simple case works just like a list. But what If I want what the
> OP wants:
>
> In [26]: b
> Out[26]: array([3, 6, 4])
>
> In [27]: b in a
> Out[27]: False
>
> OK, so the full b array is not in a, and it doesn't "vectorize" it,
> either. But:
>
> In [29]: a
> Out[29]:
> array([[ 0,  1,  2],
>[ 3,  4,  5],
>[ 6,  7,  8],
>[ 9, 10, 11]])
>
> In [30]: b in a
> Out[30]: True
>
> HUH?
>
> I'm not sure by what definition we would say that b is contained in a.
>
> but maybe..
>
> In [41]: b
> Out[41]: array([  4,   2, 345])
>
> In [42]: b in a
> Out[42]: False
>
> so it's "are all of the elements in b in a somewhere?" but only for 2-d
> arrays?
>
>
> So what does it mean?
>
FWIW, a short prelude on the theme seems quite promising, indeed:
In []: A
Out[]:
array([[0, 1, 2],
   [3, 4, 5],
   [6, 7, 8]])
In []: [0, 1, 2] in A
Out[]: True
In []: [0, 3, 6] in A
Out[]: True
In []: [0, 4, 8] in A
Out[]: True
In []: [8, 4, 0] in A
Out[]: True
In []: [2, 4, 6] in A
Out[]: True
In []: [6, 4, 2] in A
Out[]: True
In []: [3, 1, 5] in A
Out[]: True
In []: [3, 1, 4] in A
Out[]: True
But
In []: [1, 2, 3] in A
Out[]: False
In []: [3, 2, 1] in A
Out[]: True

So, obviously the logic behind __contains__ is not very straightforward.
Perhaps it's just a bug?
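
FWIW, all of the results above are consistent with __contains__ reducing to a
plain broadcast equality test — a guess at the semantics, not a reading of the
source:

import numpy as np
def contains_guess(A, b):
    # b is broadcast against A; any positional match anywhere yields True
    return bool((np.asarray(A)== np.asarray(b)).any())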

Regards,
eat

>
> The docstring is not helpful:
>
> In [58]: np.ndarray.__contains__?
> Type:   wrapper_descriptor
> Base Class: 
> String Form:
> Namespace:  Interactive
> Docstring:
> x.__contains__(y) <==> y in x
>
>
> If nothing useful, maybe it could provide a vectorized version of "in"
> for this sort of use case.
>
> -Chris
>
>
>
>
>
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R   (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 3d ODR

2011-06-16 Thread eat
Hi,

On Thu, Jun 16, 2011 at 2:28 PM, Christian K.  wrote:

> Hi,
>
> I need to do fit a 3d surface to a point cloud. This sounds like a job for
> 3d
> orthogonal distance regression. Does anybody know of an implementation?
>
How about
http://docs.scipy.org/doc/scipy/reference/odr.html
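
scipy.odr handles implicit models, which is what a 3d surface fit needs. A
minimal sketch, fitting a sphere to a hypothetical point cloud (Data(x, y=1)
flags the fit as implicit):

import numpy as np
from scipy import odr

u= np.random.randn(3, 200)
pts= u/ np.sqrt((u** 2).sum(0))+ .01* np.random.randn(3, 200)  # noisy unit sphere

def sphere(beta, x):                     # implicit surface: f(beta, x)= 0
    x0, y0, z0, r= beta
    return (x[0]- x0)** 2+ (x[1]- y0)** 2+ (x[2]- z0)** 2- r** 2

out= odr.ODR(odr.Data(pts, y=1), odr.Model(sphere, implicit=True),
             beta0=[0., 0., 0., .5]).run()
print out.beta                           # ~[0, 0, 0, 1]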
My 2 cents,
eat

>
> Regards, Christian K.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argmax for top N elements

2011-06-22 Thread eat
Hi,

On Wed, Jun 22, 2011 at 6:30 PM, Alex Flint  wrote:

> Is it possible to use argmax or something similar to find the locations of
> the largest N elements in a matrix?

How about argsort(.)? Like:
In []: a= arange(9)
In []: a.argsort()[::-1][:3]
Out[]: array([8, 7, 6])
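
For a 2-D array the flat ordering can be mapped back to (row, column)
locations — a sketch:

import numpy as np
a= np.random.rand(5, 7)
flat= a.argsort(axis= None)[::-1][:3]        # flat indices of the 3 largest
rows, cols= np.unravel_index(flat, a.shape)  # their (row, column) locations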

My 2 cents,
eat

>

___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] missing data discussion round 2

2011-06-27 Thread eat
On Mon, Jun 27, 2011 at 8:18 PM, Matthew Brett wrote:

> Hi,
>
> On Mon, Jun 27, 2011 at 5:53 PM, Charles R Harris
>  wrote:
> >
> >
> > On Mon, Jun 27, 2011 at 9:55 AM, Mark Wiebe  wrote:
> >>
> >> First I'd like to thank everyone for all the feedback you're providing,
> >> clearly this is an important topic to many people, and the discussion
> has
> >> helped clarify the ideas for me. I've renamed and updated the NEP, then
> >> placed it into the master NumPy repository so it has a more permanent
> home
> >> here:
> >> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
> >> In the NEP, I've tried to address everything that was raised in the
> >> original thread and in Nathaniel's followup 'Concepts' thread. To deal
> with
> >> the issue of whether a mask is True or False for a missing value, I've
> >> removed the 'mask' attribute entirely, except for ufunc-like functions
> >> np.ismissing and np.isavail which return the two styles of masks. Here's
> a
> >> high level summary of how I'm thinking of the topic, and what I will
> >> implement:
> >> Missing Data Abstraction
> >> There appear to be two useful ways to think about missing data that are
> >> worth supporting.
> >> 1) Unknown yet existing data
> >> 2) Data that doesn't exist
> >> In 1), an NA value causes outputs to become NA except in a small number
> of
> >> exceptions such as boolean logic, and in 2), operations treat the data
> as if
> >> there were a smaller array without the NA values.
> >> Temporarily Ignoring Data
> >> In some cases, it is useful to flag data as NA temporarily, possibly in
> >> several different ways, for particular calculations or testing out
> different
> >> ways of throwing away outliers. This is independent of the missing data
> >> abstraction, still requiring a choice of 1) or 2) above.
> >> Implementation Techniques
> >> There are two mechanisms generally used to implement missing data
> >> abstractions,
> >> 1) An NA bit pattern
> >> 2) A mask
> >> I've described a design in the NEP which can include both techniques
> using
> >> the same interface. The mask approach is strictly more general than the
> NA
> >> bit pattern approach, except for a few things like the idea of
> supporting
> >> the dtype 'NA[f8,InfNan]' which you can read about in the NEP.
> >> My intention is to implement the mask-based design, and possibly also
> >> implement the NA bit pattern design, but if anything gets cut it will be
> the
> >> NA bit patterns.
> >
> > I have the impression that the mask-based design would be easier. Perhaps
> > you could do that one first and folks could try out the API and see how
> they
> > like it and discover whether the memory overhead is a problem in
> practice.
>
> That seems like a risky strategy to me, as the most likely outcome is
> that people worried about memory will avoid masked arrays because they
> know they use more memory.  The memory usage is predictable and we
> won't learn any more about it from use.  We most of us already know if
> we're having to optimize code for memory.
>
> You won't get complaints, you'll just lose a group of users, who will,
> I suspect, stick to NaNs, unsatisfactory as they are.
>
+1

- eat

>
> See you,
>
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] missing data discussion round 2

2011-06-27 Thread eat
Hi,

On Mon, Jun 27, 2011 at 6:55 PM, Mark Wiebe  wrote:

> First I'd like to thank everyone for all the feedback you're providing,
> clearly this is an important topic to many people, and the discussion has
> helped clarify the ideas for me. I've renamed and updated the NEP, then
> placed it into the master NumPy repository so it has a more permanent home
> here:
>
> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
>
> In the NEP, I've tried to address everything that was raised in the
> original thread and in Nathaniel's followup 'Concepts' thread. To deal with
> the issue of whether a mask is True or False for a missing value, I've
> removed the 'mask' attribute entirely, except for ufunc-like functions
> np.ismissing and np.isavail which return the two styles of masks. Here's a
> high level summary of how I'm thinking of the topic, and what I will
> implement:
>
> *Missing Data Abstraction*
>
> There appear to be two useful ways to think about missing data that are
> worth supporting.
>
> 1) Unknown yet existing data
> 2) Data that doesn't exist
>
> In 1), an NA value causes outputs to become NA except in a small number of
> exceptions such as boolean logic, and in 2), operations treat the data as if
> there were a smaller array without the NA values.
>
> *Temporarily Ignoring Data*
>
> In some cases, it is useful to flag data as NA temporarily, possibly in
> several different ways, for particular calculations or testing out different
> ways of throwing away outliers. This is independent of the missing data
> abstraction, still requiring a choice of 1) or 2) above.
>
> *Implementation Techniques*
>
> There are two mechanisms generally used to implement missing data
> abstractions,
>
> 1) An NA bit pattern
> 2) A mask
>
> I've described a design in the NEP which can include both techniques using
> the same interface. The mask approach is strictly more general than the NA
> bit pattern approach, except for a few things like the idea of supporting
> the dtype 'NA[f8,InfNan]' which you can read about in the NEP.
>
> My intention is to implement the mask-based design, and possibly also
> implement the NA bit pattern design, but if anything gets cut it will be the
> NA bit patterns.
>
> Thanks again for all your input so far, and thanks in advance for your
> suggestions for improving this new revision of the NEP.
>
A very impressive NEP indeed.

However, how would corner cases, like

>>> a = np.array([np.NA, np.NA], dtype='f8', masked=True)
>>> np.mean(a, skipna=True)

>>> np.mean(a)

be handled?

My concern here is that there always seem to be corner cases like these, which
can only be handled with specific context knowledge. Thus producing 100%
generic code to handle 'missing data' is not doable.


Thanks,
- eat

>
> -Mark
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] missing data discussion round 2

2011-06-27 Thread eat
On Mon, Jun 27, 2011 at 8:53 PM, Mark Wiebe  wrote:

> On Mon, Jun 27, 2011 at 12:44 PM, eat  wrote:
>
>> Hi,
>>
>> On Mon, Jun 27, 2011 at 6:55 PM, Mark Wiebe  wrote:
>>
>>> First I'd like to thank everyone for all the feedback you're providing,
>>> clearly this is an important topic to many people, and the discussion has
>>> helped clarify the ideas for me. I've renamed and updated the NEP, then
>>> placed it into the master NumPy repository so it has a more permanent home
>>> here:
>>>
>>> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
>>>
>>> In the NEP, I've tried to address everything that was raised in the
>>> original thread and in Nathaniel's followup 'Concepts' thread. To deal with
>>> the issue of whether a mask is True or False for a missing value, I've
>>> removed the 'mask' attribute entirely, except for ufunc-like functions
>>> np.ismissing and np.isavail which return the two styles of masks. Here's a
>>> high level summary of how I'm thinking of the topic, and what I will
>>> implement:
>>>
>>> *Missing Data Abstraction*
>>>
>>> There appear to be two useful ways to think about missing data that are
>>> worth supporting.
>>>
>>> 1) Unknown yet existing data
>>> 2) Data that doesn't exist
>>>
>>> In 1), an NA value causes outputs to become NA except in a small number
>>> of exceptions such as boolean logic, and in 2), operations treat the data as
>>> if there were a smaller array without the NA values.
>>>
>>> *Temporarily Ignoring Data*
>>>
>>> In some cases, it is useful to flag data as NA temporarily, possibly in
>>> several different ways, for particular calculations or testing out different
>>> ways of throwing away outliers. This is independent of the missing data
>>> abstraction, still requiring a choice of 1) or 2) above.
>>>
>>> *Implementation Techniques*
>>>
>>> There are two mechanisms generally used to implement missing data
>>> abstractions,
>>>
>>> 1) An NA bit pattern
>>> 2) A mask
>>>
>>> I've described a design in the NEP which can include both techniques
>>> using the same interface. The mask approach is strictly more general than
>>> the NA bit pattern approach, except for a few things like the idea of
>>> supporting the dtype 'NA[f8,InfNan]' which you can read about in the NEP.
>>>
>>> My intention is to implement the mask-based design, and possibly also
>>> implement the NA bit pattern design, but if anything gets cut it will be the
>>> NA bit patterns.
>>>
>>> Thanks again for all your input so far, and thanks in advance for your
>>> suggestions for improving this new revision of the NEP.
>>>
>> A very impressive NEP indeed.
>>
> Hi,

>
>> However, how would corner cases, like
>>
>> >>> a = np.array([np.NA, np.NA], dtype='f8', masked=True)
>> >>> np.mean(a, skipna=True)
>>
>> This should be equivalent to removing all the NA values, then calling
> mean, like this:
>
> >>> b = np.array([], dtype='f8')
> >>> np.mean(b)
> /home/mwiebe/virtualenvs/dev/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2374:
> RuntimeWarning: invalid value encountered in double_scalars
>   return mean(axis, dtype, out)
> nan
>
> >>> np.mean(a)
>>
>> This would return NA, since NA values are sitting in positions that would
> affect the output result.
>
OK.

>
>
>> be handled?
>>
>> My concern here is that there always seem to be corner cases like these,
>> which can only be handled with specific context knowledge. Thus producing
>> 100% generic code to handle 'missing data' is not doable.
>>
>
> Working out the corner cases for the functions that are already in numpy
> seems tractable to me, how to or whether to support missing data is
> something the author of each new function will have to consider when missing
> data support is in NumPy, but I don't think we can do more than provide the
> mechanisms for people to use.
>
Sure. I'll go along with this and wait until I have something tangible to
compare against the 'traditional' NaN handling.

- eat

>
> -Mark
>
>
>> Thanks,
>> - eat
>>
>>>
>>> -Mark
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] SVD does not converge

2011-06-28 Thread eat
Hi,

On Tue, Jun 28, 2011 at 7:43 PM, Lou Pecora  wrote:

>
> --
> *From:* santhu kumar 
> *To:* numpy-discussion@scipy.org
> *Sent:* Tue, June 28, 2011 11:56:48 AM
> *Subject:* [Numpy-discussion] SVD does not converge
>
> Hello,
>
> I have a 380X5 matrix and when I am calculating pseudo-inverse of the
> matrix using pinv(numpy.linalg) I get the following error message:
>
> raise LinAlgError, 'SVD did not converge'
> numpy.linalg.linalg.LinAlgError: SVD did not converge
>
> I have looked in the list that it is a recurring issue but I was unable to
> find any solution. Can somebody please guide me on how to fix that issue?
>
> Thanks
> Santhosh
> ==
>
> I had a similar problem (although I wasn't looking for the pseudo inverse).
>  I found that "squaring" the matrix fixed the problem.  But I'm guessing in
> your situation that would mean a 380 x 380 matrix (I hope I'm thinking about
> your case correctly).  But it's worth trying since it's easy to do.
>
With my rusty linear algebra: if one chooses to proceed with this 'squaring'
avenue, wouldn't it then be more economical to base the calculations on a
square 5x5 matrix? Something like:
A_pinv= dot(A, pinv(dot(A.T, A))).T
Instead of a 380x380 based matrix:
A_pinv= dot(pinv(dot(A, A.T)), A).T
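
A quick sanity check of the 5x5 route (a sketch; it assumes A has full column
rank):

import numpy as np
from numpy.linalg import pinv
A= np.random.randn(380, 5)
A_pinv= np.dot(A, pinv(np.dot(A.T, A))).T
np.allclose(A_pinv, pinv(A))             # True for full column rank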

My two cents
- eat

>
> -- Lou Pecora, my views are my own.
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] missing data discussion round 2

2011-06-28 Thread eat
Hi,

On Wed, Jun 29, 2011 at 1:40 AM, Jason Grout wrote:

> On 6/28/11 5:20 PM, Matthew Brett wrote:
> > Hi,
> >
> > On Tue, Jun 28, 2011 at 4:06 PM, Nathaniel Smith  wrote:
> > ...
> >> (You might think, what difference does it make if you *can* unmask an
> >> item? Us missing data folks could just ignore this feature. But:
> >> whatever we end up implementing is something that I will have to
> >> explain over and over to different people, most of them not
> >> particularly sophisticated programmers. And there's just no sensible
> >> way to explain this idea that if you store some particular value, then
> >> it replaces the old value, but if you store NA, then the old value is
> >> still there.
> >
> > Ouch - yes.  No question, that is difficult to explain.   Well, I
> > think the explanation might go like this:
> >
> > "Ah, yes, well, that's because in fact numpy records missing values by
> > using a 'mask'.   So when you say `a[3] = np.NA', what you mean is,
> > 'a._mask = np.ones(a.shape, np.dtype(bool); a._mask[3] = False`"
> >
> > Is that fair?
>
> Maybe instead of np.NA, we could say np.IGNORE, which sort of conveys
> the idea that the entry is still there, but we're just ignoring it.  Of
> course, that goes against common convention, but it might be easier to
> explain.
>
A somehow very similar approach to how I have always treated NaNs (thus
postponing all the real (slightly dirty) work to the imputation procedures).

For me it has been sufficient to ignore the actual cause of the NaNs. But I
believe there exist plenty of much more sophisticated situations where this
kind of simple treatment is not sufficient at all. Anyway, even in the future
it should still be possible to play nicely with these kinds of simple
scenarios.

- eat

>
> Thanks,
>
> Jason
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Array vectorization in numpy

2011-07-20 Thread eat
Hi,

On Wed, Jul 20, 2011 at 2:42 AM, Chad Netzer  wrote:

> On Tue, Jul 19, 2011 at 6:10 PM, Pauli Virtanen  wrote:
>
> >k = m - 0.5
> >
> > does here the same thing as
> >
> >k = np.empty_like(m)
> >np.subtract(m, 0.5, out=k)
> >
> > The memory allocation (empty_like and the subsequent deallocation)
> > costs essentially nothing, and there are no temporaries or copying
> > in `subtract`.
>
> As verification:
>
> >>> import timeit
> >>> import numpy as np
> >>> t=timeit.Timer('k = m - 0.5', setup='import numpy as np;m =
> np.ones([8092,8092],float)')
> >>> np.mean(t.repeat(repeat=10, number=1))
> 0.53904647827148433
>
> >>> t=timeit.Timer('k = np.empty_like(m);np.subtract(m, 0.5, out=k)',
> setup='import numpy as np;m = np.ones([8092,8092],float)')
> >>> np.mean(t.repeat(repeat=10, number=1))
> 0.54006035327911373
>
> The trivial difference is expected as extra python parsing overhead, I
> think.
>
> Which leads me to apologize, since in my previous post I clearly meant
> to type "m -= 0.5", not "m =- 0.5", which is *quite* a different
> operation...  Carlos, and Lutz, please take heed. :)  In fact, as Lutz
> pointed out, that example was not at all what I intended to show
> anyway.
>
>
> So, just to demonstrate how it was wrong

Perhaps slightly OT, but something very odd is going on here: `m =- 0.5` merely
rebinds the name m to the scalar -0.5 (note the transposed operator), so I
would expect the performance to be in a totally different ballpark.

> >>> t=timeit.Timer('m =- 0.5', setup='import numpy as np;m =
> np.ones([8092,8092],float)')
> >>> np.mean(t.repeat(repeat=10, number=1))
> 0.058299207687377931
>
More like:
In []: %timeit m =- .5
1000 loops, best of 3: 35 ns per loop

-eat

>
> >>> t=timeit.Timer('m -= 0.5', setup='import numpy as np;m =
> np.ones([8092,8092],float)')
> >>> np.mean(t.repeat(repeat=10, number=1))
> 0.28192551136016847
>
> >>> t=timeit.Timer('np.subtract(m, 0.5, m)', setup='import numpy as np;m =
> np.ones([8092,8092],float)')
> >>> np.mean(t.repeat(repeat=10, number=1))
> 0.27014491558074949
>
> >>> t=timeit.Timer('np.subtract(m, 0.5, k)', setup='import numpy as np;m =
> np.ones([8092,8092],float); k = np.empty_like(m)')
> >>> np.mean(t.repeat(repeat=10, number=1))
> 0.54962997436523442
>
> Perhaps the difference in the last two simply comes down to cache
> effects (having to iterate over two different large memory blocks,
> rather than one)?
>
> -Chad
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class)

2011-07-31 Thread eat
Hi,

On Sun, Jul 31, 2011 at 7:36 PM, Charles R Harris  wrote:

>
>
> On Sun, Jul 31, 2011 at 12:50 AM, Hans Meine <
> me...@informatik.uni-hamburg.de> wrote:
>
>> Am 29.07.2011 um 20:23 schrieb Nathaniel Smith:
>> > Even so, surely this behavior should be consistent between base class
>> > ndarrays and subclasses? If returning 0d arrays is a good idea, then
>> > we should do it everywhere. If it's a bad idea, then we shouldn't do
>> > it at all...?
>>
>> Very well put.  That's exactly the reason why I am insisting on this
>> discussion, and why I believe that the behavior change is not intentional.
>>  Otherwise, ndarray and matrix should behave like my subclass.  (BTW: I did
>> not check masked_array yet.)
>>
>> > (In reality, it sounds like this might be some mishap in the
>> > __array_wrap__ mechanism?)
>>
>> That's exactly my guess.  (That could also explain why Mark did not see
>> anything obvious in the code.)
>>
>>
> Maybe. There isn't a problem for plain old zero dimensional arrays.
>
> In [1]: a = array(1)
>
> In [2]: a.dtype
> Out[2]: dtype('int64')
>
> In [3]: reshape(a, (1,1), order='f')
> Out[3]: array([[1]])
>
FWIW:
In []: sys.version
Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
In []: np.version.version
Out[]: '1.6.0'

In []: a= array(1)

In []: a.reshape((1, 1), order= 'F').flags
Out[]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In []: a.reshape((1, 1), order= 'C').flags
Out[]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

Seems to be slightly inconsistent, but does it really matter?

-eat

>
> This on Linux 64 with latest master.
>
> Chuck
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fill a particular value in the place of number satisfying certain condition by another number in an array.

2011-08-01 Thread eat
Hi

On Mon, Aug 1, 2011 at 3:14 PM, Jeffrey Spencer wrote:

>  It depends where it is contained, but another option, which I typically
> find to be faster, is:
>
> B = zeros(A.shape)
> maximum(A,B,A)
>
Since maximum(.) can handle broadcasting
maximum(A, 0, A)
will be even faster.
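
For instance, a minimal sketch:

import numpy as np
A= np.array([[1.5, -2.], [-.5, 3.]])
np.maximum(A, 0, A)   # the scalar 0 broadcasts; the result is written back into A
# A is now [[1.5, 0.], [0., 3.]]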

-eat

>
>
> On 08/01/2011 07:31 PM, dileep kunjaai wrote:
>
> Dear sir,
> How can we replace the numbers satisfying a certain condition with another
> value in an array?
>
>
> Example:
>  A = [[[ 9.42233087e-42  -4.71116544e-42   0.00000000e+00 ...,
>  ... ]]]
>  (a large 3-D float array — the heavily garbled full dump is elided here —
>  holding mostly zeros and positive values, with scattered negative entries)
> A contains some negative values; I want to change the negative numbers to
> '0'.
> I used the 'masked_where' command, but I failed.
>
>
>
> Please help me
>
> --
> DILEEPKUMAR. R
> J R F, IIT DELHI
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Example Usage of Neighborhood Iterator in Cython

2011-10-17 Thread eat
Hi,

On Mon, Oct 17, 2011 at 9:19 PM, T J  wrote:

> I recently put together a Cython example which uses the neighborhood
> iterator.  It was trickier than I thought it would be, so I thought to
> share it with the community.  The function takes a 1-dimensional array
> and returns a 2-dimensional array of neighborhoods in the original
> area. This is somewhat similar to the functionality provided by
> segment_axis (http://projects.scipy.org/numpy/ticket/901), but I
> believe this slightly different in that neighborhood can extend to the
> left of the current index (assuming circular boundaries).  Keep in
> mind that this is just an example, and normal usage probably is not
> concerned with creating a new array.
>
> External link:  http://codepad.org/czRIzXQl
>
> --
>
> import numpy as np
> cimport numpy as np
>
> cdef extern from "numpy/arrayobject.h":
>
>ctypedef extern class numpy.flatiter [object PyArrayIterObject]:
>cdef int nd_m1
>cdef np.npy_intp index, size
>cdef np.ndarray ao
>cdef char *dataptr
>
># This isn't exposed to the Python API.
># So we can't use the same approach we used to define flatiter
>ctypedef struct PyArrayNeighborhoodIterObject:
>int nd_m1
>np.npy_intp index, size
>np.PyArrayObject *ao # note the change from np.ndarray
>char *dataptr
>
>object PyArray_NeighborhoodIterNew(flatiter it, np.npy_intp* bounds,
>   int mode, np.ndarray fill_value)
>int PyArrayNeighborhoodIter_Next(PyArrayNeighborhoodIterObject *it)
>int PyArrayNeighborhoodIter_Reset(PyArrayNeighborhoodIterObject *it)
>
>object PyArray_IterNew(object arr)
>void PyArray_ITER_NEXT(flatiter it)
>np.npy_intp PyArray_SIZE(np.ndarray arr)
>
>cdef enum:
>NPY_NEIGHBORHOOD_ITER_ZERO_PADDING,
>NPY_NEIGHBORHOOD_ITER_ONE_PADDING,
>NPY_NEIGHBORHOOD_ITER_CONSTANT_PADDING,
>NPY_NEIGHBORHOOD_ITER_CIRCULAR_PADDING,
>NPY_NEIGHBORHOOD_ITER_MIRROR_PADDING
>
> np.import_array()
>
> def windowed(np.ndarray[np.int_t, ndim=1] arr, bounds):
>
>cdef flatiter iterx = PyArray_IterNew(arr)
>cdef np.npy_intp size = PyArray_SIZE(arr)
>cdef np.npy_intp* boundsPtr = [bounds[0], bounds[1]]
>cdef int hoodSize = bounds[1] - bounds[0] + 1
>
># Create the Python object and keep a reference to it
>cdef object niterx_ = PyArray_NeighborhoodIterNew(iterx,
>boundsPtr, NPY_NEIGHBORHOOD_ITER_CIRCULAR_PADDING, None)
>cdef PyArrayNeighborhoodIterObject *niterx = \
>niterx_
>
>cdef int i,j
>cdef np.ndarray[np.int_t, ndim=2] hoods
>
>hoods = np.empty((arr.shape[0], hoodSize), dtype=np.int)
>for i in range(iterx.size):
>for j in range(niterx.size):
>hoods[i,j] = (niterx.dataptr)[0]
>PyArrayNeighborhoodIter_Next(niterx)
>PyArray_ITER_NEXT(iterx)
>PyArrayNeighborhoodIter_Reset(niterx)
>return hoods
>
> def test():
>x = np.arange(10)
>print x
>print
>print windowed(x, [-1, 3])
>print
>print windowed(x, [-2, 2])
>
>
> --
>
> If you run test(), this is what you should see:
>
> [0 1 2 3 4 5 6 7 8 9]
>
> [[9 0 1 2 3]
>  [0 1 2 3 4]
>  [1 2 3 4 5]
>  [2 3 4 5 6]
>  [3 4 5 6 7]
>  [4 5 6 7 8]
>  [5 6 7 8 9]
>  [6 7 8 9 0]
>  [7 8 9 0 1]
>  [8 9 0 1 2]]
>
> [[8 9 0 1 2]
>  [9 0 1 2 3]
>  [0 1 2 3 4]
>  [1 2 3 4 5]
>  [2 3 4 5 6]
>  [3 4 5 6 7]
>  [4 5 6 7 8]
>  [5 6 7 8 9]
>  [6 7 8 9 0]
>  [7 8 9 0 1]]
>
> windowed(x, [0, 2]) is almost like segment_axis(x, 3, 2, end='wrap').
>
Just wondering what the main benefits of your approach are, compared to the
simple:
In []: a= arange(5)
In []: n= 10
In []: b= arange(n)[:, None]
In []: mod(a+ roll(b, 1), n)
Out[]:
array([[9, 0, 1, 2, 3],
   [0, 1, 2, 3, 4],
   [1, 2, 3, 4, 5],
   [2, 3, 4, 5, 6],
   [3, 4, 5, 6, 7],
   [4, 5, 6, 7, 8],
   [5, 6, 7, 8, 9],
   [6, 7, 8, 9, 0],
   [7, 8, 9, 0, 1],
   [8, 9, 0, 1, 2]])
In []: mod(a+ roll(b, 2), n)
Out[]:
array([[8, 9, 0, 1, 2],
   [9, 0, 1, 2, 3],
   [0, 1, 2, 3, 4],
   [1, 2, 3, 4, 5],
   [2, 3, 4, 5, 6],
   [3, 4, 5, 6, 7],
   [4, 5, 6, 7, 8],
   [5, 6, 7, 8, 9],
   [6, 7, 8, 9, 0],
   [7, 8, 9, 0, 1]])

Regards,
eat

> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] build numpy matrix out of smaller matrix

2011-12-01 Thread eat
Hi,

On Thu, Dec 1, 2011 at 6:52 PM, jonasr  wrote:

>
> Hi,
> is there any possibility to define a numpy matrix, via a smaller given
> matrix, i.e. in matlab
> i can do this like
>
> a=[1 2 ; 3 4 ]
>
>
> A=[a a ; a a ]
>
> so that i finally get
>
> A=[ [1,2,1,2]
>  [3,4,3,4]
>  [1,2,1,2]
>  [3,4,3,4]]
>
> I tried different things in numpy which didn't work.
> Any ideas?
>
Perhaps something like this:
In []: a= np.array([[1, 2], [3, 4]])
In []: np.c_[[a, a], [a, a]]
Out[]:
array([[[1, 2, 1, 2],
[3, 4, 3, 4]],
   [[1, 2, 1, 2],
[3, 4, 3, 4]]])

Regards,
eat

>
> thank you
>
>
> --
> View this message in context:
> http://old.nabble.com/build-numpy-matrix-out-of-smaller-matrix-tp32896004p32896004.html
> Sent from the Numpy-discussion mailing list archive at Nabble.com.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] build numpy matrix out of smaller matrix

2011-12-01 Thread eat
Oops, a slightly incorrect answer; my intention was more along the lines of:
In []: a= np.array([[1, 2], [3, 4]])
In []: np.c_[[a, a], [a, a]].reshape(4, 4)
Out[]:
array([[1, 2, 1, 2],
   [3, 4, 3, 4],
   [1, 2, 1, 2],
   [3, 4, 3, 4]])

On Thu, Dec 1, 2011 at 8:16 PM,  wrote:

> On Thu, Dec 1, 2011 at 12:26 PM, Benjamin Root  wrote:
> > On Thu, Dec 1, 2011 at 10:52 AM, jonasr  wrote:
> >>
> >>
> >> Hi,
> >> is there any possibility to define a numpy matrix, via a smaller given
> >> matrix, i.e. in matlab
> >> i can do this like
> >>
> >> a=[1 2 ; 3 4 ]
> >>
> >>
> >> A=[a a ; a a ]
> >>
> >> so that i finally get
> >>
> >> A=[ [1,2,1,2]
> >>  [3,4,3,4]
> >>  [1,2,1,2]
> >>  [3,4,3,4]]
> >>
> >> I tried different things in numpy which didn't work.
> >> Any ideas?
> >>
> >> thank you
> >>
> >
> > numpy.tile() might be what you are looking for.
>
> or which is my favorite tile and repeat replacement
>
> >>> a= np.array([[1, 2], [3, 4]])
> >>> np.kron(np.ones((2,2)), a)
> array([[ 1.,  2.,  1.,  2.],
>   [ 3.,  4.,  3.,  4.],
>   [ 1.,  2.,  1.,  2.],
>   [ 3.,  4.,  3.,  4.]])
>
> >>> np.kron(a, np.ones((2,2)))
> array([[ 1.,  1.,  2.,  2.],
>   [ 1.,  1.,  2.,  2.],
>   [ 3.,  3.,  4.,  4.],
>   [ 3.,  3.,  4.,  4.]])
>
But of course this is a much more generic (and preferable) approach to
utilize.
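
For reference, the np.tile() route Ben mentioned — a sketch:

import numpy as np
a= np.array([[1, 2], [3, 4]])
np.tile(a, (2, 2))      # repeat the 2x2 block twice along each axis
# array([[1, 2, 1, 2],
#        [3, 4, 3, 4],
#        [1, 2, 1, 2],
#        [3, 4, 3, 4]])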

eat

>
> Josef
>
> >
> > Cheers!
> > Ben Root
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Would it be possible to enhance np.unique(.) with a keyword kind

2011-12-19 Thread eat
Hi,

Especially when the keyword return_index of np.unique(.) is specified to be
True, would it in general also be reasonable to be able to specify the
sorting algorithm of the underlying np.argsort(.)?

The rationale is that (for at least some of my cases) higher level
algorithms seem to become overcomplicated if I'm not able to request
a stable sorting order from np.unique(.) (like np.unique(., return_index=
True, kind= 'mergesort')).

(FWIW, I apparently do have a working local hack for this kind of
functionality, but without extensive testing of 'all' corner cases).


Thanks,
eat
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Would it be possible to enhance np.unique(.) with a keyword kind

2011-12-19 Thread eat
Hi,

On Tue, Dec 20, 2011 at 2:33 AM,  wrote:

> On Mon, Dec 19, 2011 at 6:27 PM, eat  wrote:
> > Hi,
> >
> > Especially when the keyword return_index of np.unique(.) is specified to
> be
> > True, would it in general also be reasonable to be able to specify the
> > sorting algorithm of the underlying np.argsort(.)?
> >
> > The rationale is that (for at least some of my cases) higher level
> > algorithms seem to become overcomplicated if I'm not able to
> > request a
> > stable sorting order from np.unique(.) (like np.unique(., return_index=
> > True, kind= 'mergesort')).
> >
> > (FWIW, I apparently do have a working local hack for this kind of
> > functionality, but without extensive testing of 'all' corner cases).
>
> Just to understand this:
>
> Is the return_index currently always the first occurrence or random?
>
No, in the current implementation it's not always the first occurrence that is
returned. AFAIK, the only stable algorithm to provide this is 'mergesort',
and that's why I'd like to have a keyword 'kind' to propagate down to the
internals.
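
For illustration, a sketch of the first-occurrence behavior I'm after (roughly
what my local hack does):

import numpy as np
a= np.array([2, 1, 2, 1, 3])
o= a.argsort(kind= 'mergesort')     # stable: ties keep their original order
sa= a[o]
flag= np.r_[True, sa[1:]!= sa[:-1]]
uniq, first= sa[flag], o[flag]      # first == first-occurrence indices [1, 0, 4]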

>
> I haven't found a use for return_index yet (but use return_inverse a
> lot), but if we are guaranteed to get the first instance, then this
> could be very useful.
>
I think that 'return_inverse' will suffer from the same
nondeterministic behavior as well.

Thanks,
eat

>
> Josef
>
>
> >
> >
> > Thanks,
> > eat
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Would it be possible to enhance np.unique(.) with a keyword kind

2011-12-19 Thread eat
Hi,

On Tue, Dec 20, 2011 at 3:41 AM,  wrote:

> On Mon, Dec 19, 2011 at 8:16 PM, eat  wrote:
> > Hi,
> >
> > On Tue, Dec 20, 2011 at 2:33 AM,  wrote:
> >>
> >> On Mon, Dec 19, 2011 at 6:27 PM, eat  wrote:
> >> > Hi,
> >> >
> >> > Especially when the keyword return_index of np.unique(.) is specified
> to
> >> > be
> >> > True, would it in general also be reasonable to be able to specify the
> >> > sorting algorithm of the underlying np.argsort(.)?
> >> >
> >> > The rationale is that (for at least some of my cases) higher level
> >> > algorithms seem to become overcomplicated if I'm not able to
> >> > request a
> >> > stable sorting order from np.unique(.) (like np.unique(., return_index=
> >> > True, kind= 'mergesort')).
> >> >
> >> > (FWIW, I apparently do have a working local hack for this kind of
> >> > functionality, but without extensive testing of 'all' corner cases).
> >>
> >> Just to understand this:
> >>
> >> Is the return_index currently always the first occurrence or random?
> >
> > No, for current implementation it's not always the first occurrence
> > returned. AFAIK, the only stable algorithm to provide this is
> ' mergesort'
> > and that's why I'll like to have a keyword 'kind' to propagate down to
> then
> > internals.
>
> Thanks, then I'm all in favor of mergesort.
>
> >>
> >>
> >> I haven't found a use for return_index yet (but use return_inverse a
> >> lot), but if we are guaranteed to get the first instance, then this
> >> could be very useful.
> >
> > I think that 'return_inverse' will suffer of the same
> > nondeterministic behavior as well.
>
> I don't think so, because there is a unique mapping from observations
> to unique items.
>
But the source code (of 1.6.1) indicates that the keywords 'return_inverse'
and 'return_index' are indeed related.

Just my 2 cents
eat

>
> Josef
>
> >
> > Thanks,
> > eat
> >>
> >>
> >> Josef
> >>
> >>
> >> >
> >> >
> >> > Thanks,
> >> > eat
> >> >
> >> > ___
> >> > NumPy-Discussion mailing list
> >> > NumPy-Discussion@scipy.org
> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >> >
> >> ___
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread eat
Hi,

Oddly, numpy 1.6 seems to behave in a more consistent manner:

In []: sys.version
Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
In []: np.version.version
Out[]: '1.6.0'

In []: d= np.load('data.npy')
In []: d.dtype
Out[]: dtype('float32')

In []: d.mean()
Out[]: 3045.74718
In []: d.mean(dtype= np.float32)
Out[]: 3045.74718
In []: d.mean(dtype= np.float64)
Out[]: 3045.747251076416
In []: (d- d.min()).mean()+ d.min()
Out[]: 3045.7472508750002
In []: d.mean(axis= 0).mean()
Out[]: 3045.74724
In []: d.mean(axis= 1).mean()
Out[]: 3045.74724

Or do the results of the calculations depend more on the platform?


My 2 cents,
eat
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread eat
Hi

On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina <
kathleen.m.tac...@nasa.gov> wrote:

> **
> I found something similar, with a very simple example.
>
> On 64-bit linux, python 2.7.2, numpy development version:
>
> In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)
>
> In [23]: a.mean()
> Out[23]: 4034.16357421875
>
> In [24]: np.version.full_version
> Out[24]: '2.0.0.dev-55472ca'
>
>
> But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
> >>>a = np.ones((1024,1024),dtype=np.float32)
> >>>a.mean()
> 4000.0
> >>>np.version.full_version
> '1.6.1'
>
This indeed looks very nasty, regardless of whether it is a version- or
platform-related problem.
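
FWIW, a float32 accumulator seems a likely culprit: once the running sum grows
large enough, adding 4000 in single precision loses bits. A sketch:

import numpy as np
a= 4000* np.ones((1024, 1024), dtype= np.float32)
a.mean()                    # may drift when summed with a float32 accumulator
a.mean(dtype= np.float64)   # 4000.0, accumulating in double precision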

-eat

>
>
>
> On Tue, 2012-01-24 at 17:12 -0600, eat wrote:
>
> Hi,
>
>
>
>  Oddly, numpy 1.6 seems to behave in a more consistent manner:
>
>
>
>  In []: sys.version
>
>  Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit
> (Intel)]'
>
>  In []: np.version.version
>
>  Out[]: '1.6.0'
>
>
>
>  In []: d= np.load('data.npy')
>
>  In []: d.dtype
>
>  Out[]: dtype('float32')
>
>
>
>  In []: d.mean()
>
>  Out[]: 3045.74718
>
>  In []: d.mean(dtype= np.float32)
>
>  Out[]: 3045.74718
>
>  In []: d.mean(dtype= np.float64)
>
>  Out[]: 3045.747251076416
>
>  In []: (d- d.min()).mean()+ d.min()
>
>  Out[]: 3045.7472508750002
>
>  In []: d.mean(axis= 0).mean()
>
>  Out[]: 3045.74724
>
>  In []: d.mean(axis= 1).mean()
>
>  Out[]: 3045.74724
>
>
>
>  Or do the results of the calculations depend more on the platform?
>
>
>
>
>
>  My 2 cents,
>
>  eat
>
>   --
> --
> Kathleen M. Tacina
> NASA Glenn Research Center
> MS 5-10
> 21000 Brookpark Road
> Cleveland, OH 44135
> Telephone: (216) 433-6660
> Fax: (216) 433-5802
> --
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Unrealistic expectations of class Polynomial or a bug?

2012-01-28 Thread eat
Hi,

Short demonstration of the issue:
In []: sys.version
Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
In []: np.version.version
Out[]: '1.6.0'

In []: from numpy.polynomial import Polynomial as Poly
In []: def p_tst(c):
   ..:     p= Poly(c)
   ..:     r= p.roots()
   ..:     return sort(abs(p(r)))
   ..:

Now I would expect a result more like:
In []: p_tst(randn(123))[-3:]
Out[]: array([  3.41987203e-07,   2.82123675e-03,   2.82123675e-03])

be the case, but actually most results seem to be more like:
In []: p_tst(randn(123))[-3:]
Out[]: array([  9.09325898e+13,   9.09325898e+13,   1.29387029e+72])
In []: p_tst(randn(123))[-3:]
Out[]: array([  8.60862087e-11,   8.60862087e-11,   6.58784520e+32])
In []: p_tst(randn(123))[-3:]
Out[]: array([  2.00545673e-09,   3.25537709e+32,   3.25537709e+32])
In []: p_tst(randn(123))[-3:]
Out[]: array([  3.22753481e-04,   1.87056454e+00,   1.87056454e+00])
In []: p_tst(randn(123))[-3:]
Out[]: array([  2.98556327e+08,   2.98556327e+08,   8.23588003e+12])

So, does this phenomenon imply that
- I'm testing with too-high-order polynomials (if so, does there exist a
definite upper limit on the polynomial order below which I won't face this
issue)
or
- it's just the 'nature' of computations with float values (if so, I should
probably be able to tackle this regardless of the polynomial order)
or
- it's a nasty bug in the class Polynomial?


Regards,
eat
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug?

2012-01-30 Thread eat
On Sat, Jan 28, 2012 at 11:14 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

>
>
> On Sat, Jan 28, 2012 at 11:15 AM, eat  wrote:
>
>> Hi,
>>
>> Short demonstration of the issue:
>> In []: sys.version
>> Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit
>> (Intel)]'
>> In []: np.version.version
>> Out[]: '1.6.0'
>>
>> In []: from numpy.polynomial import Polynomial as Poly
>> In []: def p_tst(c):
>>    ..:     p= Poly(c)
>>    ..:     r= p.roots()
>>    ..:     return sort(abs(p(r)))
>>    ..:
>>
>> Now I would expect a result more like:
>> In []: p_tst(randn(123))[-3:]
>> Out[]: array([  3.41987203e-07,   2.82123675e-03,   2.82123675e-03])
>>
>> be the case, but actually most results seem to be more like:
>> In []: p_tst(randn(123))[-3:]
>> Out[]: array([  9.09325898e+13,   9.09325898e+13,   1.29387029e+72])
>> In []: p_tst(randn(123))[-3:]
>> Out[]: array([  8.60862087e-11,   8.60862087e-11,   6.58784520e+32])
>> In []: p_tst(randn(123))[-3:]
>> Out[]: array([  2.00545673e-09,   3.25537709e+32,   3.25537709e+32])
>> In []: p_tst(randn(123))[-3:]
>> Out[]: array([  3.22753481e-04,   1.87056454e+00,   1.87056454e+00])
>> In []: p_tst(randn(123))[-3:]
>> Out[]: array([  2.98556327e+08,   2.98556327e+08,   8.23588003e+12])
>>
>> So, does this phenomenon imply that
>> - I'm testing with too-high-order polynomials (if so, does there exist a
>> definite upper limit on the polynomial order below which I won't face this
>> issue)
>> or
>> - it's just the 'nature' of computations with float values (if so, I should
>> probably be able to tackle this regardless of the polynomial order)
>> or
>> - it's a nasty bug in the class Polynomial?
>>
>>
> It's a defect. You will get all the roots and the number will equal the
> degree. I haven't decided what the best way to deal with this is, but my
> thoughts have trended towards specifying an interval with the default being
> the domain. If you have other thoughts I'd be glad for the feedback.
>
> For the problem at hand, note first that you are specifying the
> coefficients, not the roots as was the case with poly1d. Second, as a rule
> of thumb, plain old polynomials will generally only be good for degree < 22
> due to being numerically ill conditioned. If you are really looking to use
> high degrees, Chebyshev or Legendre will work better, although you will
> probably need to explicitly specify the domain. If you want to specify the
> polynomial using roots, do Poly.fromroots(...). Third, for the high degrees
> you are probably screwed anyway for degree 123, since the accuracy of the
> root finding will be limited, especially for roots that can cluster, and
> any root that falls even a little bit outside the interval [-1,1] (the
> default domain) is going to evaluate to a big number simply because the
> polynomial is going to h*ll at a rate you wouldn't believe ;)
>
> For evenly spaced roots in [-1, 1] and using Chebyshev polynomials, things
> look good for degree 50, get a bit loose at degree 75 but can be fixed up
> with one iteration of Newton, and blow up at degree 100. I think that's
> pretty good, actually, doing better would require a lot more work. There
> are some zero finding algorithms out there that might do better if someone
> wants to give it a shot.
>
> In [20]: p = Cheb.fromroots(linspace(-1, 1, 50))
>
> In [21]: sort(abs(p(p.roots())))
> Out[21]:
> array([  6.20385459e-25,   1.65436123e-24,   2.06795153e-24,
>  5.79026429e-24,   5.89366186e-24,   6.44916482e-24,
>  6.44916482e-24,   6.77254127e-24,   6.97933642e-24,
>  7.25459208e-24,   1.00295649e-23,   1.37391414e-23,
>  1.37391414e-23,   1.63368171e-23,   2.39882378e-23,
>  3.30872245e-23,   4.38405725e-23,   4.49502653e-23,
>  4.49502653e-23,   5.58346913e-23,   8.35452419e-23,
>  9.38407760e-23,   9.38407760e-23,   1.03703218e-22,
>  1.03703218e-22,   1.23249911e-22,   1.75197880e-22,
>  1.75197880e-22,   3.07711188e-22,   3.09821786e-22,
>  3.09821786e-22,   4.56625520e-22,   4.56625520e-22,
>  4.69638303e-22,   4.69638303e-22,   5.96448724e-22,
>  5.96448724e-22,   1.24076485e-21,   1.24076485e-21,
>  1.59972624e-21,   1.59972624e-21,   1.62930347e-21,
>  1.62930347e-21,   1.73773328e-21,   1.73773328e-21,
>  1.87935435e-21,   2.30287083e-21,   2.48815928e-21,
>  2.85411753e-21,   2.85411753e-21])
>
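
(For reference, the one-iteration Newton polish mentioned above could look
like this minimal sketch; the setup mirrors the degree-75 case and all
names are illustrative only:)

import numpy as np
from numpy.polynomial import Chebyshev as Cheb

p = Cheb.fromroots(np.linspace(-1, 1, 75))
r = p.roots()
# one Newton step, r <- r - p(r)/p'(r), to polish the computed roots
r_newton = r - p(r)/ p.deriv()(r)
print(np.abs(p(r)).max())        # residuals before polishing
print(np.abs(p(r_newton)).max()) # typically much smaller after one step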
Thanks,

for the very informative feedback. I'll study those orthogonal polynomials
in more detail.


Regards,
- eat

>
>
> Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] avoiding loops when downsampling arrays

2012-02-06 Thread eat
Hi,

On Mon, Feb 6, 2012 at 9:16 PM, Moroney, Catherine M (388D) <
catherine.m.moro...@jpl.nasa.gov> wrote:

> Hello,
>
> I have to write a code to downsample an array in a specific way, and I am
> hoping that
> somebody can tell me how to do this without the nested do-loops.  Here is
> the problem
> statement:  Segment a (MXN) array into 4x4 squares and set a flag if any
> of the pixels
> in that 4x4 square meet a certain condition.
>
> Here is the code that I want to rewrite avoiding loops:
>
> shape_out = (data_in.shape[0]/4, data_in.shape[1]/4)
> found = numpy.zeros(shape_out).astype(numpy.bool)
>
> for i in xrange(0, shape_out[0]):
>for j in xrange(0, shape_out[1]):
>
>excerpt = data_in[i*4:(i+1)*4, j*4:(j+1)*4]
>mask = numpy.where( (excerpt >= t1) & (excerpt <= t2),
> True, False)
>if (numpy.any(mask)):
>found[i,j] = True
>
> Thank you for any hints and education!
>
> Catherine
>
Following Warren's answer, a slight demonstration with code like this:

import numpy as np

def ds_0(data_in, t1= 1, t2= 4):
    shape_out= (data_in.shape[0]/ 4, data_in.shape[1]/ 4)
    found= np.zeros(shape_out).astype(np.bool)
    for i in xrange(0, shape_out[0]):
        for j in xrange(0, shape_out[1]):
            excerpt= data_in[i* 4: (i+ 1)* 4, j* 4: (j+ 1)* 4]
            mask= np.where((excerpt>= t1)& (excerpt<= t2), True, False)
            if (np.any(mask)):
                found[i, j]= True
    return found

# with stride_tricks you may cook up something like this:
from numpy.lib.stride_tricks import as_strided as ast

def _ss(dt, ds, s):
    return {'shape': (ds[0]/ s[0], ds[1]/ s[1])+ s,
            'strides': (s[0]* dt[0], s[1]* dt[1])+ dt}

def _view(D, shape= (4, 4)):
    return ast(D, **_ss(D.strides, D.shape, shape))

def ds_1(data_in, t1= 1, t2= 4):
    excerpt= _view(data_in)
    mask= np.where((excerpt>= t1)& (excerpt<= t2), True, False)
    return mask.sum(2).sum(2).astype(np.bool)

if __name__ == '__main__':
    from numpy.random import randint
    r= randint(777, size= (64, 288)); print r
    print np.allclose(ds_0(r), ds_1(r))




My 2 cents,
eat

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] avoiding loops when downsampling arrays

2012-02-06 Thread eat
Hi,

Sorry for my latest post, hands way too quick ;(

On Mon, Feb 6, 2012 at 9:16 PM, Moroney, Catherine M (388D) <
catherine.m.moro...@jpl.nasa.gov> wrote:

> Hello,
>
> I have to write a code to downsample an array in a specific way, and I am
> hoping that
> somebody can tell me how to do this without the nested do-loops.  Here is
> the problem
> statement:  Segment a (MXN) array into 4x4 squares and set a flag if any
> of the pixels
> in that 4x4 square meet a certain condition.
>
> Here is the code that I want to rewrite avoiding loops:
>
> shape_out = (data_in.shape[0]/4, data_in.shape[1]/4)
> found = numpy.zeros(shape_out).astype(numpy.bool)
>
> for i in xrange(0, shape_out[0]):
>for j in xrange(0, shape_out[1]):
>
>excerpt = data_in[i*4:(i+1)*4, j*4:(j+1)*4]
>mask = numpy.where( (excerpt >= t1) & (excerpt <= t2),
> True, False)
>if (numpy.any(mask)):
>found[i,j] = True
>
> Thank you for any hints and education!
>
Following closely Warren's answer, a slight demonstration with code like
this:
import numpy as np

def ds_0(data_in, t1= 1, t2= 4):
    shape_out= (data_in.shape[0]/ 4, data_in.shape[1]/ 4)
    found= np.zeros(shape_out).astype(np.bool)
    for i in xrange(0, shape_out[0]):
        for j in xrange(0, shape_out[1]):
            excerpt= data_in[i* 4: (i+ 1)* 4, j* 4: (j+ 1)* 4]
            mask= np.where((excerpt>= t1)& (excerpt<= t2), True, False)
            if (np.any(mask)):
                found[i, j]= True
    return found

# with stride_tricks you may cook up something like this:
from numpy.lib.stride_tricks import as_strided as ast

def _ss(dt, ds, s):
    return {'shape': (ds[0]/ s[0], ds[1]/ s[1])+ s,
            'strides': (s[0]* dt[0], s[1]* dt[1])+ dt}

def _view(D, shape= (4, 4)):
    return ast(D, **_ss(D.strides, D.shape, shape))

def ds_1(data_in, t1= 1, t2= 4):
    excerpt= _view(data_in)
    mask= np.where((excerpt>= t1)& (excerpt<= t2), True, False)
    return mask.sum(2).sum(2).astype(np.bool)

if __name__ == '__main__':
    from numpy.random import randint
    r= randint(777, size= (64, 288)); print r
    print np.allclose(ds_0(r), ds_1(r))


and when run, it will yield something like:

In []: run dsa
[[ 60 470 521 ..., 147 435 295]
 [246 127 662 ..., 718 525 256]
 [354 384 205 ..., 225 364 239]
 ...,
 [277 428 201 ..., 460 282 433]
 [ 27 407 130 ..., 245 346 309]
 [649 157 153 ..., 316 613 570]]
True

and comparing performance-wise:
In []: %timeit ds_0(r)
10 loops, best of 3: 56.3 ms per loop

In []: %timeit ds_1(r)
100 loops, best of 3: 2.17 ms per loop


My 2 cents,

eat



> Catherine
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] avoiding loops when downsampling arrays

2012-02-07 Thread eat
Hi

This is elegant and very fast as well!
On Tue, Feb 7, 2012 at 2:57 PM, Sturla Molden  wrote:

> On 06.02.2012 22:27, Sturla Molden wrote:
> >
> >
> >>
> >> # Make a 4D view of this data, such that b[i,j]
> >> # is a 2D block with shape (4,4) (e.g. b[0,0] is
> >> # the same as a[:4, :4]).
> >> b = as_strided(a, shape=(a.shape[0]/4, a.shape[1]/4, 4, 4),
> >> strides=(4*a.strides[0], 4*a.strides[1], a.strides[0],
> a.strides[1]))
> >>
> >
> > Yes :-) Being used to Fortran (and also MATLAB), this is the kind of
> mapping it never occurs to me to think about. What else but NumPy is
> flexible enough to do this? :-)
>
> Actually, using as_strided is not needed. We can just reshape like this:
>
>(m,n) ---> (m//4, 4, n//4, 4)
>
> and then use np.any along the two length-4 dimensions.
>
>   m,n = data.shape
>   cond = lamda x : (x <= t1) & (x >= t2)
>
I guess you meant here cond= lambda x: (x>= t1)& (x<= t2)

>   x = cond(data).reshape((m//4, 4, n//4, 4))
>   found = np.any(np.any(x, axis=1), axis=2)
>
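
Putting the pieces together, a self-contained version of this reshape
approach (a minimal sketch with the corrected lambda; names are
illustrative only):

import numpy as np

def ds_reshape(data_in, t1=1, t2=4):
    # flag each 4x4 block containing any value in [t1, t2]
    m, n = data_in.shape
    cond = lambda x: (x >= t1) & (x <= t2)
    x = cond(data_in).reshape(m // 4, 4, n // 4, 4)
    return np.any(np.any(x, axis=1), axis=2)

r = np.random.randint(777, size=(64, 288))
print(ds_reshape(r).shape)  # -> (16, 72)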
Regards,
eat

>
>
> Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.arange() error?

2012-02-09 Thread eat
Hi,

On Thu, Feb 9, 2012 at 9:47 PM, Eric Firing  wrote:

> On 02/09/2012 09:20 AM, Drew Frank wrote:
> > Eric Firing  hawaii.edu>  writes:
> >
> >>
> >> On 02/08/2012 09:31 PM, teomat wrote:
> >>>
> >>> Hi,
> >>>
> >>> Am I wrong or the numpy.arange() function is not correct 100%?
> >>>
> >>> Try to do this:
> >>>
> >>> In [7]: len(np.arange(3.1, 4.9, 0.1))
> >>> Out[7]: 18
> >>>
> >>> In [8]: len(np.arange(8.1, 9.9, 0.1))
> >>> Out[8]: 19
> >>>
> >>> I would expect the same result for each command.
> >>
> >> Not after more experience with the wonders of floating point!
> >> Nice-looking decimal numbers often have long, drawn-out, inexact
> >> floating point (base 2) representations.  That leads to exactly this
> >> sort of problem.
> >>
> >> numpy.linspace is provided to help get around some of these surprises;
> >> or you can use an integer sequence and then scale and shift it.
> >>
> >> Eric
> >>
> >>>
> >>> All the best
> >>>
> >>>
> >>
> > I also found this surprising -- not because I lack experience with
> floating
> > point, but because I do have experience with MATLAB.  In MATLAB, the
> > corresponding operation 3.1:0.1:4.9 has length 19 because of an explicit
> > tolerance parameter used in the implmentation
> > (
> http://www.mathworks.com/support/solutions/en/data/1-4FLI96/index.html?solution=1-4FLI96
> ).
> >
> > Of course, NumPy is not MATLAB :).  That said, I prefer the MATLAB
> behavior in
> > this case -- even though it has a bit of a "magic" feel to it, I find it
> hard to
> > imagine code that operates correctly given the Python semantics and
> incorrectly
> > under MATLAB's.  Thoughts?
>
> You raise a good point.  Neither arange nor linspace provides a close
> equivalent to the nice behavior of the Matlab colon, even though that is
> often what one really wants.  Adding this, either via an arange kwarg, a
> linspace kwarg, or a new function, seems like a good idea.
>
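
(Such a function might look like the following minimal sketch; the name
crange and the tolerance handling are hypothetical, loosely mimicking the
MATLAB colon behavior:)

import numpy as np

def crange(start, stop, step, tol=1e-10):
    # include `stop` when it lies within `tol` of a whole number of steps
    n = int(np.floor((stop - start) / step + tol)) + 1
    return start + step * np.arange(n)

print(len(crange(3.1, 4.9, 0.1)))  # 19
print(len(crange(8.1, 9.9, 0.1)))  # 19
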
Maybe this issue has been raised earlier as well, but wouldn't it be more
consistent to let arange operate only on integers (like Python's range) and
let linspace handle the floats?


My 2 cents,
eat

>
> Eric
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Want to eliminate direct for-loop

2012-02-11 Thread eat
Hi,

On Sat, Feb 11, 2012 at 10:56 PM, Dinesh B Vadhia  wrote:

> **
> Could the following be written without the direct for-loop?
>
> import numpy
> # numpy vector r of any data type and length, eg.
> r = numpy.ones(25, dtype='int')
> # s is a list of values (of any data type), eg.
> s = [47, 27, 67]
> # c is a list of (variable length) lists where the sub-list elements are
> index values of r and len(s) = len(c), eg.
> c = [[3, 6, 9], [6, 11, 19, 24], [4, 9, 11, 21 ]]
> # for each element in each sub-list c, add corresponding s value to the
> index value in r, eg.
> for i, j in enumerate(c):
>     r[j] += s[i]
>
> So, we get:
> r[[3, 6, 9]] += s[0] = 1 + 47 = 48
> r[[6, 11, 19, 24]] += s[1] = 1 + 27 = 28
> r[[4, 9, 11, 21]] += s[2] = 1 + 67 = 68
>
> ie. r = array([  1,   1,   1,  95,  68,   1, 122,   1,   1, 162,   1,
> 95,   1,  1,   1,   1,   1,   1,   1,  28,   1,  68,   1,   1,  28])
>
> Thank-you!
>
Could you describe in more detail why you want to get rid of that loop?
Performance-wise? If so, have you profiled what the bottleneck is?

Please also provide a more detailed description of your problem, since
your current spec seems to yield:
r= array([  1,   1,   1,  48,  68,   1,  75,   1,   1, 115,   1,  95,   1,
1,   1,   1,   1,   1,   1,  28,   1,  68,   1,   1,  28])
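
FWIW, if performance is the concern, one loop-free route is np.add.at,
which accumulates correctly even for repeated indices and so reproduces
the loop semantics above (a minimal sketch; note that np.add.at requires
a sufficiently recent NumPy):

import numpy as np

r = np.ones(25, dtype=int)
s = [47, 27, 67]
c = [[3, 6, 9], [6, 11, 19, 24], [4, 9, 11, 21]]

# flatten the index lists, repeat each s value by its sublist length
idx = np.concatenate([np.asarray(ci) for ci in c])
vals = np.repeat(s, [len(ci) for ci in c])
np.add.at(r, idx, vals)
print(r)  # r[6] == 1 + 47 + 27, r[9] == 1 + 47 + 67, etc.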


My 2 cents,
-eat

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Initializing an array to a constant value

2012-02-13 Thread eat
Hi,

Slightly OT (and not directly answering your question), but

On Mon, Feb 13, 2012 at 3:30 PM, Pierre Haessig wrote:

> I have a pretty silly question about initializing an array a to a given
> scalar value, say A.
>
> Most of the time I use a=np.ones(shape)*A which seems the most
> widespread idiom, but I got recently interested in getting some
> performance improvement.
>
> I tried a=np.zeros(shape)+A, based on broadcasting but it seems to be
> equivalent in terms of speed.
>
> Now, the fastest :
> a = np.empty(shape)
> a.fill(A)
>
wouldn't it be nice if you could just write:
a= np.empty(shape).fill(A)
This would be possible if .fill(.) just returned self.

I assume this topic has been discussed previously, but personally I feel
that the code would be more readable if fill returned self rather than None.
Thus all ndarray methods should return something meaningful to act on (in
the spirit that methods are more like functions than subroutines).
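
(In the meantime, a tiny helper gives the one-liner; a minimal sketch, and
note that later NumPy releases added np.full(shape, A) for exactly this:)

import numpy as np

def filled(shape, value, dtype=None):
    # empty + in-place fill, returned as one expression
    a = np.empty(shape, dtype=dtype)
    a.fill(value)
    return a

print(filled((2, 3), 7.))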


Just my 2 cents,
-eat

>
> but it is a two-steps instruction to do one thing, which I feel doesn't
> look very nice.
>
> Did I miss an all-in-one function like numpy.fill(shape, A) ?
>
> Best,
> Pierre
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy beginner tutorial

2013-05-07 Thread eat
Hi,

Looks like a nice tutorial, indeed.


On Tue, May 7, 2013 at 12:54 PM, Nicolas Rougier
wrote:

>
>
> Hello everybody,
>
>
> I've written a numpy beginner tutorial that is available from:
>
> http://www.loria.fr/~rougier/teaching/numpy/numpy.html
>
> It has been designed around cellular automata to try to make it fun.
>
Perhaps you could also link to
http://www.scipy.org/Cookbook/GameOfLifeStrides (at least if you are
planning to have exercises beyond the Apprentice level). IMHO it just provides
a more natural view of the neighborhood via stride_tricks.

>
>
> While writing it, I tried to compile a set of exercises and make them
> progressively harder. For advanced levels, I thought the easiest way would
> be to extract simple questions (but more importantly answers) from this
> very mailing list in order to gather them on a single page. The goal would
> be both to offer a quick reference for new (and old users) and to provide
> also a set of exercices for those who teach. However, it's a bit harder
> than I thought since the mailing list is huge.
>
> I made a separate page for this:
>
> http://www.loria.fr/~rougier/teaching/numpy.100/index.html
> (Sources are http://www.loria.fr/~rougier/teaching/numpy.100/index.rst)
>
> (The level names came from an old-game: Dungeon Master)
>
>
> In order to extract questions/answers and I would need some help, if you
> have some free time to spare...
>
> If you remember having asked or answered a (short) problem, could you send
> a link to the relevant post (the one with the answer), or better, write
> directly the formated entry. Here is an example:
>
>
> #. Find indices of non-zero elements from [1,2,0,0,4,0]
>
>.. code:: python
>
>   # Author: Somebody
>
>   print np.nonzero([1,2,0,0,4,0])
>
>
> If you can provide the (assumed) level of the answer, that would be even
> better.

My 2 cents,
-eat

>


> Nicolas
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Possible conversion bug with record array

2013-05-22 Thread eat
Hi,

FWIW, apparently a bug related to the dtype of np.eye(.)


On Wed, May 22, 2013 at 8:07 PM, Nicolas Rougier
wrote:

>
>
> Hi all,
>
> I got a weird output from the following script:
>
> import numpy as np
>
> U = np.zeros(1, dtype=[('x', np.float32, (4,4))])
>
> U[0] = np.eye(4)
> print U[0]
> # output:  ([[0.0, 1.875, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0,
> 1.875], [0.0, 0.0, 0.0, 0.0]],)
>
> U[0] = np.eye(4, dtype=np.float32)
> print U[0]
> # output:  ([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0,
> 0.0], [0.0, 0.0, 0.0, 1.0]],)
>
>
> The first output is obviously wrong. Can anyone confirm ?
> (using numpy 1.7.1 on osx 10.8.3)
>
In []: sys.version
Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
In []: np.__version__
Out[]: '1.6.0'
In []: U= np.zeros(1, dtype= [('x', np.float32, (4, 4))])

In []: U[0]= np.eye(4)
In []: U
Out[]:
array([ ([[0.0, 1.875, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0,
1.875], [0.0, 0.0, 0.0, 0.0]],)],
  dtype=[('x', '<f4', (4, 4))])
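
(The 1.875 values are the giveaway: they suggest the float64 bytes of
np.eye(4) are copied raw into the float32 field. A quick check of that
interpretation:)

import numpy as np

# the float64 bit pattern of 1.0, reinterpreted as two float32s
# (on a little-endian machine):
print(np.array([1.0]).view(np.float32))  # -> [ 0.     1.875]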
>
> Nicolas
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Creating an ndarray from an iterable over sequences

2014-01-21 Thread eat
Hi,


On Tue, Jan 21, 2014 at 8:34 AM, Dr. Leo  wrote:

> Hi,
>
> I would like to write something like:
>
> In [25]: iterable=((i, i**2) for i in range(10))
>
> In [26]: a=np.fromiter(iterable, int32)
> ---
> ValueErrorTraceback (most recent call
> last)
>  in ()
> > 1 a=np.fromiter(iterable, int32)
>
> ValueError: setting an array element with a sequence.
>
>
> Is there an efficient way to do this?
>
Perhaps you could just utilize structured arrays (
http://docs.scipy.org/doc/numpy/user/basics.rec.html), like:
iterable= ((i, i**2) for i in range(10))
a= np.fromiter(iterable, [('a', np.int32), ('b', np.int32)], 10)
a.view(int32).reshape(-1, 2)
Out[]:
array([[ 0,  0],
   [ 1,  1],
   [ 2,  4],
   [ 3,  9],
   [ 4, 16],
   [ 5, 25],
   [ 6, 36],
   [ 7, 49],
   [ 8, 64],
   [ 9, 81]])

My 2 cents,
-eat

>
> Creating two 1-dimensional arrays first is costly as one has to
> iterate twice over the data. So the only way I see is creating an
> empty [10,2] array and filling it row by row. This is memory-efficient
> but slow. List comprehension is vice versa.
>
> If there is no solution, wouldn't it be possible to rewrite fromiter
> so as to accept sequences?
>
> Leo
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] deprecate numpy.matrix

2014-02-10 Thread eat
On Mon, Feb 10, 2014 at 7:00 PM, alex  wrote:

> On Mon, Feb 10, 2014 at 11:27 AM,   wrote:
> > How do we calculate the diagonal of the hat matrix without using N by N
> > matrices?
>
> Not sure if this was a rhetorical question or what, but this seems to work
> leverages = np.square(scipy.linalg.qr(X, mode='economic')[0]).sum(axis=1)
> http://www4.ncsu.edu/~ipsen/ps/slides_CSE2013.pdf

Rhetorical or not, but FWIW I'd prefer to take the singular value
decomposition (u, s, vt= svd(x)) and then, based on the singular values s,
estimate a "numerically feasible rank" r. Thus the diagonal of such a hat
matrix would be (u[:, :r]** 2).sum(1).


Regards,
-eat

>
> Sorry for off-topic...
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] deprecate numpy.matrix

2014-02-10 Thread eat
On Mon, Feb 10, 2014 at 9:08 PM, alex  wrote:

> On Mon, Feb 10, 2014 at 2:03 PM, eat  wrote:
> > Rhetorical or not, but FWIW I'll prefer to take singular value
> decomposition
> > (u, s, vt= svd(x)) and then based on the singular values s I'll estimate
> a
> > "numerically feasible rank" r. Thus the diagonal of such hat matrix
> would be
> > (u[:, :r]** 2).sum(1).
>
> It's a small detail but you probably want svd(x, full_matrices=False)
> to avoid anything NxN.
>
Indeed.
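
I.e., something along these lines (a minimal sketch; the function name and
the rcond cutoff are just illustrative):

import numpy as np

def leverages(x, rcond=1e-12):
    # hat-matrix diagonal from the thin SVD, truncated to the
    # numerically significant singular values ("feasible rank")
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    r = int((s > s[0]* rcond).sum())
    return (u[:, :r]** 2).sum(1)

x = np.random.randn(100, 5)
print(np.allclose(leverages(x).sum(), 5))  # trace of hat matrix == rank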

Thanks,
-eat

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Characteristic of a Matrix.

2015-01-05 Thread eat
Hi,


On Mon, Jan 5, 2015 at 8:40 PM, Colin J. Williams 
wrote:

> One of the essential characteristics of a matrix is that it be rectangular.
>
This is currently neither spelt out nor checked.
>
> The Doc description refers to a class:
>
>- class numpy.matrix
> (http://github.com/numpy/numpy/blob/v1.9.1/numpy/matrixlib/defmatrix.py#L206)
>
> Returns a matrix from an array-like object, or from a string of data. A
> matrix is a specialized 2-D array that retains its 2-D
> nature through operations. It has certain special operators, such as *
> (matrix multiplication) and ** (matrix power).
>
> This illustrates a failure, which is reported later in the calculation:
>
> A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]])
>
> Here 2 - 6 is treated as an expression.
>
FWIW, here A2 is definitely rectangular, with shape== (1, 3) and dtype==
object, i.e. its elements are just python lists.

> Wikipedia offers:
>
> In mathematics, a *matrix* (plural *matrices*) is a rectangular *array*
> of numbers, symbols, or expressions, arranged in *rows* and *columns*.
>
(and in this context also python objects).

-eat

> The individual items in a matrix are called its *elements* or *entries*.
> An example of a matrix with 2 rows and 3 columns is
>
> [ 1   9  -13 ]
> [ 20  5   -6 ]
>
> In the Numpy context, the symbols or expressions need to be evaluable.
>
> Colin W.
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Characteristic of a Matrix.

2015-01-05 Thread eat
On Mon, Jan 5, 2015 at 9:36 PM, Nathaniel Smith  wrote:

> On Mon, Jan 5, 2015 at 7:18 PM,  wrote:
> >
> >
> >
> > On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith  wrote:
> >>
> >> I'm afraid that I really don't understand what you're trying to say. Is
> there something that you think numpy should be doing differently?
> >
> >
> > I liked it better when this raised an exception, instead of creating a
> rectangular object array.
>
> Did it really use to raise an exception? Patches accepted :-)  (#5303
> is the relevant bug, like Warren points out. From the discussion there
> it doesn't look like np.array's handling of non-conformable lists has
> any defenders.)
>
+1 for 'object array [and matrix] construction should require explicitly
specifying dtype= object'

-eat

>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ufunc for sum of squared difference

2016-11-14 Thread eat
Yeah,

but it's not so obvious what's happening "under the hood". Consider this
(on an old Win7 machine):
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64
bit (AMD64)]
np.__version__
'1.11.1'

On Mon, Nov 14, 2016 at 10:38 AM, Jerome Kieffer 
wrote:

> On Fri, 11 Nov 2016 11:25:58 -0500
> Matthew Harrigan  wrote:
>
> > I started a ufunc to compute the sum of square differences here
> > <https://gist.github.com/mattharrigan/6f678b3d6df5efd236fc23bfb59fd3bd>.
> > It is about 4x faster and uses half the memory compared to
> > np.sum(np.square(x-c)).
>
> Hi Matt,
>
> Using *blas* you win already a factor two (maybe more depending on you
> blas implementation):
>
> % python -m timeit -s "import numpy as np;x=np.linspace(0,1,int(1e7))"
> "np.sum(np.square(x-2.))"
> 10 loops, best of 3: 135 msec per loop
>
> % python -m timeit -s "import numpy as np;x=np.linspace(0,1,int(1e7))"
> "y=x-2.;np.dot(y,y)"
> 10 loops, best of 3: 70.2 msec per loop
>
x= np.linspace(0, 1, int(1e6))

timeit np.sum(np.square(x- 2.))
10 loops, best of 3: 23 ms per loop

y= x- 2.

timeit np.dot(y, y)
The slowest run took 18.60 times longer than the fastest. This could mean
that an intermediate result is being cached.
1000 loops, best of 3: 1.78 ms per loop

timeit np.dot(y, y)
1000 loops, best of 3: 1.73 ms per loop

Best,
eat

>
>
> Cheers,
> --
> Jérôme Kieffer
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] "Match" two arrays

2010-04-01 Thread eat
Shailendra  gmail.com> writes:

> 
> Hi All,
> I want to make a function which should be like this
> 
> cordinates1=(x1,y1) # x1 and y1 are x-cord and y-cord of a large
> number of points
> cordinates2=(x2,y2) # similar to condinates1
> indices1,indices2= match_cordinates(cordinates1,cordinates2)
> 
> (x1[indices1],y1[indices1]) "matches" (x2[indices2],y2[indices2])
> 
> where definition of "match" is such that :
> If A is closest point to B and distance between A and B is less that
> delta than it is a "match".
> If A is closest point to B and distance between A and B is more that
> delta than there is no match.
> Every point has either 1 "match"(closest point) or none

This logic is problematic in the general case. See below. You may need to be able 
to handle several pairs of 'closest points'!
> 
> Also, the size of the cordinates1 and cordinates2 are quite large and
> "outer" should not be used. I can think of only C style code to
> achieve this. Can any one suggest pythonic way of doing this?
> 
> Thanks,
> Shailendra
> 
This is a straightforward implementation, as a starting point.


eat


import numpy as np

def dist(p1, p2):
    return np.sqrt(np.sum((p1- p2)** 2, 0))

def cdist(p1, p2, trh):
    """Expects 2d arrays p1 and p2, with compatible first dimensions,
    and a threshold.

    Returns indices of points close to each other
    -ind[:, 0], array of p1 indices
    -ind[:, 1], array of p2 indices
    -ambi, list of lists of ambiguous situations (where more
    than 1 pair of points are 'equally close')

    The indices are arranged such that
    dist(p1[:, ind[k, 0]], p2[:, ind[k, 1]])< trh
    is true for all k.
    """
    ind= []
    ambi= []
    for k in range(p2.shape[1]):
        d= dist(p1, p2[:, None, k])
        i= np.where(d< trh)[0]
        if 0< len(i):
            m= np.where(d[i]== d[i].min())[0] # problematic
            i= i[m].tolist()
            ind.append([i[0], k])
            if 1< len(m):
                ambi.append([ind[-1], i])
    return np.array(ind), ambi


if __name__ == '__main__':
    n= 10
    trh= 2e-1
    p1= np.round(np.random.rand(2, n), 1)
    p2= np.round(p1+ 1e-1* np.random.randn(2, n), 1)
    ind, ambi= cdist(p1, p2, trh)
    print 'points close to each other:'
    if 0< len(ind):
        print 'p1:'
        print p1[:, ind[:, 0]], ind[:, 0]
        print 'p2:'
        print p2[:, ind[:, 1]], ind[:, 1]
        print 'valid:'
        print dist(p1[:, ind[:, 0]], p2[:, ind[:, 1]])< trh
        print 'with ambiguous situation(s):'
        if ambi:
            print ambi
        else:
            print 'None'
    else:
        print 'None'

    import timeit
    n= 1e2
    trh= 2e-1
    rep= 5
    p1= np.random.rand(2, 1e3* n)
    p2= np.random.randn(2, n)
    def perf():
        ind, ambi= cdist(p1, p2, trh)

    print 'performance:'
    t= np.array(timeit.repeat(perf, repeat= rep, number= 1))/ rep
    print 'min: ', t.min(), 'mean: ', t.mean(), 'max: ', t.max()




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] &quot;Match&quot; two arrays

2010-04-01 Thread eat
Oops.

Wrongly timed.
> t= np.array(timeit.repeat(perf, repeat= rep, number= 1))/ rep

should be
  t= np.array(timeit.repeat(perf, repeat= rep, number= 1))


eat



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Getting index of array after applying cond

2010-04-02 Thread eat
Shailendra  gmail.com> writes:

> 
> I forgot to mention that i wanted this to work for general shape. So i
> modified it little bit
> 
> >>> x = array([[1,2,3,4,5], [6,7,8,7,6], [1,2,3,4,5]])
> >>> cond = (x > 5)
> >>> loc= where(cond)
> >>> arg_max=argmax(x[cond])
> >>> x[tuple([e[arg_max] for e in loc])]
> 8

But what happens if your x is for example
x = array([[1,2,3,4,5], [6,8,8,8,6], [1,2,3,4,5]])

x[tuple([e[arg_max] for e in loc])]
would still yield 8,
which may or may not be an acceptable answer.

Basically, what I mean is: why bother with argmax at all,
if your only interest is x[cond].max()?


Just my 2 cents.


Regards,
eat


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Getting index of array after applying cond

2010-04-02 Thread eat
Shailendra  gmail.com> writes:

> 
> Well, this is just a toy problem. argmax represent a method which will
> give me a index in x[cond] . And for the case of multiple value my
> requirement is fine with getting  any "max" index.

OK. My concern seems to be needless then.

BTW, does this current thread relate in any way to the earlier one '"Match" two 
arrays'? If so, would you like to elaborate more on your 'real' problem?


Regards,
eat
> 
> Thanks,
> Shailendra


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Find indices of largest elements

2010-04-14 Thread eat
Nikolaus Rath  rath.org> writes:

> 
> Hello,
> 
> How do I best find out the indices of the largest x elements in an
> array?
> 
> Example:
> 
> a = [ [1,8,2], [2,1,3] ]
> magic_function(a, 2) == [ (0,1), (1,2) ]
> 
> Since the largest 2 elements are at positions (0,1) and (1,2).
> 
> Best,
> 
>-Niko
> 
Hi,

Just
a= np.asarray([[1, 8, 2], [2, 1, 3]])
print np.where((a.T== a.max(axis= 1)).T)

However, if any row contains more than one maximal entry, the above will fail. Please 
let me know if that's relevant for you.

-eat


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Find indices of largest elements

2010-04-15 Thread eat
Nikolaus Rath  rath.org> writes:

[snip]
> Not quite, because I'm interested in the n largest values over all
> elements, not the largest element in each row or column. But Keith's
> solution seems to work fine, even though I'm still struggling to
> understand what's going on there .

My bad. I just concentrated on your example, not the actual question.

However, what's wrong with your above approach
"[ np.unravel_index(i, x.shape) for i in idx[-3:] ]" ?

Especially if your n largest elements are just a small fraction of all 
elements.

# Note also the differences
a= np.asarray([[1, 8, 2], [2, 3, 3], [3, 4, 1]])
n= 3
# between
print [np.unravel_index(ind, a.shape) for ind in np.argsort(a.ravel())[-n:]]
# and
print [np.where(val== a) for val in np.sort(a.ravel())[-n:]]
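
(And in later NumPy versions, np.argpartition can do the partial selection
instead of a full argsort; a minimal sketch, noting that the order within
the n largest is then unspecified:)

import numpy as np

a = np.asarray([[1, 8, 2], [2, 3, 3], [3, 4, 1]])
n = 3
# partial selection of the n largest, O(size) instead of O(size log size)
flat = np.argpartition(a.ravel(), -n)[-n:]
print([np.unravel_index(i, a.shape) for i in flat])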


Regards,
eat
> 
> Best,
> 
>-Nikolaus
> 




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Crosstabulation

2010-07-19 Thread eat
Ionut Sandric  yahoo.com> writes:

> Thank you Zack:
> 
> By raster data I mean classified slope gradient (derived from a dem), 
landuse-landcover, lithology etc. A
> crosstabulation analysis will give me a table with the common areas for each 
class from each raster and
> this will go into other analysis. I can do it with other softwares (like 
ArcGIS DEsktop etc), but I would
> like to have all with numpy or to build something on top of numpy
> 
> Thanks's again
> 
> Ionut
> 
> - Original Message -
> From: "Zachary Pincus"  yale.edu>
> To: "Discussion of Numerical Python"  scipy.org>
> Sent: Wednesday, July 14, 2010 9:42:49 PM GMT +02:00 Athens, Beirut, 
Bucharest, Istanbul
> Subject: Re: [Numpy-discussion] Crosstabulation
> 
> Hi Ionut,
> 
> Check out the "tabular" package:
> http://parsemydata.com/tabular/index.html
> 
> It seems to be basically what you want... it does "pivot tables" (aka  
> crosstabulation), it's built on top of numpy, and has simple data IO  
> tools.
> 
> Also check out this discussion on "pivot tables" from the numpy list a  
> while ago:
> http://mail.scipy.org/pipermail/numpy-discussion/2007-August/028739.html
> 
> One question -- what do you mean by "raster data"? In my arena, that  
> usually means "images"... and I'm not sure what a crosstabulation on  
> image data would mean!
> 
> Zach
> 
> On Jul 14, 2010, at 2:28 PM, Ionut Sandric wrote:
> 
> >
> > Sorry, the first email was sent before to finish it...
> >
> >
> > Hi:
> >
> > I have two raster data and I would like to do a crosstabulation  
> > between them and export the results to a table in a text file. Is it  
> > possible to do it with NumPy? Does someone have an example?
> >
> > Thank you,
> >
> > Ionut
> >
> >
> >
Hi,

You may be able to adapt this simple script to your case.

import numpy as np

# generate some test data
def gen(b, e, m, n):
    return np.arange(b, e+ 1), np.random.randint(b, e+ 1, (m, n))

m, n= 15, 15
c1, d1= gen(0, 3, m, n); print d1
c2, d2= gen(3, 5, m, n); print d2

# perform actual x-tabulation
xtab= np.zeros((len(c1), len(c2)), np.int)
for i in xrange(len(c1)):
    tmp= d2[c1[i]== d1]
    for j in xrange(len(c2)):
        xtab[i, j]= np.sum(c2[j]== tmp)
print xtab, np.sum(xtab)== np.prod(d1.shape)


Anyway it's straightforward to extend it to nd x-tabulations ;-).
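
(FWIW, continuing the script above, the same table can be had without the
double loop via np.histogram2d; a sketch, with bin edges at half-integers
so each integer class lands in its own bin:)

# bin edges straddle each integer class value
b1 = np.arange(c1[0], c1[-1]+ 2)- .5
b2 = np.arange(c2[0], c2[-1]+ 2)- .5
xtab2, _, _ = np.histogram2d(d1.ravel(), d2.ravel(), bins= (b1, b2))
print(np.allclose(xtab, xtab2))  # True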


My 2 cents,
eat



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] How to improve performance of slow tri*_indices calculations?

2011-01-24 Thread eat
Hi,

Running on:
In []: np.__version__
Out[]: '1.5.1'
In []: sys.version
Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]'

For the reference:
In []: X= randn(10, 125)
In []: timeit dot(X.T, X)
1 loops, best of 3: 170 us per loop
In []: X= randn(10, 250)
In []: timeit dot(X.T, X)
1000 loops, best of 3: 671 us per loop
In []: X= randn(10, 500)
In []: timeit dot(X.T, X)
100 loops, best of 3: 5.15 ms per loop
In []: X= randn(10, 1000)
In []: timeit dot(X.T, X)
100 loops, best of 3: 20 ms per loop
In []: X= randn(10, 2000)
In []: timeit dot(X.T, X)
10 loops, best of 3: 80.7 ms per loop

Performance of triu_indices:
In []: timeit triu_indices(125)
1000 loops, best of 3: 662 us per loop
In []: timeit triu_indices(250)
100 loops, best of 3: 2.55 ms per loop
In []: timeit triu_indices(500)
100 loops, best of 3: 15 ms per loop
In []: timeit triu_indices(1000)
10 loops, best of 3: 59.8 ms per loop
In []: timeit triu_indices(2000)
1 loops, best of 3: 239 ms per loop

So the tri*_indices calculations seem to be unreasonably slow compared to, for 
example, calculations of inner products.

Now, just to compare, a very naive implementation of triu indices:
In []: def iut(n):
   ..: r= np.empty(n* (n+ 1)/ 2, dtype= int)
   ..: c= r.copy()
   ..: a= np.arange(n)
   ..: m= 0
   ..: for i in xrange(n):
   ..: ni= n- i
   ..: mni= m+ ni
   ..: r[m: mni]= i
   ..: c[m: mni]= a[i: n]
   ..: m+= ni
   ..: return (r, c)
   ..:

Are we really calculating the same thing?
In []: triu_indices(5)
Out[]:
(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]),
 array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4]))
In []: iut(5)
Out[]:
(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]),
 array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4]))

Seems so, and then its performance:
In []: timeit iut(125)
1000 loops, best of 3: 992 us per loop
In []: timeit iut(250)
100 loops, best of 3: 2.03 ms per loop
In []: timeit iut(500)
100 loops, best of 3: 5.3 ms per loop
In []: timeit iut(1000)
100 loops, best of 3: 13.9 ms per loop
In []: timeit iut(2000)
10 loops, best of 3: 39.8 ms per loop

Even the naive implementation is very slow, but it already outperforms 
triu_indices when n > 250!

So finally my question is: how could one substantially improve the performance 
of the indices calculations?
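
(One possibility, as a sketch: build both index arrays with vectorized
primitives, leaving only a light Python loop for the column part. Names are
illustrative only:)

import numpy as np

def iut_fast(n):
    # row i appears (n - i) times; columns run i..n-1 within each row
    r = np.repeat(np.arange(n), np.arange(n, 0, -1))
    c = np.concatenate([np.arange(i, n) for i in range(n)])
    return r, c

print(np.all(iut_fast(5)[0] == np.triu_indices(5)[0]) and
      np.all(iut_fast(5)[1] == np.triu_indices(5)[1]))  # True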


Regards,
eat


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] tril, triu, document/ implementation conflict

2011-01-26 Thread eat
Hi,

I just noticed a document/ implementation conflict with tril and triu.
According to the tril documentation, it should return an array of the same
shape and data-type as the input. But this is not the case, at least with
dtype bool.

The input shape is referred to as (M, N) in tril and triu, but as (N, M) in
tri. Inconsistent?

Also I'm not very happy with the performance; at least dtype bool can be
accelerated as follows.

In []: M= ones((2000, 3000), dtype= bool)
In []: timeit triu(M)
10 loops, best of 3: 173 ms per loop
In []: timeit triu_(M)
10 loops, best of 3: 107 ms per loop

In []: M= asarray(M, dtype= int)
In []: timeit triu(M)
10 loops, best of 3: 160 ms per loop
In []: timeit triu_(M)
10 loops, best of 3: 163 ms per loop

In []: M= asarray(M, dtype= float)
In []: timeit triu(M)
10 loops, best of 3: 195 ms per loop
In []: timeit triu_(M)
10 loops, best of 3: 157 ms per loop

I have attached a crude 'fix' in case someone is interested.
Regards,
eat


twodim_base_fix.py
Description: Binary data
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] tril, triu, document/ implementation conflict

2011-01-26 Thread eat
Hi,

On Wed, Jan 26, 2011 at 2:35 PM,  wrote:

>  On Wed, Jan 26, 2011 at 7:22 AM, eat  wrote:
> > Hi,
> >
> > I just noticed a document/ implementation conflict with tril and triu.
> > According tril documentation it should return of same shape and data-type
> as
> > called. But this is not the case at least with dtype bool.
> >
> > The input shape is referred as (M, N) in tril and triu, but as (N, M) in
> > tri.
> > Inconsistent?
>
Any comments about the names for rows and cols? I prefer (M, N).

>  >
> > Also I'm not very happy with the performance, at least dtype bool can be
> > accelerated as follows.
> >
> > In []: M= ones((2000, 3000), dtype= bool)
> > In []: timeit triu(M)
> > 10 loops, best of 3: 173 ms per loop
> > In []: timeit triu_(M)
> > 10 loops, best of 3: 107 ms per loop
> >
> > In []: M= asarray(M, dtype= int)
> > In []: timeit triu(M)
> > 10 loops, best of 3: 160 ms per loop
> > In []: timeit triu_(M)
> > 10 loops, best of 3: 163 ms per loop
> >
> > In []: M= asarray(M, dtype= float)
> > In []: timeit triu(M)
> > 10 loops, best of 3: 195 ms per loop
> > In []: timeit triu_(M)
> > 10 loops, best of 3: 157 ms per loop
> >
> > I have attached a crude 'fix' incase someone is interested.
>
> You could open a ticket for this.
>
> just one comment:
> I don't think this is readable, especially if we only look at the
> source of the function with np.source
>
> out= mul(ge(so(ar(m.shape[0]), ar(m.shape[1])), -k), m)
>
> from np.source(np.tri) with numpy 1.5.1
> m = greater_equal(subtract.outer(arange(N), arange(M)),-k)

I agree, that's why I called it crude. Before opening a ticket I'll try to
figure out whether numpy already has .astype-like functionality that avoids
copying when the array already has the proper dtype.
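
(For reference, np.asarray already behaves this way: it returns the input
unchanged when the dtype already matches, and converts otherwise. A quick
check:)

import numpy as np

m = np.ones((3, 3), dtype=bool)
print(np.asarray(m, dtype=bool) is m)  # True: no copy made
print(np.asarray(m, dtype=int) is m)   # False: a converted copy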

Also I'm afraid that I can't produce sufficient testing.

Regards, eat

>
> Josef
>
> >
> > Regards,
> > eat
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Help in speeding up accumulation in a matrix

2011-01-29 Thread eat
Hi,

On Sat, Jan 29, 2011 at 11:01 PM, Nicolas SCHEFFER <
scheffer.nico...@gmail.com> wrote:

> Hi all,
>
> First email to the list for me, I just want to say how grateful I am
> to have python+numpy+ipython etc... for my day to day needs. Great
> combination of software.
>
> Anyway, I've been having this bottleneck in one my algorithms that has
> been bugging me for quite a while.
> The objective is to speed this part up. I've been doing tons of
> optimization and parallel processing around that piece of code to get
> a decent run time.
>
> The problem is easy. You want to accumulate in a matrix, a weighted
> sum of other matrices. Let's call this function scale_and_add:
> def scale_and_add_re(R,w,Ms):
>     (nb_add,mdim,j)=np.shape(Ms)
>     for i in range(nb_add):
>         R+=w[i]*Ms[i]
>     return R
> This 'for' loop bugs me since I know this will slow things down.
>
> But the dimension of these things are so large that any attempt to
> vectorize this is slower and takes too much memory.
> I typically work with 1000 weights and matrices, matrices of dimension
> of several hundred.
>
> My current config is:
> In [2]: %timeit scale_and_add_re(R,w,Ms)
> 1 loops, best of 3: 392 ms per loop
>
> In [3]: w.shape
> Out[3]: (2000,)
>
> In [4]: Ms.shape
> Out[4]: (2000, 200, 200)
> I'd like to be able to double these dimensions.

How is this array Ms created? Do you really need to have it in memory as a
whole?

Assuming it's created one (200, 200) 'chunk' at a time, you could accumulate
each chunk into R right away. That still involves Python looping, but the
loop overhead is minor.
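
I.e., something like this minimal sketch (the chunk generator below is a
hypothetical stand-in for however the matrices are actually produced):

import numpy as np

def scale_and_add_stream(R, w, chunks):
    # `chunks` yields the (200, 200) matrices one at a time, so the full
    # (2000, 200, 200) stack never has to exist in memory
    for i, Mi in enumerate(chunks):
        R += w[i]* Mi
    return R

w = np.random.rand(2000)
R = np.zeros((200, 200))
R = scale_and_add_stream(R, w,
                         (np.random.randn(200, 200) for _ in range(2000)))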


My 2 cents
eat
For instance I could use broadcasting by using a dot product
%timeit dot(Ms.T,w)
1 loops, best of 3: 1.77 s per loop
But this is i) slower ii) takes too much memory
(btw, I'd really need an in-place dot product in numpy to avoid the
copy in memory, like the blas call in scipy.linalg. But that's for
another thread...)

The matrices are squared and symmetric. I should be able to get
something out of this, but I never found anything related to this in
Numpy.

I also tried a Cython reimplementation
%timeit scale_and_add_reg(L1,w,Ms)
1 loops, best of 3: 393 ms per loop
It brought nothing in speed.

Here's the code
@cython.boundscheck(False)
def scale_and_add_reg(np.ndarray[float, ndim=2] R, np.ndarray[float,
ndim=1] w, np.ndarray[float, ndim=3] Ms):
   return _scale_and_add_reg(R,w,Ms)

@cython.boundscheck(False)
cdef int _scale_and_add_reg(np.ndarray[float, ndim=2] R,
np.ndarray[float, ndim=1] w, np.ndarray[float, ndim=3] Ms):
   cdef unsigned int mdim
   cdef int nb_add
   (nb_add,mdim,j)=np.shape(Ms)
   cdef unsigned int i
   for i from 0 <= i < nb_add:
       R+=w[i]*Ms[i]
       #for j in range(mdim):
       #for k in range(mdim):
       #R[j][k]+=w[i]*Ms[i][j][k]

   return 0

So here's my question if someone has time to answer it: Did I try
anything possible? Should I move along and deal with this bottleneck?
Or is there something else I didn't think of?

Thanks for reading, keep up the great work!

-n
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Vectorize or rewrite function to work with array inputs?

2011-01-31 Thread eat
Hi,

On Mon, Jan 31, 2011 at 5:15 PM,  wrote:

> I have several functions like the example below that I would like to make
> compatible with array inputs. The problem is the conditional statements give
> a *ValueError: The truth value of an array with more than one element is
> ambiguous. Use a.any() or a.all()*. I can use numpy.vectorize, but if
> possible I'd prefer to rewrite the function. Does anyone have any advice the
> best way to modify the code to accept array inputs? Thanks in advance for
> any assistance.


If I understood your question correctly, then air_gamma could be coded as:
def air_gamma_0(t, far=0.0):
    """
    Specific heat ratio (gamma) of Air/JP8
    t - static temperature, Rankine
    [far] - fuel air ratio [- defaults to 0.0 (dry air)]
    air_gamma - specific heat ratio
    """
    if far< 0.:
        return NAN
    elif far < 0.005:
        ag= air_gamma_1(t)
        ag[np.logical_or(t< 379., t> 4731.)]= NAN
        return ag
    elif far< 0.069:
        ag= air_gamma_2(t, far)
        ag[np.logical_or(t< 699., t> 4731.)]= NAN
        return ag
    else:
        return NAN
Rest of the code is in the attachment.


My two cents,
eat

>
>
>
> NAN = float('nan')
>
> def air_gamma(t, far=0.0):
>     """
>     Specific heat ratio (gamma) of Air/JP8
>     t - static temperature, Rankine
>     [far] - fuel air ratio [- defaults to 0.0 (dry air)]
>     air_gamma - specific heat ratio
>     """
>     if far < 0.:
>         return NAN
>     elif far < 0.005:
>         if t < 379. or t > 4731.:
>             return NAN
>         else:
>             air_gamma = -3.472487e-22 * t ** 6. + 6.218811e-18 * t ** 5. - \
>                 4.428098e-14 * t ** 4. + 1.569889e-10 * t ** 3. - \
>                 0.002753524 * t ** 2. + 0.0001684666 * t + 1.368652
>     elif far < 0.069:
>         if t < 699. or t > 4731.:
>             return NAN
>         else:
>             a6 = 4.114808e-20 * far ** 3. - 1.644588e-20 * far ** 2. + \
>                 3.103507e-21 * far - 3.391308e-22
>             a5 = -6.819015e-16 * far ** 3. + 2.773945e-16 * far ** 2. - \
>                 5.469399e-17 * far + 6.058125e-18
>             a4 = 4.684637e-12 * far ** 3. - 1.887227e-12 * far ** 2. + \
>                 3.865306e-13 * far - 4.302534e-14
>             a3 = -0.0001700602 * far ** 3. + 0.6593809 * far ** 2. - \
>                 0.1392629 * far + 1.520583e-10
>             a2 = 0.3431136 * far ** 3. - 0.1248285 * far ** 2. + \
>                 0.02688007 * far - 0.002651616
>             a1 = -0.03792449 * far ** 3. + 0.01261025 * far ** 2. - \
>                 0.002676877 * far + 0.0001580424
>             a0 = 13.65379 * far ** 3. - 3.311225 * far ** 2. + \
>                 0.3573201 * far + 1.372714
>             air_gamma = a6 * t ** 6. + a5 * t ** 5. + a4 * t ** 4. + \
>                 a3 * t ** 3. + a2 * t ** 2. + a1 * t + a0
>     elif far >= 0.069:
>         return NAN
>     else:
>         return NAN
>     return air_gamma
>
> David Parker
> Chromalloy - TDAG


air_gamma.py
Description: Binary data
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Vectorize or rewrite function to work with array inputs?

2011-02-01 Thread eat
Hi David,

(I think the discussion has diverged, but never mind ;-)

I didn't treat far as a vector (yet, but it's certainly doable). Anyway,
before going into more detail I'd suggest you study my new attachment. As far
as I can see, you just have 'simple' polynomial evaluations, which I have
refactored in such a way that you only need to provide the coefficient
arrays. There is no need to write the polynomial evaluation explicitly in the
code. My opinion is that the computational side (if your functions really are
like the ones shown so far) will be pretty straightforward. Then it boils
down to how to handle the coefficient arrays reasonably (perhaps some kind of
lightweight 'database' for them ;-).
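
For instance, the dry-air branch reduces to a single np.polyval over stored
coefficients; a minimal sketch (the coefficients are the ones from your
air_gamma, the function name is just illustrative, and the actual attachment
is not reproduced here):

import numpy as np

# dry-air polynomial coefficients, highest power first as polyval expects
AG_DRY = np.array([-3.472487e-22, 6.218811e-18, -4.428098e-14,
                   1.569889e-10, -0.002753524, 0.0001684666, 1.368652])

def air_gamma_dry(t):
    t = np.atleast_1d(np.asarray(t, dtype=float))
    ag = np.polyval(AG_DRY, t)
    ag[(t < 379.) | (t > 4731.)] = np.nan  # valid-range masking
    return ag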

Please feel free to provide any more information.

Regards,
eat

On Tue, Feb 1, 2011 at 10:20 PM,  wrote:

> I'm not sure I need to dive into cython or C for this - performance is not
> an issue for my problem - I just want a flexible function that will accept
> scalars or arrays.
>
> Both Sebastian's and eat's suggestions show using indexing to handle the
> conditional statements in the original function. The problem I'm having
> implementing this is in getting the input arguments and outputs to a common
> array size. Here's how I can do this but it seems ugly:
>
> # t and far are function arguments which may be scalars or arrays
> # ag is the output array
> # need to make everything array with common length
> t = np.array(t, ndmin=1)  # Convert t to an array
> far = np.array(far, ndmin=1)  # Convert far to an array
> ag = t*far*np.nan  # Make an output array of the proper length
> # using broadcasting rules
> t = np.zeros_like(ag)+t  # Expand t to the length of the output array
> far = np.zeros_like(ag)+far  # Expand far to the length of the output array
>
> Now with all arrays the same length I can use indexing with logical
> statements:
> ag[far<0.005] = -3.472487e-22 * t ** 6. + 6.218811e-18 * t ** 5. - \
>     4.428098e-14 * t ** 4. + 1.569889e-10 * t ** 3. - \
>     0.002753524 * t ** 2. + 0.0001684666 * t + 1.368652
>
> The resulting code looks like this:
> import numpy as np
>
> def air_gamma_dp(t, far=0.0):
>     """
>     Specific heat ratio (gamma) of Air/JP8
>     t - static temperature, Rankine
>     [far] - fuel air ratio [- defaults to 0.0 (dry air)]
>     air_gamma - specific heat ratio
>     """
>     t = np.array(t, ndmin=1)
>     far = np.array(far, ndmin=1)
>     ag = t*far*np.nan
>     t = np.zeros_like(ag)+t
>     far = np.zeros_like(ag)+far
>
>     far[(far<0.) | (far>0.069)] = np.nan
>     t[(t < 379.) | (t > 4731.)] = np.nan
>     ag[far<0.005] = -3.472487e-22 * t ** 6. + 6.218811e-18 * t ** 5. - \
>         4.428098e-14 * t ** 4. + 1.569889e-10 * t ** 3. - \
>         0.002753524 * t ** 2. + 0.0001684666 * t + 1.368652
>     t[(t < 699.) | (t > 4731.)] = np.nan
>     a6 = 4.114808e-20 * far ** 3. - 1.644588e-20 * far ** 2. + \
>         3.103507e-21 * far - 3.391308e-22
>     a5 = -6.819015e-16 * far ** 3. + 2.773945e-16 * far ** 2. - \
>         5.469399e-17 * far + 6.058125e-18
>     a4 = 4.684637e-12 * far ** 3. - 1.887227e-12 * far ** 2. + \
>         3.865306e-13 * far - 4.302534e-14
>     a3 = -0.0001700602 * far ** 3. + 0.6593809 * far ** 2. - \
>         0.1392629 * far + 1.520583e-10
>     a2 = 0.3431136 * far ** 3. - 0.1248285 * far ** 2. + \
>         0.02688007 * far - 0.002651616
>     a1 = -0.03792449 * far ** 3. + 0.01261025 * far ** 2. - \
>         0.002676877 * far + 0.0001580424
>     a0 = 13.65379 * far ** 3. - 3.311225 * far ** 2. + \
>         0.3573201 * far + 1.372714
>     ag[far>=0.005] = a6 * t ** 6. + a5 * t ** 5. + a4 * t ** 4. + \
>         a3 * t ** 3. + a2 * t ** 2. + a1 * t + a0
>     return ag
>
> I was hoping there was a more elegant way to do this.
>
> David Parker
> Chromalloy - TDAG
>
>
>
> From:John Salvatier 
> To:Discussion of Numerical Python 
> Date:02/01/2011 02:29 PM
>  Subject:Re: [Numpy-discussion] Vectorize or rewrite function to
> work with array inputs?
> Sent by:numpy-discussion-boun...@scipy.org
> --
>
>
>
> Have you thought about using cython to work with the numpy C-API
> (http://wiki.cython.org/tutorials/numpy#UsingtheNumpyCAPI)?
> This will be fast, simple (you can mix and match Python and Cython).
>
> As for your specific issue: you can simply cast all the inputs to numpy
> arrays (using asarray,
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.asarray.html

[Numpy-discussion] odd performance of sum?

2011-02-10 Thread eat
Hi,

Observing following performance:
In []: m= 1e5
In []: n= 1e2
In []: o= ones(n)
In []: M= randn(m, n)
In []: timeit M.sum(1)
10 loops, best of 3: 38.3 ms per loop
In []: timeit dot(M, o)
10 loops, best of 3: 21.1 ms per loop

In []: m= 1e2
In []: n= 1e5
In []: o= ones(n)
In []: M= randn(m, n)
In []: timeit M.sum(1)
100 loops, best of 3: 18.3 ms per loop
In []: timeit dot(M, o)
10 loops, best of 3: 21.2 ms per loop

One would expect sum to outperform dot by a clear margin. Do there
exist any 'tricks' to increase the performance of sum?

In []: sys.version
Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit
(Intel)]'
# installed binaries from http://python.org/
In []: np.version.version
Out[]: '1.5.1'
 # installed binaries from http://scipy.org/


Regards,
eat
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] odd performance of sum?

2011-02-10 Thread eat
Thanks Chuck,

for replying. But doesn't it still seem very odd that dot outperforms sum on
your machine? To put it simply: why can't sum outperform dot? Whatever
architecture (computer, cache) you have, it doesn't make any sense that
performing significantly fewer instructions ends up taking more
time ;-).


Regards,
eat
On Thu, Feb 10, 2011 at 7:10 PM, Charles R Harris  wrote:

>
>
>  On Thu, Feb 10, 2011 at 8:29 AM, eat  wrote:
>
>> Hi,
>>
>> Observing following performance:
>> In []: m= 1e5
>> In []: n= 1e2
>> In []: o= ones(n)
>> In []: M= randn(m, n)
>> In []: timeit M.sum(1)
>> 10 loops, best of 3: 38.3 ms per loop
>> In []: timeit dot(M, o)
>> 10 loops, best of 3: 21.1 ms per loop
>>
>> In []: m= 1e2
>> In []: n= 1e5
>> In []: o= ones(n)
>> In []: M= randn(m, n)
>> In []: timeit M.sum(1)
>> 100 loops, best of 3: 18.3 ms per loop
>> In []: timeit dot(M, o)
>> 10 loops, best of 3: 21.2 ms per loop
>>
>> One would expect sum to outperform dot by a clear margin. Do there
>> exist any 'tricks' to increase the performance of sum?
>>
>>
>
> I'm not surprised, much depends on the version of ATLAS or MKL you are
> linked to. If you aren't linked to either and just using numpy's version
> then the results are a bit strange. With numpy development I get
>
> In [1]: m= 1e5
>
> In [2]: n= 1e2
>
> In [3]: o= ones(n)
>
> In [4]: M= randn(m, n)
>
> In [5]: timeit M.sum(1)
> 100 loops, best of 3: 19.2 ms per loop
>
> In [6]: timeit dot(M, o)
> 100 loops, best of 3: 15 ms per loop
>
> In [7]: m= 1e2
>
> In [8]: n= 1e5
>
> In [9]: o= ones(n)
>
> In [10]: M= randn(m, n)
>
> In [11]: timeit M.sum(1)
> 100 loops, best of 3: 17.4 ms per loop
>
> In [12]: timeit dot(M, o)
> 100 loops, best of 3: 14.2 ms per loop
>
> Chuck
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] odd performance of sum?

2011-02-10 Thread eat
Hi Robert,

On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern  wrote:

> On Thu, Feb 10, 2011 at 11:53, eat  wrote:
> > Thanks Chuck,
> >
> > for replying. But doesn't it still seem very odd that dot outperforms
> > sum on your machine? To put it simply: why can't sum outperform dot?
> > Whatever architecture (computer, cache) you have, it makes no sense at
> > all that performing significantly fewer instructions should take more
> > time ;-).
>
> These days, the determining factor is less often instruction count
> than memory latency, and the optimized BLAS implementations of dot()
> heavily optimize the memory access patterns.

Can't we have this as well with simple sum?

> Additionally, the number
> of instructions in your dot() probably isn't that many more than the
> sum(). The sum() is pretty dumb

But does it need to be?

> and just does a linear accumulation
> using the ufunc reduce mechanism, so (m*n-1) ADDs plus quite a few
> instructions for traversing the array in a generic manner. With fused
> multiply-adds, being able to assume contiguous data and ignore the
> numpy iterator overhead, and applying divide-and-conquer kernels to
> arrange sums, the optimized dot() implementations could have a
> comparable instruction count.

Couldn't sum benefit from similar logic?

> If you were willing to spend that amount of developer time and code
> complexity to make platform-specific backends to sum()

Actually I would, but I'm not at all competent at that level of detail (:.
But I'm willing to spend more of my own time, for example on testing,
debugging, and analysing various improvements and suggestions if such emerge.

> , you could make
> it go really fast, too. Typically, it's not all that important to make
> it worthwhile, though. One thing that might be worthwhile is to make
> implementations of sum() and cumsum() that avoid the ufunc machinery
> and do their iterations more quickly, at least for some common
> combinations of dtype and contiguity.
>
Well, I'm already perplexed before even reaching that 'ufunc machinery';
it's hardly trivial (for us mere mortals ;-) to figure out what's happening
with sum in fromnumeric.py!
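
To illustrate the divide-and-conquer arrangement of sums that Robert
mentions, a pure-Python sketch (the pairwise_sum helper is made up for
illustration; the speed benefit he describes would require doing this in C
on contiguous data):

import numpy as np

def pairwise_sum(a, block=128):
    # recursively split the array and add the two halves; this is the
    # 'divide-and-conquer' arrangement of the additions
    n = a.shape[0]
    if n <= block:
        return a.sum()      # small base case: let numpy handle it
    mid = n // 2
    return pairwise_sum(a[:mid], block) + pairwise_sum(a[mid:], block)

a = np.random.randn(10 ** 6)
print(abs(pairwise_sum(a) - a.sum()))   # equal up to rounding error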


Regards,
eat

>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>   -- Umberto Eco
>  ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] odd performance of sum?

2011-02-10 Thread eat
Hi Pauli,

On Thu, Feb 10, 2011 at 8:31 PM, Pauli Virtanen  wrote:

> Thu, 10 Feb 2011 12:16:12 -0600, Robert Kern wrote:
> [clip]
> > One thing that might be worthwhile is to make
> > implementations of sum() and cumsum() that avoid the ufunc machinery and
> > do their iterations more quickly, at least for some common combinations
> > of dtype and contiguity.
>
> I wonder what is the balance between the iterator overhead and the time
> taken in the reduction inner loop. This should be straightforward to
> benchmark.
>
> Apparently, some overhead decreased with the new iterators, since current
> Numpy master outperforms 1.5.1 by a factor of 2 for this benchmark:
>
> In [8]: %timeit M.sum(1) # Numpy 1.5.1
> 10 loops, best of 3: 85 ms per loop
>
> In [8]: %timeit M.sum(1) # Numpy master
> 10 loops, best of 3: 49.5 ms per loop
>
> I don't think this is explainable by the new memory layout optimizations,
> since M is C-contiguous.
>
> Perhaps there would be room for more optimization, even within the ufunc
> framework?

I hope so. Please suggest if there's anything I can do to further advance
this. (My C skills are already a bit rusty, but at any higher level I'll
try my best to contribute.)


Thanks,
eat

>
> --
> Pauli Virtanen
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] odd performance of sum?

2011-02-10 Thread eat
Hi Robert,

On Thu, Feb 10, 2011 at 10:58 PM, Robert Kern  wrote:

> On Thu, Feb 10, 2011 at 14:29, eat  wrote:
> > Hi Robert,
> >
> > On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern 
> wrote:
> >>
> >> On Thu, Feb 10, 2011 at 11:53, eat  wrote:
> >> > Thanks Chuck,
> >> >
> >> > for replying. But doesn't it still seem very odd that dot outperforms
> >> > sum on your machine? To put it simply: why can't sum outperform dot?
> >> > Whatever architecture (computer, cache) you have, it makes no sense
> >> > at all that performing significantly fewer instructions should take
> >> > more time ;-).
> >>
> >> These days, the determining factor is less often instruction count
> >> than memory latency, and the optimized BLAS implementations of dot()
> >> heavily optimize the memory access patterns.
> >
> > Can't we have this as well with simple sum?
>
> It's technically feasible to accomplish, but as I mention later, it
> entails quite a large cost. Those optimized BLASes represent many
> man-years of effort

Yes, I acknowledge this. But didn't they then ignore something simpler,
like sum (which could actually benefit from exactly similar optimizations)?

> and cause substantial headaches for people
> building and installing numpy.

I appreciate this. No doubt at all.

> However, they are frequently worth it
> because those operations are often bottlenecks in whole applications.
> sum(), even in its stupidest implementation, rarely is. In the places
> where it is a significant bottleneck, an ad hoc implementation in C or
> Cython or even FORTRAN for just that application is pretty easy to
> write.

But here I have to disagree; I think that at least I (if not the majority
of numpy users) don't want to (nor am I capable of, or have enough time/
resources to) dwell on such details. I'm sorry, but I have to restate that
it's quite reasonable to expect sum to outperform dot in any case. Let's
now start making the changes that would enable sum to outperform dot.

> You can gain speed by specializing to just your use case, e.g.
> contiguous data, summing down to one number, or summing along one axis
> of only 2D data, etc. There's usually no reason to try to generalize
> that implementation to put it back into numpy.

Yes, I would really like to specialize for my case, but 'without leaving
the Python realm'.


Thanks,
eat

>
> >> Additionally, the number
> >> of instructions in your dot() probably isn't that many more than the
> >> sum(). The sum() is pretty dumb
> >
> > But does it need to be?
>
> As I also allude to later in my email, no, but there are still costs
> involved.
>
> >> and just does a linear accumulation
> >> using the ufunc reduce mechanism, so (m*n-1) ADDs plus quite a few
> >> instructions for traversing the array in a generic manner. With fused
> >> multiply-adds, being able to assume contiguous data and ignore the
> >> numpy iterator overhead, and applying divide-and-conquer kernels to
> >> arrange sums, the optimized dot() implementations could have a
> >> comparable instruction count.
> >
> > Couldn't sum benefit from similar logic?
>
> Etc. I'm not going to keep repeating myself.
>
> --
>  Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>   -- Umberto Eco
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] odd performance of sum?

2011-02-10 Thread eat
Hi,

On Fri, Feb 11, 2011 at 12:08 AM, Robert Kern  wrote:

>  On Thu, Feb 10, 2011 at 15:32, eat  wrote:
> > Hi Robert,
> >
> > On Thu, Feb 10, 2011 at 10:58 PM, Robert Kern 
> wrote:
> >>
> >> On Thu, Feb 10, 2011 at 14:29, eat  wrote:
> >> > Hi Robert,
> >> >
> >> > On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern 
> >> > wrote:
> >> >>
> >> >> On Thu, Feb 10, 2011 at 11:53, eat  wrote:
> >> >> > Thanks Chuck,
> >> >> >
> >> >> > for replying. But doesn't it still seem very odd that dot
> >> >> > outperforms sum on your machine? To put it simply: why can't sum
> >> >> > outperform dot? Whatever architecture (computer, cache) you have,
> >> >> > it makes no sense at all that performing significantly fewer
> >> >> > instructions should take more time ;-).
> >> >>
> >> >> These days, the determining factor is less often instruction count
> >> >> than memory latency, and the optimized BLAS implementations of dot()
> >> >> heavily optimize the memory access patterns.
> >> >
> >> > Can't we have this as well with simple sum?
> >>
> >> It's technically feasible to accomplish, but as I mention later, it
> >> entails quite a large cost. Those optimized BLASes represent many
> >> man-years of effort
> >
> > Yes, I acknowledge this. But didn't they then ignore something simpler,
> > like sum (which could actually benefit from exactly similar
> > optimizations)?
>
> Let's set aside the fact that the people who optimized the
> implementation of dot() (the authors of ATLAS or the MKL or whichever
> optimized BLAS library you linked to) are different from those who
> implemented sum() (the numpy devs). Let me repeat a reason why one
> would put a lot of effort into optimizing dot() but not sum():
>
> """
> >> However, they are frequently worth it
> >> because those operations are often bottlenecks in whole applications.
> >> sum(), even in its stupidest implementation, rarely is.
> """
>
> I don't know if I'm just not communicating very clearly, or if you
> just reply to individual statements before reading the whole email.

You are communicating very well. It's me who's not communicating so well.

>
> >> and cause substantial headaches for people
> >> building and installing numpy.
> >
> > I appreciate this. No doubt at all.
> >>
> >> However, they are frequently worth it
> >> because those operations are often bottlenecks in whole applications.
> >> sum(), even in its stupidest implementation, rarely is. In the places
> >> where it is a significant bottleneck, an ad hoc implementation in C or
> >> Cython or even FORTRAN for just that application is pretty easy to
> >> write.
> >
> > But here I have to disagree; I think that at least I (if not the
> > majority of numpy users) don't want to (nor am I capable of, or have
> > enough time/resources to) dwell on such details.
>
> And you think we have the time and resources to do it for you?

My intention was not to suggest anything like that.

>
> > I'm sorry, but I have to restate that it's quite reasonable to expect
> > that sum outperforms dot in any case.
>
> You don't optimize a function just because you are capable of it. You
> optimize a function because it is taking up a significant portion of
> total runtime in your real application. Anything else is a waste of
> time.
>
> > Let's now start making the changes that would enable sum to outperform
> > dot.
>
> Sorry, you don't get to volunteer anyone's time or set anyone's
> priorities but your own. There are some sensible things one could do
> to optimize sums or even general ufunc reductions, as outlined my
> Mark, Pauli and myself, but please stop using the accelerated-BLAS
> dot() as your benchmark. It's a completely inappropriate comparison.

OK. Let's compare sum to itself:
In []: M= randn(1e5, 1e2)
In []: timeit M.sum(0)
10 loops, best of 3: 169 ms per loop
In []: timeit M.sum(1)
10 loops, best of 3: 37.5 ms per loop
In []: timeit M.sum()
10 loops, best of 3: 18.1 ms per loop
In []: timeit s
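
A plausible reading of the spread above, assuming M is C-contiguous:
M.sum(1) walks memory contiguously, while M.sum(0) strides across rows. In
the same spirit as the earlier comparison, the axis-0 reduction can also
be phrased as a BLAS call (a sketch, not a numpy recommendation):

import numpy as np

M = np.random.randn(int(1e5), int(1e2))
col_sums = np.dot(np.ones(M.shape[0]), M)   # axis-0 sum via dot
assert np.allclose(col_sums, M.sum(0))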

Re: [Numpy-discussion] odd performance of sum?

2011-02-12 Thread eat
Hi Sturla,

On Sat, Feb 12, 2011 at 5:38 PM, Sturla Molden  wrote:

> Den 10.02.2011 16:29, skrev eat:
> > One would expect sum to outperform dot by a clear margin. Are there
> > any 'tricks' to increase the performance of sum?
>
First of all, thanks for still replying. Well, I'm still a little bit
unsure how I should proceed with this discussion... I may have used poor
wording and created unnecessary mayhem with my original question (:.
Trust me, I'm only trying to discuss here with positive criticism in
mind.

Now, I'm not pretending to know what kind of person a 'typical' numpy user
is. But I'm assuming that there are others besides me with roughly similar
questions in their (our) minds, who wish to utilize numpy in a more
'pythonic, all batteries included' way. Occasionally I (we) may ask really
stupid questions, but please bear with us.

That said, I'm still very confident that (from a user's point of view)
there's some real substance to the issue I raised.
> I see that others have answered already. The ufunc np.sum is not going
> to beat np.dot. You are racing the heavy machinery of NumPy (array
> iterators, type checks, bounds checks, etc.) against the level-3 BLAS
> routine DGEMM, the most heavily optimized numerical kernel ever written.

Fair enough.

> Also
> beware that computation is much cheaper than memory access.

Sure, that's exactly where I expected the performance boost to emerge.

> Although
> DGEMM does more arithmetic, and is even O(N^3) in that respect, it is
> always faster except for very sparse arrays. If you need fast loops, you
> can always write your own Fortran or C, and even insert OpenMP pragmas.

That's a very important option, but surely not all numpy users can be
expected to master it ;-)

> But don't expect that to beat optimized high-level BLAS kernels by any
> margin. The first chapters of "Numerical Methods in Fortran 90" might be
> worth reading. It deals with several of these issues, including
> dimensional expansion, which is important for writing fast numerical
> code -- but not intuitively obvious. "I expect this to be faster because
> it does less work" is a fundamental misconception in numerical
> computing. Whatever causes less traffic on the memory bus (the real
> bottleneck) will almost always be faster, regardless of the amount of
> work done by the CPU.

And I'm totally aware of that; it was exactly the originally intended
logic of my question: "what if sum could follow in the footsteps of dot;
then, performing fewer instructions, it should be bounded above by the
execution time of dot". But as R. Kern already gently pointed out, it's
not a fruitful enough avenue to pursue. And I'm able to live with that.


Regards,
eat

> A good advice is to use high-level BLAS whenever
> you can. The only exception, as mentioned, is when matrices get very
> sparse.
>
> Sturla
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

2011-02-15 Thread eat
Hi,

On Tue, Feb 15, 2011 at 5:50 PM, Sebastian Haase wrote:

> Hi,
> I assume that someone here could maybe help me, and I'm hoping it's
> not too much off topic.
> I have 2 arrays of 2d point coordinates and would like to calculate
> all pairwise distances as fast as possible.
> Going from Python/Numpy to a (Swigged) C extension already gave me a
> 55x speedup.
> (.9ms vs. 50ms for arrays of length 329 and 340).
>
On my very modest machine (Intel Dual-Core E2200, 2.2 GHz), utilizing
scipy.spatial.distance.pdist takes some 1.5 ms for such arrays. Even the
simple linear algebra based full matrix calculations can be done in less
than 5 ms.


My two cents,
eat

> I'm using gcc on Linux.
> Now I'm wondering if I could go even faster !?
> My hope that the compiler might automagically do some SSE2
> optimization got disappointed.
> Since I have a 4 core CPU I thought OpenMP might be an option;
> I never used that, and after some playing around I managed to get
> (only) 50% slowdown(!) :-(
>
> My code in short is this:
> (My SWIG typemaps use obj_to_array_no_conversion() from numpy.i)
> ---
> void dists2d(
>   double *a_ps, int nx1, int na,
>   double *b_ps, int nx2, int nb,
>   double *dist, int nx3, int ny3)  throw (char*)
> {
>   if(nx1 != 2)  throw (char*) "a must be of shape (n,2)";
>   if(nx2 != 2)  throw (char*) "b must be of shape (n,2)";
>   if(nx3 != nb || ny3 != na)  throw (char*) "c must be of shape (na,nb)";
>
>   double *dist_;
>   int i, j;
>
> #pragma omp parallel private(dist_, j, i)
>   {
> #pragma omp for nowait
>     for(i=0; i<na; i++)
>     {
>       //num_threads=omp_get_num_threads();  --> 4
>       dist_ = dist + i*nb;   // dist_ is only introduced for OpenMP
>       for(j=0; j<nb; j++)
>       {
>         *dist_++ = sqrt( sq(a_ps[i*nx1]   - b_ps[j*nx2]) +
>                          sq(a_ps[i*nx1+1] - b_ps[j*nx2+1]) );
>       }
>     }
>   }
> }
> ---
> There is probably a simple mistake in this code - as I said I never
> used OpenMP before.
> It should not be too difficult to use OpenMP correctly here
> or - maybe better -
> is there a simple SSE(2,3,4) version that might be even better than
> OpenMP... !?
>
> I suppose that I did not get the #pragma omp lines right - any ideas?
> Or is it in general not possible to speed this kind of code up using
> OpenMP !?
>
> Thanks,
> Sebastian Haase
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Taking a large number of dot products at once

2011-03-03 Thread eat
Hi,

On Fri, Mar 4, 2011 at 8:19 AM, Daniel Hyams  wrote:

> This is probably so easy, I'm embarrassed to ask it...but I've been casting
> around trying things to no avail for the last hour and a half, so here
> goes...
>
> I have a lot of dot products to take.  The length-3 vectors that I want to
> dot are stacked in a 2D array like this:
>
> U = [u1 u2 u3]
>
> and
>
> V = [v1 v2 v3]
>
> So both of these arrays are, say, 3x100 each.  I just want to take the dot
> product of each of the corresponding vectors, so that the result is
>
> [u1.v1 u2.v2  u3.v3 ]
>
> which would be a 1x100 array in this case.
>
> Which function do I need to use?  I thought tensordot() was the one, but I
> couldn't make it work... pure user error I'm sure.
>
No function needed for this case, just:
In []: x= rand(3, 7)
In []: y= rand(3, 7)
In []: d= (x* y).sum(0)
In [490]: d
Out[490]:
array([ 1.25404683,  0.19113117,  1.37267133,  0.74219888,  1.55296562,
0.15264303,  0.72039922])
In [493]: dot(x[:, 0].T, y[:, 0])
Out[493]: 1.2540468282421895
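
The same columnwise dot products can also be written with einsum
(available from numpy 1.6 on); a sketch:

import numpy as np

x = np.random.rand(3, 7)
y = np.random.rand(3, 7)
d = np.einsum('ij,ij->j', x, y)        # dot product of each column pair
assert np.allclose(d, (x * y).sum(0))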

Regards,
eat

>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] view 1d array as overlapping segments?

2011-03-07 Thread eat
Hi,

On Mon, Mar 7, 2011 at 5:01 PM, Neal Becker  wrote:

> reshape can view a 1d array as non-overlapping segments.
>
> Is there a convenient way to view a 1d array as a 2d array of overlapping
> segments?
>
> nonoverlapping:
> l: segment length
> k: overlap
> u is the 1d array
> v is a 2d array
>
> v[i] = u[l*i:(l+1)*i]
>
> overlapping:
> v[i] = u[l*i:(l+1)*i+k]
>
In case you actually mean v[i] = u[i*l : (i+1)*l + k], then this may be useful:
from numpy.lib.stride_tricks import as_strided as ast

def os(a, l, k=0):
    shape, strides = (a.shape[0] - l + 1, l + k), a.strides * 2
    return ast(a, shape=shape, strides=strides)

if __name__ == '__main__':
    import numpy as np
    a = np.arange(7, dtype=np.int8)
    print os(a, 3)
    # [[0 1 2]
    #  [1 2 3]
    #  [2 3 4]
    #  [3 4 5]
    #  [4 5 6]]
    print os(a, 3, 2)
    # [[ 0  1  2  3  4]
    #  [ 1  2  3  4  5]
    #  [ 2  3  4  5  6]
    #  [ 3  4  5  6  0]  # last item garbage
    #  [ 4  5  6  0 34]] # 2 last items garbage
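
When the trailing garbage rows are unwanted, a variant (os_strict is a
made-up name) that only exposes the windows lying fully inside the buffer:

from numpy.lib.stride_tricks import as_strided as ast
import numpy as np

def os_strict(a, l, k=0):
    # keep only the windows that fit entirely inside a
    n = a.shape[0] - (l + k) + 1
    return ast(a, shape=(n, l + k), strides=a.strides * 2)

print os_strict(np.arange(7, dtype=np.int8), 3, 2)
# [[0 1 2 3 4]
#  [1 2 3 4 5]
#  [2 3 4 5 6]]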

My two cents,
eat

> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] avoid a line...

2011-03-17 Thread eat
Hi,

On Thu, Mar 17, 2011 at 9:36 AM, dileep kunjaai wrote:

> Dear sir,
>  I am trying to read a file of the following format; I want to skip the
> first line and read the remaining lines as 'float'.
> Please help me...
>
>
> Rain   Wind   Temp   Pr    Sal
> 0.1    1.1    0.02   0.2   0.2
> 0.5    0.     0.     0.4   0.8
> 0.5    5.5    1.5    0.5   1.5
> 3.5    0.5    1.5    5.0   2.6
> 5.1    4.1    3.2    2.3   1.5
> 4.4    0.9    1.5    2.    2.3
>
You may use loadtxt() with the skiprows argument. (See more at
http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html.)
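
A minimal sketch, assuming the table above is saved in a file named
'data.txt' (the filename is made up for illustration):

import numpy as np

data = np.loadtxt('data.txt', skiprows=1)   # skip the header row, parse floats
print(data.shape)                           # -> (6, 5)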

Regards,
eat

>
> --
> DILEEPKUMAR. R
> J R F, IIT DELHI
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Norm of array of vectors

2011-03-17 Thread eat
Hi,

On Thu, Mar 17, 2011 at 10:44 AM, Andrey N. Sobolev  wrote:

> Dear all,
>
> Sorry if that's a noob question, but anyway. I have several thousands of
> vectors stacked in a 2d array. I'd like to get a new array containing
> Euclidean norms of these vectors and get the vector with minimal norm.
>
> Is there more efficient way to do this than
> argmin(array([sqrt(dot(x,x)) for x in vec_array]))?
>
Try
argmin(sum(vec_array** 2, 1)** 0.5)
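
(Axis 1, since the loop above iterates over the rows of vec_array.) A
fuller sketch with stand-in data, also using the fact that the argmin of
the norms equals the argmin of the squared norms:

import numpy as np

vec_array = np.random.randn(5000, 3)       # stand-in data
norms = np.sqrt((vec_array ** 2).sum(1))   # Euclidean norm of each row
best = vec_array[norms.argmin()]           # the vector with minimal norm
# note: norms.argmin() == (norms ** 2).argmin(), so the sqrt is optional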

Regards,
eat

>
> Thanks in advance.
> Andrey.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion