Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread josef.pktd
On Wed, Jan 25, 2012 at 12:03 AM, Charles R Harris wrote: > > > On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina > wrote: >> >> I found something similar, with a very simple example. >> >> On 64-bit linux, python 2.7.2, numpy development version: >> >> In [22]: a = 4000*np.ones((1024,1024),dtyp

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Charles R Harris
On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina < kathleen.m.tac...@nasa.gov> wrote: > ** > I found something similar, with a very simple example. > > On 64-bit linux, python 2.7.2, numpy development version: > > In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) > > In [23]: a.mean() > Ou

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread josef.pktd
On Tue, Jan 24, 2012 at 7:21 PM, eat wrote: > Hi > > On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina < > kathleen.m.tac...@nasa.gov> wrote: > >> ** >> I found something similar, with a very simple example. >> >> On 64-bit linux, python 2.7.2, numpy development version: >> >> In [22]: a = 4000*

Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread Olivier Delalleau
Note that if you are OK with an approximate solution, and you can assume your data is somewhat shuffled, a simple online algorithm that uses no memory consists of: choosing a small step size delta; initializing your percentile p to a more or less random value (a meaningful guess is better though
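Olivier's sketch can be written out as a short stochastic-approximation loop. The step size, starting value, and the normal test stream below are illustrative choices, not from the original post:

```python
import numpy as np

def online_percentile(stream, q=0.95, delta=0.01, p0=0.0):
    """Streaming estimate of the q-th quantile using O(1) memory.

    Each observation nudges the estimate p up by delta*q when it
    exceeds p, and down by delta*(1-q) otherwise; at equilibrium
    the fraction of observations below p is q.
    """
    p = p0
    for x in stream:
        if x > p:
            p += delta * q
        else:
            p -= delta * (1.0 - q)
    return p

rng = np.random.default_rng(0)
data = rng.normal(size=200_000)
est = online_percentile(data, q=0.95)
```

For a standard normal stream the estimate settles near the true 95th percentile (about 1.64), with jitter on the order of delta.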

Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread questions anon
Thanks for your responses. Because of the size of the dataset I will still end up with the memory error if I calculate the median for each file; additionally, the files are not all the same size. I believe this memory problem will still arise with the cumulative distribution calculation and am not sure
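One bounded-memory way to make the cumulative-distribution idea work with differently sized files is two passes and a shared histogram. A rough sketch, where `chunks` stands in for the arrays loaded one file at a time (it must be re-iterable, e.g. a list), and the bin count trades memory for accuracy:

```python
import numpy as np

def percentile_from_chunks(chunks, q=95.0, bins=10_000):
    # Pass 1: global value range (chunks may have different sizes).
    lo = min(float(np.min(c)) for c in chunks)
    hi = max(float(np.max(c)) for c in chunks)
    edges = np.linspace(lo, hi, bins + 1)
    counts = np.zeros(bins, dtype=np.int64)
    total = 0
    # Pass 2: accumulate one shared histogram, file by file.
    for c in chunks:
        h, _ = np.histogram(c, bins=edges)
        counts += h
        total += c.size
    # Invert the empirical CDF; result is accurate to one bin width.
    cdf = np.cumsum(counts) / total
    i = int(np.searchsorted(cdf, q / 100.0))
    return edges[min(i + 1, bins)]
```

Only the `counts` array and one file's data are ever in memory at once, which is the constraint in the question.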

Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread Brett Olsen
On Tue, Jan 24, 2012 at 6:22 PM, questions anon wrote: > I need some help understanding how to loop through many arrays to calculate > the 95th percentile. > I can easily do this by using numpy.concatenate to make one big array and > then finding the 95th percentile using numpy.percentile but this

Re: [Numpy-discussion] Fix for ticket #1973

2012-01-24 Thread Mark Wiebe
On Mon, Jan 16, 2012 at 8:14 AM, Charles R Harris wrote: > > > On Mon, Jan 16, 2012 at 8:52 AM, Charles R Harris < > charlesr.har...@gmail.com> wrote: > >> >> >> On Mon, Jan 16, 2012 at 8:37 AM, Bruce Southey wrote: >> >>> ** >>> On 01/14/2012 04:31 PM, Charles R Harris wrote: >>> >>> I've put up

Re: [Numpy-discussion] Unexpected behavior with np.min_scalar_type

2012-01-24 Thread Mark Wiebe
On Tue, Jan 24, 2012 at 7:29 AM, Kathleen M Tacina < kathleen.m.tac...@nasa.gov> wrote: > ** > I was experimenting with np.min_scalar_type to make sure it worked as > expected, and found some unexpected results for integers between 2**63 and > 2**64-1. I would have expected np.min_scalar_type(2**

Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread Marc Shivers
This is probably not the best way to do it, but I think it would work: Your could take two passes through your data, first calculating and storing the median for each file and the number of elements in each file. From those data, you can get a lower bound on the 95th percentile of the combined da

Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-24 Thread Mark Wiebe
2012/1/21 Ondřej Čertík > > > Let me know if you figure out something. I think the "mask" thing is > quite slow, but the problem is that it needs to be there, to catch > overflows (and it is there in Fortran as well, see the > "where" statement, which does the same thing). Maybe there is some >
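For reference, the masking pattern under discussion looks like this in NumPy: only the not-yet-diverged points are updated each iteration, which is what keeps |z| bounded and avoids the overflows. Grid ranges and iteration count here are illustrative, not taken from Ondřej's benchmark:

```python
import numpy as np

def mandelbrot(h=200, w=200, maxit=50):
    y, x = np.ogrid[-1.4:1.4:h*1j, -2.0:0.8:w*1j]
    c = x + 1j * y
    z = np.zeros_like(c)
    divtime = np.full(c.shape, maxit, dtype=np.int32)
    for i in range(maxit):
        undiverged = np.abs(z) <= 2.0        # the "mask" from the thread
        z[undiverged] = z[undiverged]**2 + c[undiverged]
        newly = (np.abs(z) > 2.0) & (divtime == maxit)
        divtime[newly] = i                   # record first escape time
    return divtime
```

The fancy-indexed updates are exactly the part the thread identifies as slow relative to Fortran's `where`, since each one allocates temporaries.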

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread eat
Hi On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina < kathleen.m.tac...@nasa.gov> wrote: > ** > I found something similar, with a very simple example. > > On 64-bit linux, python 2.7.2, numpy development version: > > In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) > > In [23]: a.mean()

[Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread questions anon
I need some help understanding how to loop through many arrays to calculate the 95th percentile. I can easily do this by using numpy.concatenate to make one big array and then finding the 95th percentile using numpy.percentile but this causes a memory error when I want to run this on 100's of netcd

Re: [Numpy-discussion] einsum evaluation order

2012-01-24 Thread Mark Wiebe
On Tue, Jan 24, 2012 at 6:32 AM, Søren Gammelmark wrote: > Dear all, > > I was just looking into numpy.einsum and encountered an issue which might > be worth pointing out in the documentation. > > Let us say you wish to evaluate something like this (repeated indices are > summed) > > D[alpha, alphap

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread David Warde-Farley
On Wed, Jan 25, 2012 at 01:12:06AM +0200, eat wrote: > Or does the results of calculations depend more on the platform? Floating point operations often do, sadly (not saying that this is the case here, but you'd need to try both versions on the same machine [or at least architecture/bit-width]/sa

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Kathleen M Tacina
I found something similar, with a very simple example. On 64-bit linux, python 2.7.2, numpy development version: In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) In [23]: a.mean() Out[23]: 4034.16357421875 In [24]: np.version.full_version Out[24]: '2.0.0.dev-55472ca' But, a Windows XP
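Condensed, the effect and the usual workaround look like this (whether the default float32 accumulation actually drifts depends on the build, which is exactly the Linux-vs-Windows difference reported here):

```python
import numpy as np

a = 4000 * np.ones((1024, 1024), dtype=np.float32)

# Default: a float32 accumulator can lose precision once the running
# sum grows large (the thread reports ~4034 instead of 4000 on some
# builds; others, and current NumPy, return 4000 exactly).
m32 = a.mean()

# Workaround: accumulate in double precision without copying the array.
m64 = a.mean(dtype=np.float64)
```

The float64 sum of 2**20 copies of 4000 is exactly representable, so `m64` is exactly 4000.0 regardless of platform.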

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread eat
Hi, Oddly, numpy 1.6 seems to behave in a more consistent manner: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: d= np.load('data.npy') In []: d.dtype Out[]: dtype('float32') In []: d.mean() Out[]: 30

Re: [Numpy-discussion] installing matplotlib in MacOs 10.6.8.

2012-01-24 Thread Samuel John
Sorry for the late answer, but at least for the record: If you are using Eclipse, I assume you have also installed the Eclipse plugin [pydev](http://pydev.org/). I use it myself; it's good. Then you have to go to Preferences -> PyDev -> Python Interpreter and select the Python version you want

Re: [Numpy-discussion] 'Advanced' save and restore operation

2012-01-24 Thread Samuel John
I know you wrote that you want "TEXT" files, but nevertheless, I'd like to point to http://code.google.com/p/h5py/ . There are viewers for hdf5 and it is stable and widely used. Samuel On 24.01.2012, at 00:26, Emmanuel Mayssat wrote: > After having saved data, I need to know/remember the da
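On the underlying question of remembering the dtype after saving: NumPy's own binary `.npy` format already records dtype, shape, and byte order in its header, so a round trip restores them automatically; HDF5 (via h5py, as suggested above) adds named datasets and arbitrary metadata attributes on top of that. A minimal `.npy` round trip, with illustrative values:

```python
import os
import tempfile

import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)
path = os.path.join(tempfile.mkdtemp(), "data.npy")
np.save(path, a)    # the .npy header stores dtype, shape, byte order
b = np.load(path)   # comes back as float32 with shape (2, 3)
```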

Re: [Numpy-discussion] Unexpected behavior with np.min_scalar_type

2012-01-24 Thread Samuel John
I get the same results as you, Kathy. *surprised* (On OS X (Lion), 64-bit, numpy 2.0.0.dev-55472ca, Python 2.7.2.) On 24.01.2012, at 16:29, Kathleen M Tacina wrote: > I was experimenting with np.min_scalar_type to make sure it worked as > expected, and found some unexpected results for integers

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Samuel John
On 23.01.2012, at 11:23, David Warde-Farley wrote: >> a = numpy.array(numpy.random.randint(256,size=(500,972)),dtype='uint8') >> b = numpy.random.randint(500,size=(4993210,)) >> c = a[b] >> In [14]: c[100:].sum() >> Out[14]: 0 Same here. Python 2.7.2, 64bit, Mac OS X (Lion), 8GB RAM,

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 01:02:44PM -0500, David Warde-Farley wrote: > On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote: > > > Yes - I get exactly the same numbers in 64 bit windows with 1.6.1. > > Alright, so that rules out platform specific effects. > > I'll try and hunt the bug down when

[Numpy-discussion] Course "Python for Scientists and Engineers" in Chicago

2012-01-24 Thread Mike Müller
Course "Python for Scientists and Engineers" in Chicago: There will be a comprehensive Python course for scientists and engineers in Chicago at the end of February / beginning of March 2012. It consists of a 3-day intro and a 2-day advanced section.

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread K.-Michael Aye
Thank you Bruce and all, I knew I was doing something wrong (should have read the mean method doc more closely). I am of course glad that it's so easily understandable. But: If the error can get so big, wouldn't it be a better idea for the accumulator to always be of type 'float64' and then convert lat

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Zachary Pincus
> You have a million 32-bit floating point numbers that are in the > thousands. Thus you are exceeding the 32-bit float precision and, if you > can, you need to increase precision of the accumulator in np.mean() or > change the input dtype: a.mean(dtype=np.float32) # default and lacks precis

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Val Kalatsky
Just what Bruce said. You can run the following to confirm: np.mean(data - data.mean()) If for some reason you do not want to convert to float64 you can add the result of the previous line to the "bad" mean: bad_mean = data.mean() good_mean = bad_mean + np.mean(data - bad_mean) Val On Tue, Jan
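Val's correction works because the residuals `data - bad_mean` are small, so accumulating them in float32 loses far less precision than accumulating the raw values. A sketch with synthetic data standing in for the thread's `data.npy`:

```python
import numpy as np

# Synthetic stand-in for the thread's file: a million float32 values
# in the thousands, the regime where the float32 accumulator drifts.
rng = np.random.default_rng(0)
data = (3000 + 50 * rng.standard_normal((1000, 1000))).astype(np.float32)

bad_mean = data.mean()                           # float32 accumulation
good_mean = bad_mean + np.mean(data - bad_mean)  # Val's one-step fix
ref = data.mean(dtype=np.float64)                # double-precision reference
```

The correction avoids materialising a float64 copy of the array; the only float64 work is on the (cheap) residual mean.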

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Zachary Pincus
On Jan 24, 2012, at 1:33 PM, K.-Michael Aye wrote: > I know I know, that's pretty outrageous to even suggest, but please > bear with me, I am stumped as you may be: > > 2-D data file here: > http://dl.dropbox.com/u/139035/data.npy > > Then: > In [3]: data.mean() > Out[3]: 3067.024383998 >

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Kathleen M Tacina
I have confirmed this on a 64-bit linux machine running python 2.7.2 with the development version of numpy. It seems to be related to using float32 instead of float64. If the array is first converted to a 64-bit float (via astype), mean gives an answer that agrees with your looped-calculation va

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Bruce Southey
On 01/24/2012 12:33 PM, K.-Michael Aye wrote: > I know I know, that's pretty outrageous to even suggest, but please > bear with me, I am stumped as you may be: > > 2-D data file here: > http://dl.dropbox.com/u/139035/data.npy > > Then: > In [3]: data.mean() > Out[3]: 3067.024383998 > > In [4]:

[Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread K.-Michael Aye
I know I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be: 2-D data file here: http://dl.dropbox.com/u/139035/data.npy Then: In [3]: data.mean() Out[3]: 3067.024383998 In [4]: data.max() Out[4]: 3052.4343 In [5]: data.shape Out[5]: (1000,

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote: > Yes - I get exactly the same numbers in 64 bit windows with 1.6.1. Alright, so that rules out platform specific effects. I'll try and hunt the bug down when I have some time, if someone more familiar with the indexing code doesn't beat me
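A scaled-down version of the thread's reproduction, for reference. The real trigger needed a fancy-indexing result larger than 2 GB, where a 32-bit C long overflows on Win64; at this size the semantics are easy to check: `a[b]` stacks whole rows of `a`, so an all-zero result (the reported symptom) is impossible when every element of `a` is nonzero.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(1, 256, size=(500, 972)).astype(np.uint8)  # all values >= 1
b = rng.integers(0, 500, size=(10_000,))                    # row indices
c = a[b]                                                    # fancy indexing
```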

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robin
On Tue, Jan 24, 2012 at 6:24 PM, David Warde-Farley wrote: > On Tue, Jan 24, 2012 at 06:00:05AM +0100, Sturla Molden wrote: >> Den 23.01.2012 22:08, skrev Christoph Gohlke: >> > >> > Maybe this explains the win-amd64 behavior: There are a couple of places >> > in mtrand where array indices and siz

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 06:00:05AM +0100, Sturla Molden wrote: > Den 23.01.2012 22:08, skrev Christoph Gohlke: > > > > Maybe this explains the win-amd64 behavior: There are a couple of places > > in mtrand where array indices and sizes are C long instead of npy_intp, > > for example in the randint

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 09:15:01AM +, Robert Kern wrote: > On Tue, Jan 24, 2012 at 08:37, Sturla Molden wrote: > > On 24.01.2012 09:21, Sturla Molden wrote: > > > >> randomkit.c handles C long correctly, I think. There are different codes > >> for 32 and 64 bit C long, and buffer sizes are siz

Re: [Numpy-discussion] Strange error raised by scipy.special.erf

2012-01-24 Thread Nadav Horesh
I filed a ticket (#1590). Thank you for the verification. Nadav. From: numpy-discussion-boun...@scipy.org [numpy-discussion-boun...@scipy.org] On Behalf Of Pierre Haessig [pierre.haes...@crans.org] Sent: 24 January 2012 16:01 To: numpy-discussion@scip

[Numpy-discussion] Unexpected behavior with np.min_scalar_type

2012-01-24 Thread Kathleen M Tacina
I was experimenting with np.min_scalar_type to make sure it worked as expected, and found some unexpected results for integers between 2**63 and 2**64-1. I would have expected np.min_scalar_type(2**64-1) to return uint64. Instead, I get object. Further experimenting showed that the largest integ
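The expected behaviour, for reference. Later NumPy releases do return `uint64` up to 2**64 - 1; the `object` result described above was specific to the development version at the time:

```python
import numpy as np

print(np.min_scalar_type(255))        # uint8: smallest type holding 255
print(np.min_scalar_type(2**32))      # uint64: too big for uint32
print(np.min_scalar_type(2**64 - 1))  # uint64 once fixed; the thread saw object
```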

[Numpy-discussion] einsum evaluation order

2012-01-24 Thread Søren Gammelmark
Dear all, I was just looking into numpy.einsum and encountered an issue which might be worth pointing out in the documentation. Let us say you wish to evaluate something like this (repeated indices are summed) D[alpha, alphaprime] = A[alpha, beta, sigma] * B[alphaprime, betaprime, sigma] * C[beta,
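The contraction in question can be written either as one einsum call or as an explicit pairwise sequence that controls the intermediate; on current NumPy, `optimize=True` asks einsum to pick such an order itself. The small random tensors are just for checking that the three forms agree:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n, n))
B = rng.normal(size=(n, n, n))
C = rng.normal(size=(n, n))

# One-shot: D[a, c] = A[a, b, s] * B[c, d, s] * C[b, d], summed over b, d, s.
D1 = np.einsum('abs,cds,bd->ac', A, B, C)

# The same contraction done pairwise, making the intermediate explicit.
T = np.einsum('abs,bd->ads', A, C)    # contract over beta first
D2 = np.einsum('ads,cds->ac', T, B)   # then over betaprime and sigma

# optimize=True lets einsum choose a pairwise order internally.
D3 = np.einsum('abs,cds,bd->ac', A, B, C, optimize=True)
```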

Re: [Numpy-discussion] Strange error raised by scipy.special.erf

2012-01-24 Thread Pierre Haessig
Le 22/01/2012 11:28, Nadav Horesh a écrit : > >>> special.erf(26.5) > 1.0 > >>> special.erf(26.6) > Traceback (most recent call last): > File "", line 1, in > special.erf(26.6) > FloatingPointError: underflow encountered in erf > >>> special.erf(26.7) > 1.0 > I can confirm this same behavi
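For context, the saturation and the underflowing tail can be seen directly. This sketch assumes SciPy is importable; note the exception above only fires when floating-point errors are set to raise, and later SciPy releases fixed the spurious underflow signal:

```python
import numpy as np
from scipy import special

# erf(x) saturates to exactly 1.0 in double precision long before
# x = 26; the complement erfc(x) ~ exp(-x**2) / (x * sqrt(pi)) is the
# quantity actually underflowing toward zero in this regime.
with np.errstate(all='ignore'):
    v = special.erf(26.6)
tail = special.erfc(26.5)   # tiny (~1e-307) but still representable
```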

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 10:15, Robert Kern wrote: > There are two different uses of long that you need to distinguish. One > is for sizes, and one is for parameters and values. The sizes should > definitely be upgraded to npy_intp. The latter shouldn't; these should > remain as the default integer type of P

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robert Kern
On Tue, Jan 24, 2012 at 09:19, Sturla Molden wrote: > On 24.01.2012 10:16, Robert Kern wrote: > >> I'm sorry, what are you demonstrating there? > > Both npy_intp and C long are used for sizes and indexing. Ah, yes. I think Travis added the multiiter code to cont1_array(), which does broadcasting,

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 10:16, Robert Kern wrote: > I'm sorry, what are you demonstrating there? Both npy_intp and C long are used for sizes and indexing. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robert Kern
On Tue, Jan 24, 2012 at 08:47, Sturla Molden wrote: > The coding is also inconsistent, compare for example: > > https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L180 > > https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L201 I'm sorry, what are yo

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robert Kern
On Tue, Jan 24, 2012 at 08:37, Sturla Molden wrote: > On 24.01.2012 09:21, Sturla Molden wrote: > >> randomkit.c handles C long correctly, I think. There are different codes >> for 32 and 64 bit C long, and buffer sizes are size_t. > > distributions.c take C longs as parameters e.g. for the binomi

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 06:32, Sturla Molden wrote: > Den 24.01.2012 06:00, skrev Sturla Molden: >> Both i and length could overflow here. It should overflow on >> allocation of more than 2 GB. There is also a lot of C longs in the >> internal state (line 55-105), as well as the other functions. > > The use

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 09:21, Sturla Molden wrote: > randomkit.c handles C long correctly, I think. There are different codes > for 32 and 64 bit C long, and buffer sizes are size_t. distributions.c take C longs as parameters e.g. for the binomial distribution. mtrand.pyx correctly handles this, but it c

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 06:32, Sturla Molden wrote: > The use of C long affects all the C and Pyrex source code in mtrand > module, not just mtrand.pyx. All of it is fubar on Win64. randomkit.c handles C long correctly, I think. There are different codes for 32 and 64 bit C long, and buffer sizes are size