Re: [Numpy-discussion] ANN: Pandas 0.14.0 Release Candidate 1

2014-07-11 Thread Jeff
Matthew, we posted the release of 0.14.1 last night. Are these picked up 
and built here automatically? 
https://nipy.bic.berkeley.edu/scipy_installers/
thanks

Jeff
On Saturday, May 17, 2014 7:22:00 AM UTC-4, Jeff wrote:
>
> Hi,
>
> I'm pleased to announce the availability of the first release candidate of 
> Pandas 0.14.0.
> Please try this RC and report any issues here: Pandas Issues 
> <https://github.com/pydata/pandas/issues>
> We will be releasing officially in about 2 weeks or so.
>
> This is a major release from 0.13.1 and includes a small number of API 
> changes, several new features, enhancements, and 
> performance improvements along with a large number of bug fixes. 
>
> Highlights include:
>
>- Officially support Python 3.4
>- SQL interfaces updated to use sqlalchemy,
>- Display interface changes
>- MultiIndexing Using Slicers
>- Ability to join a singly-indexed DataFrame with a multi-indexed 
>DataFrame
>- More consistency in groupby results and more flexible groupby 
>specifications
>- Holiday calendars are now supported in CustomBusinessDay
>- Several improvements in plotting functions, including: hexbin, area 
>and pie plots.
>- Performance doc section on I/O operations
>
> Since there are some significant changes to the default way DataFrames are 
> displayed, I have put up an issue asking for feedback here 
> <https://github.com/pydata/pandas/issues/7146>
>
> Here are the full whatsnew and documentation links:
>
> v0.14.0 Whatsnew 
> <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html>
>
> v0.14.0 Documentation Page 
> <http://pandas-docs.github.io/pandas-docs-travis/>
>
> Source tarballs and Windows builds are available here:
>
> Pandas v0.14rc1 Release <https://github.com/pydata/pandas/releases>
>
> A big thank you to everyone who contributed to this release!
>
> Jeff
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: Pandas 0.14.0 Release Candidate 1

2014-07-25 Thread Jeff
How does the build trigger? If it's just a matter of clicking on something 
when a release goes out, I think we can handle that :)

On Saturday, May 17, 2014 7:22:00 AM UTC-4, Jeff wrote:
>
> Hi,
>
> I'm pleased to announce the availability of the first release candidate of 
> Pandas 0.14.0.
> Please try this RC and report any issues here: Pandas Issues 
> <https://github.com/pydata/pandas/issues>
> We will be releasing officially in about 2 weeks or so.
>
> This is a major release from 0.13.1 and includes a small number of API 
> changes, several new features, enhancements, and 
> performance improvements along with a large number of bug fixes. 
>
> Highlights include:
>
>- Officially support Python 3.4
>- SQL interfaces updated to use sqlalchemy,
>- Display interface changes
>- MultiIndexing Using Slicers
>- Ability to join a singly-indexed DataFrame with a multi-indexed 
>DataFrame
>- More consistency in groupby results and more flexible groupby 
>specifications
>- Holiday calendars are now supported in CustomBusinessDay
>- Several improvements in plotting functions, including: hexbin, area 
>and pie plots.
>- Performance doc section on I/O operations
>
> Since there are some significant changes to the default way DataFrames are 
> displayed, I have put up an issue asking for feedback here 
> <https://github.com/pydata/pandas/issues/7146>
>
> Here are the full whatsnew and documentation links:
>
> v0.14.0 Whatsnew 
> <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html>
>
> v0.14.0 Documentation Page 
> <http://pandas-docs.github.io/pandas-docs-travis/>
>
> Source tarballs and Windows builds are available here:
>
> Pandas v0.14rc1 Release <https://github.com/pydata/pandas/releases>
>
> A big thank you to everyone who contributed to this release!
>
> Jeff
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: pandas v0.18.0rc1 - RELEASE CANDIDATE

2016-02-28 Thread Jeff

These are pre-releases. In other words, we want the community to test them 
out before an official release and see if there are any show-stoppers. The 
docs are set up for the official releases. Release candidates are not put 
into official channels at all (that is the point), e.g. not on PyPI, nor in 
the main conda channels. Only official releases will go there.

Generally we will try to do release candidates before major changes, but 
not before minor changes.

So the official release of 0.18.0 has not happened yet! (In fact we are 
going to do a v0.18.0rc2 next week.)

We would love for you to test it out!

Jeff




On Sunday, February 28, 2016 at 11:50:57 AM UTC-5, John E wrote:
>
> I hope this doesn't come across as a trivial, semantic question, but...
>
> The initial releases of the last 2  or so versions have been labelled as 
> "release candidates" but still say "We recommend that all
> users upgrade to this version."
>
> So this is a little confusing to me for using pandas in a production 
> environment.  "Release candidate" seems to suggest that you should wait for 
> 0.18.1, but the note unambiguously says not to wait.  So which 
> interpretation is recommended for a production environment?
>
>
> On Saturday, February 13, 2016 at 7:53:18 PM UTC-5, Jeff wrote:
>>
>> Hi,
>>
>> I'm pleased to announce the availability of the first release candidate 
>> of Pandas 0.18.0.
>> Please try this RC and report any issues here: Pandas Issues 
>> <https://github.com/pydata/pandas/issues>
>> We will be releasing officially in 1-2 weeks or so.
>>
>> **RELEASE CANDIDATE 1**
>>
>> This is a major release from 0.17.1 and includes a small number of API 
>> changes, several new features,
>> enhancements, and performance improvements along with a large number of 
>> bug fixes. We recommend that all
>> users upgrade to this version.
>>
>> Highlights include:
>>
>>- pandas >= 0.18.0 will no longer support compatibility with Python 
>>version 2.6 GH7718 <https://github.com/pydata/pandas/issues/7718> or 
>>version 3.3 GH11273 <https://github.com/pydata/pandas/issues/11273>
>>- Moving and expanding window functions are now methods on Series and 
>>DataFrame similar to .groupby like objects, see here 
>>
>> <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-moments>
>>.
>>- Adding support for a RangeIndex as a specialized form of the 
>>Int64Index for memory savings, see here 
>>
>> <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-rangeindex>
>>.
>>- API breaking .resample changes to make it more .groupby like, see 
>>here 
>>
>> <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-breaking-resample>
>>- Removal of support for positional indexing with floats, which was 
>>deprecated since 0.14.0. This will now raise a TypeError, see here 
>>
>> <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-float-indexers>
>>- The .to_xarray() function has been added for compatibility with the 
>> xarray 
>>package <http://xarray.pydata.org/en/stable/> see here 
>>
>> <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-xarray>
>>.
>>- Addition of the .str.extractall() method 
>>
>> <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-extractall>,
>>  
>and API changes to the .str.extract() method 
>>
>> <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-extract>,
>>  
>>and the .str.cat() method 
>>
>> <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-strcat>
>>- pd.test() top-level nose test runner is available GH4327 
>><https://github.com/pydata/pandas/issues/4327>
>>
>> See the Whatsnew 
>> <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html> for much 
>> more information. 
>>
>> Best way to get this is to install via conda 
>> <http://pandas-docs.github.io/pandas-docs-travis/install.html#installing-pandas-with-anaconda>
>>  from 
>> our development channel. Builds for osx-64, linux-64, win-64 for Python 2.7 
>> and Python 3.5 are all available.
>>
>> conda install pandas=v0.18.0rc1 -c pandas
>>
>> Thanks to all who made this release happen. It is a very large release!
>>
>> Jeff
>>
>>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: pandas v0.18.0rc1 - RELEASE CANDIDATE

2016-02-28 Thread Jeff
So you are probably reading sas7bdat, which was put in AFTER 0.18.0rc1 was 
cut (if you are reading xport format then you are good to go); otherwise 
you may want to wait a bit for 0.18.0rc2.
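
For reference, a minimal sketch of the reader in question (hypothetical
file names; in 0.18.0 pd.read_sas can usually infer the format from the
extension, or you can pass format= explicitly):

import pandas as pd
df = pd.read_sas('data.sas7bdat')               # sas7bdat support landed after rc1
dfx = pd.read_sas('data.xpt', format='xport')   # xport already works in rc1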



On Sunday, February 28, 2016 at 1:42:53 PM UTC-5, John E wrote:
>
> OK, thanks, I got it.  Although... I would consider pandas.pydata.org to 
> be a common end user gateway and if one starts there they will read "We 
> recommend that *all *users upgrade to this version."  And then if they 
> scroll down a short distance they will see a single line instruction for 
> installing via conda: "conda install pandas=v0.18.0rc1 -c pandas". 
>
> And also somewhat confusing to me about pandas.pydata.org is that looking 
> to the right, you have a choice of RC, dev, and previous releases, but 
> nothing that says something like "current, stable release".
>
> Anyway, quite possibly this is confusing only to me and not others, but I 
> thought I'd mention it just in case.  FWIW.
>
> I've now installed 0.18.0rc1 and will try to test out some of the newer 
> features.  I'm really interested to see how well the SAS reader works (i.e. 
> how fast).  I hate SAS myself, but this would be a really, really nice 
> feature for my organization and likely increase adoption of python & pandas.
>
>
>
> On Sunday, February 28, 2016 at 12:03:45 PM UTC-5, Jeff wrote:
>>
>>
>> These are pre-releases. In other words, we want the community to test 
>> them out before an official release and see if there are any 
>> show-stoppers. The docs are set up for the official releases. Release 
>> candidates are not put into official channels at all (that is the point), 
>> e.g. not on PyPI, nor in the main conda channels. Only official releases 
>> will go there.
>>
>> Generally we will try to do release candidates before major changes, but 
>> not before minor changes.
>>
>> So the official release of 0.18.0 has not happened yet! (In fact we are 
>> going to do a v0.18.0rc2 next week.)
>>
>> We would love for you to test it out!
>>
>> Jeff
>>
>>
>>
>>
>> On Sunday, February 28, 2016 at 11:50:57 AM UTC-5, John E wrote:
>>>
>>> I hope this doesn't come across as a trivial, semantic question, but...
>>>
>>> The initial releases of the last 2  or so versions have been labelled as 
>>> "release candidates" but still say "We recommend that all
>>> users upgrade to this version."
>>>
>>> So this is a little confusing to me for using pandas in a production 
>>> environment.  "Release candidate" seems to suggest that you should wait for 
>>> 0.18.1, but the note unambiguously says not to wait.  So which 
>>> interpretation is recommended for a production environment?
>>>
>>>
>>> On Saturday, February 13, 2016 at 7:53:18 PM UTC-5, Jeff wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm pleased to announce the availability of the first release candidate 
>>>> of Pandas 0.18.0.
>>>> Please try this RC and report any issues here: Pandas Issues 
>>>> <https://github.com/pydata/pandas/issues>
>>>> We will be releasing officially in 1-2 weeks or so.
>>>>
>>>> **RELEASE CANDIDATE 1**
>>>>
>>>> This is a major release from 0.17.1 and includes a small number of API 
>>>> changes, several new features,
>>>> enhancements, and performance improvements along with a large number of 
>>>> bug fixes. We recommend that all
>>>> users upgrade to this version.
>>>>
>>>> Highlights include:
>>>>
>>>>- pandas >= 0.18.0 will no longer support compatibility with Python 
>>>>version 2.6 GH7718 <https://github.com/pydata/pandas/issues/7718> or 
>>>>version 3.3 GH11273 <https://github.com/pydata/pandas/issues/11273>
>>>>- Moving and expanding window functions are now methods on Series 
>>>>and DataFrame similar to .groupby like objects, see here 
>>>>
>>>> <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-moments>
>>>>.
>>>>- Adding support for a RangeIndex as a specialized form of the 
>>>>Int64Index for memory savings, see here 
>>>>
>>>> <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-rangeindex>
>>>>.
>>>>- API breaking .resample changes to make it more 

[Numpy-discussion] List migration complete

2006-11-16 Thread Jeff Strunk
Good afternoon,

The list migration has completed successfully. [EMAIL PROTECTED] is 
the new address for this list.

Thank you,
Jeff
___
Numpy-discussion mailing list
[EMAIL PROTECTED]
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy install on mac os x 10.4

2006-12-31 Thread Jeff Whitaker
Erin Sheldon wrote:
> Hi All -
>
> You can do this quite simply with fink if you have
> the patience to wait for the compilations to finish.
> This works on my ppc mac  with
> XCode and fink installed (12/31/2006):
>
>   fink install scipy-py24
>   sudo apt-get install gettext-dev=0.10.40-25 gettext=0.10.40-25
>   fink install matplotlib-py24
>
> For more details see this page I set up:
> http://howdy.physics.nyu.edu/index.php/Numpy_For_Mac_Using_Fink
>
> Erin
>   
Erin:  Nice tutorial.  I recommend one extra step though - right after 
installing fink,  add 'unstable/main' to the 'Trees:' line in 
/sw/etc/fink.conf, and run 'fink selfupdate'.  That way you will get the 
latest versions of all the packages. 

Also, if you want the python 2.5 versions, substitute 'py25' for 'py24'.

-Jeff



-- 
Jeffrey S. Whitaker Phone : (303)497-6313
NOAA/OAR/CDC  R/PSD1FAX   : (303)497-6449
325 BroadwayBoulder, CO, USA 80305-3328

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy install on mac os x 10.4

2006-12-31 Thread Jeff Whitaker
Christopher Barker wrote:
> Erin Sheldon wrote:
>   
>> You can do this quite simply with fink 
>> 
>
> I've generally stayed away form fink, as it felt like kind of a separate 
> system within OS-X, rather than integrated -- kind of like cygwin.
>
> In particular, if you use Fink Python, can you:
>
> 1) Write apps that use the native GUI (not X), in particular, PyObjC, 
> wx-Mac, and TK-aqua.
>
> 2) Bundle up apps with Py2App, or otherwise create self contained 
> application bundles?
>
> 3) Universal (PPC+Intel) anything.
>
> Apart from "feel", I think those are the concrete reasons to use 
> MacPython, rather than fink. Please correct me if I've got a wrong (or 
> outdated) impression.
>
> -Chris
>
>
>   

Chris:  The answer is No for all three.  But for some scientists like 
me, who are used to working on linux/unix workstations, fink works 
well.  I like being able to just run 'fink update scipy-py25 
matplotlib-py25' to get the latest versions of everything.  Also, being 
able to run stuff remotely via an ssh X11 tunnel to my office mac, and 
have the windows display back to my home mac, is a useful feature.

It all comes down to what you feel comfortable with.  Choice is good.

-Jeff


-- 
Jeffrey S. Whitaker Phone : (303)497-6313
NOAA/OAR/CDC  R/PSD1FAX   : (303)497-6449
325 BroadwayBoulder, CO, USA 80305-3328

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] custom accumlators

2007-01-05 Thread Jeff Whitaker
Matt Knox wrote:
> I made a post about this a while ago on the scipy-user mailing list, but I 
> didn't receive much of a response so I'm just throwing it out there again 
> (with more detail) in case it got overlooked.
>
> Basically, I'd like to be able to do accumulate operations with custom 
> functions. numpy.vectorize does not seem to provide an accumulate method with 
> the functions it returns. I'm hoping I don't have to write ufuncs in C to 
> accomplish this, but I fear that may be the case. Either way, it would be 
> nice to know whether it can or cannot be done in an easy manner.
>
> I have lots of examples of where this kind of thing is useful, but I'll just 
> outline two for now.
>
> Assume the parameter x in all the functions below is a 1-d array
>
> -
> Example 1 - exponential moving average:
>
> # naive brute force method...
> def expmave(x, k):
> result = numpy.array(x, copy=True)
> for i in range(1, result.size):
>result[i] = result[i-1] + k * (result[i] - result[i-1])
> return result
>
> # slicker method (if it worked, which it doesn't)...
> def expmave(x, k):
> def expmave_sub(a, b):
> return a + k * (b - a)
> return numpy.vectorize(expmave_sub).accumulate(x)
> 
> -
> Example 2 - forward fill a masked array:
>
> # naive brute force method...
> def forward_fill(x):
> result = ma.array(x, copy=True)
> for i in range(1, result.size):
>if result[i] is ma.masked: result[i] = result[i-1]
> return result
>
> # slicker method (if it worked, which it doesn't)...
> def forward_fill(x):
> def forward_fill_sub(a, b):
>   if b is ma.masked: return a
>   else: return b
> return numpy.vectorize(forward_fill_sub).accumulate(x)
> -
>
> Is there a good way to do these kinds of things without python looping? Or is 
> that going to require writing a ufunc in C? Any help is greatly appreciated.
>
> Thanks,
>
> - Matt Knox
>   

Matt:  Here's a quick and dirty example of how to do this sort of thing 
in pyrex.  I do it all the time, and it works quite well.

# accumulator.pyx:
_doublesize = sizeof(double)
cdef extern from "Python.h":
int PyObject_AsWriteBuffer(object, void **rbuf, Py_ssize_t *len)
char *PyString_AsString(object)
def accumulator(object x, double k):
cdef Py_ssize_t buflen
cdef int ndim, i
cdef double *xdata
cdef void *xbuff
# make a copy by casting to an array of doubles.
x = x.astype('f8')
# if buffer api is supported, get pointer to data buffers.
if PyObject_AsWriteBuffer(x, &xbuff, &buflen) <> 0:
raise RuntimeError('object does not support buffer API')
ndim = buflen/_doublesize
xdata = xbuff
for i from 1 <= i < ndim:
xdata[i] = xdata[i-1] + k * (xdata[i] - xdata[i-1])
return x

# test.py
from accumulator import accumulator
from numpy import linspace
x = linspace(1.,10.,10)
k = 0.1
print x
x1 = accumulator(x,k)
print x1
def expmave(x, k):
result = x.copy()
for i in range(1, result.size):
result[i] = result[i-1] + k * (result[i] - result[i-1])
return result
x2 = expmave(x,k)
print x2 # should be the same as x1

# setup.py
import os
from distutils.core import setup, Extension
from Pyrex.Distutils import build_ext
setup(name = "accumulator",
  cmdclass  = {'build_ext': build_ext},
  keywords  = ["python","map projections","GIS","mapping","maps"],
  ext_modules = [Extension("accumulator",["accumulator.pyx"])])


to build, just do

python setup.py build_ext --inplace

then run test.py.
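
For comparison, a rough pure-NumPy sketch of the accumulate idea from the
original question: np.frompyfunc turns a Python function into a real ufunc,
and binary ufuncs expose .accumulate. It operates on object arrays, so it
still loops in Python -- convenient, but nowhere near as fast as the pyrex
version above:

import numpy as np

def expmave_obj(x, k):
    # dtype=object may be required for accumulate on a frompyfunc ufunc,
    # depending on the numpy version
    f = np.frompyfunc(lambda a, b: a + k * (b - a), 2, 1)
    return f.accumulate(x, dtype=object).astype(float)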

HTH,

-Jeff

-- 
Jeffrey S. Whitaker Phone : (303)497-6313
NOAA/OAR/CDC  R/PSD1FAX   : (303)497-6449
325 BroadwayBoulder, CO, USA 80305-3328

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Request for porting pycdf to NumPy

2007-02-09 Thread Jeff Whitaker
Eric Firing wrote:
> I have been using Jeff Whitaker's netcdf4 interface with good results.
>
> I could not find the web page for it on a NOAA site--I think NOAA is 
> reorganizing--but a search turned it up here.  Maybe Jeff can provide 
> a better link.
>
> http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4-module.html
>
> Eric
>
Eric:

Yep, that's a link to the docs from the google code homepage

http://code.google.com/p/netcdf4-python/

AFAIK, this is the only one of the python interfaces that supports both 
the netcdf version 3 API and the new (still in alpha) version 4 API, 
which is built on top of HDF5.

As far as getting Unidata to support or bless an 'official' Python 
interface, that's not going to happen.  They barely have enough staff to 
support the C, Fortran and Java interfaces.

-Jeff

-- 
Jeffrey S. Whitaker Phone  : (303)497-6313
Meteorologist   FAX: (303)497-6449
NOAA/OAR/PSD  R/PSD1Email  : [EMAIL PROTECTED]
325 BroadwayOffice : Skaggs Research Cntr 1D-124
Boulder, CO, USA 80303-3328 Web: http://tinyurl.com/5telg

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] network outage

2007-03-13 Thread Jeff Strunk
Good evening,

Earlier this evening we had a network outage due to a network equipment 
malfunction. This outage prevented access to the Enthought and SciPy servers 
from about 8:45-10pm CDT. I have fixed the problem and everything should be 
back to normal.

I apologize for the inconvenience.
Jeff Strunk
IT Administrator
Enthought, Inc.

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] New Trac feature: TracReSTMacro

2007-03-19 Thread Jeff Strunk
Good afternoon,

By request, I have installed the TracReSTMacro on the numpy, scipy, and 
scikits tracs. This plugin allows you to display ReST formatted text directly 
from svn.

For example, http://projects.scipy.org/neuroimaging/ni/wiki/ReadMe in its 
entirety is:
[[ReST(/ni/trunk/README)]]

Thank you,
Jeff Strunk
IT Administrator
Enthought, Inc.

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New Trac feature: TracReSTMacro

2007-03-20 Thread Jeff Strunk
On Tuesday 20 March 2007 11:54 am, David M. Cooke wrote:
> On Mon, Mar 19, 2007 at 12:54:51PM -0500, Jeff Strunk wrote:
> > Good afternoon,
> >
> > By request, I have installed the TracReSTMacro on the numpy, scipy, and
> > scikits tracs. This plugin allows you to display ReST formatted text
> > directly from svn.
> >
> > For example, http://projects.scipy.org/neuroimaging/ni/wiki/ReadMe in its
> > entirety is:
> > [[ReST(/ni/trunk/README)]]
>
> Hmm, I'm getting an Internal Server Error on
> http://projects.scipy.org/scipy/numpy/wiki/NumPyCAPI
> which has the content
> [[ReST(/numpy/trunk/doc/CAPI.txt)]]

The content was:
[[ReST(/numpy/trunk/numpy/doc/CAPI.txt)]]

It should have been:
[[ReST(/trunk/numpy/doc/CAPI.txt)]]

I fixed it in the database, and the page works.

Thanks,
Jeff

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] subversion site down

2007-03-25 Thread Jeff Strunk
Thank you for letting me know. I restarted the server at 5:30pm central.

-Jeff

On Sunday 25 March 2007 1:42 pm, Christopher Hanley wrote:
> Hi,
>
> It appears that the subversion server is down for numpy.
>
> Chris
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] interrupted svn updates

2007-05-11 Thread Jeff Strunk
Is this still causing trouble? I restarted apache about 20 minutes after you 
sent this.

Thanks,
Jeff

On Friday 11 May 2007 9:06 am, Christopher Hanley wrote:
> I had that problem this morning as well.  It appears to be a problem on
> the server side.
>
> Chris
>
> George Nurser wrote:
> > I'm trying to update numpy from svn.
> > My first try was very slow, but eventially produced 72 updated files;
> > gave message at end:
> > svn: REPORT request failed on '/svn/numpy/!svn/vcc/default'
> > svn: REPORT of '/svn/numpy/!svn/vcc/default': Could not read response
> > body: connection was closed by server. (http://svn.scipy.org)
> >
> > 2nd try produced 4 more & then died with above error
> >
> > 3rd & 4th just dies with above error.
> >
> > Any ideas -- is thw problem here or there?
> >
> > Many thanks, George Nurser.
> > ___
> > Numpy-discussion mailing list
> > Numpy-discussion@scipy.org
> > http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] interrupted svn updates

2007-05-11 Thread Jeff Strunk
On Friday 11 May 2007 3:46 pm, George Nurser wrote:
> Jeff,
>
> Sorry to bother you again on this, but it's certainly still giving the
> same problem.
> svn: REPORT request failed on '/svn/numpy/!svn/vcc/default'
> svn: REPORT of '/svn/numpy/!svn/vcc/default': Could not read response
> body: connection was closed by server. (http://svn.scipy.org)
>
> I tried a fresh checkout in a new directory, so the problems can't be here.
>
> Regards, George Nurser.

It appears to be a runaway moin process. The load average was pretty high. 
I'll keep a closer eye on it to see what might be happening.

I restarted apache to fix it temporarily.

Thanks,
Jeff

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Editing the SciPy wiki pages

2007-06-10 Thread Jeff Strunk
On Saturday 09 June 2007 3:01:02 pm rex wrote:
> While compiling Numpy using MKL9.1 is fresh in my mind, I'd like to
> update some things in the /Installing_SciPy/Linux page. I've registered
> as a user, but still am not allowed to edit the page. What's required?
>
> (I run a couple of Mediawiki sites, so it shouldn't take me long to do
> simple edits in MoinMoin.)
>
> -rex
>

Someone needs to add your wiki account to the EditorsGroup. What is your wiki 
username?

Thanks,
Jeff
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Editing the SciPy wiki pages

2007-06-10 Thread Jeff Strunk
On Sunday 10 June 2007 11:10:37 am Jeff Strunk wrote:
> On Saturday 09 June 2007 3:01:02 pm rex wrote:
> > While compiling Numpy using MKL9.1 is fresh in my mind, I'd like to
> > update some things in the /Installing_SciPy/Linux page. I've registered
> > as a user, but still am not allowed to edit the page. What's required?
> >
> > (I run a couple of Mediawiki sites, so it shouldn't take me long to do
> > simple edits in MoinMoin.)
> >
> > -rex
>
> Someone needs to add your wiki account to the EditorsGroup. What is your
> wiki username?
>
> Thanks,
> Jeff

That was incorrect. I checked the config file.

I just tested it by registering a new user, logging in, and 
editing  /Installing_SciPy/Linux .

A couple of possibilities for why it did not work for you are:
* You don't have cookies enabled.
* Registering doesn't automatically log you in.

Please let me know if either of these is the case or it continues to not let 
you edit.

Thanks,
Jeff
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] C++ Example

2012-03-04 Thread Jeff Whitaker
On 3/4/12 3:18 PM, Luis Pedro Coelho wrote:
> On Saturday, March 03, 2012 04:38:53 PM David Cournapeau wrote:
>> I don't think the code is comparable either - some of the stuff done
>> in the C code is done in the C++ code your are calling. The C code
>> could be significantly improved.
> Actually, that's not 100% accurate. The C code calls the same functions. Most
> of the extra cruft is that it needs to do all of this error checking and type-
> dispatch, while in C++ you can have RAII and templates.
>
>> Even more important here: almost none
>> of this code should be written anymore anyway, C++ or not. This is
>> really the kind of code that should be done in cython, as it is mostly
>> about wrapping C code into the python C API.
> At least last time I read up on it, cython was not able to do multi-type code,
> i.e., have code that works on arrays of multiple types. Does it support it
> now?
>
> Best,
Coming soon in version 0.16:

https://sage.math.washington.edu:8091/hudson/job/cython-docs/doclinks/1/src/userguide/fusedtypes.html
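
A minimal sketch of what that looks like (assuming Cython >= 0.16, which
adds fused types and typed memoryviews):

# fused_demo.pyx
ctypedef fused real:
    float
    double

def scale_inplace(real[:] x, real k):
    # one C specialization is generated per type listed in 'real'
    cdef Py_ssize_t i
    for i in range(x.shape[0]):
        x[i] = x[i] * k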

-Jeff
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] why Segmentation fault (core dumped)?

2012-05-26 Thread Jeff Whitaker

On 5/26/12 5:51 AM, Chao YUE wrote:

Dear all,

Previously I was able to run a script on our server, but now it gives me 
a Segmentation fault (core dumped) error.
I tried the script with the same type of netcdf file but a much smaller 
file size, and it works. So I think the error is related to memory.
I guess our system administrator made some change somewhere and that 
caused my problem?
The file size that causes the error is 2.6G (in the script I read this 
file with NetCDF4 into a numpy array and do some manipulation);

the small one without the error is only 48M.


Chao:  Without seeing your script, there's not much I can say.  I 
suggest opening an issue at code.google.com/p/netcdf4-python, including your 
script as an attachment.  You'll probably have to post the data file 
somewhere (dropbox perhaps?) so I can run the script that triggers the 
segfault.


-Jeff
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Reading a big netcdf file

2011-08-04 Thread Jeff Whitaker

On 8/4/11 4:46 AM, Kiko wrote:

Hi, all.

Thank you very much for your replies.

I am running into some issues when I use the netcdf4-python or 
scipy.io.netcdf libraries:


In [4]: import netCDF4 as n4
In [5]: from scipy.io import netcdf as nS
In [6]: import numpy as np
In [7]: gebco4 = n4.Dataset('GridOne.grd', 'r')
In [8]: gebcoS = nS.netcdf_file('GridOne.grd', 'r')

Now, if I do:

In [9]: z4 = gebco4.variables['z']

I got no problems and I have:

In [14]: type(z4); z4.shape; z4.size
Out[14]: <type 'netCDF4.Variable'>
Out[14]: (233312401,)
Out[14]: 233312401

But if I do:

In [15]: z4 = gebco4.variables['z'][:]

Traceback (most recent call last):
  File "", line 1, in 
  File "netCDF4.pyx", line 2466, in netCDF4.Variable.__getitem__ 
(netCDF4.c:22943)
  File "C:\Python26\lib\site-packages\netCDF4_utils.py", line 278, in 
_StartCountStride

n = len(range(beg,end,inc))
MemoryError

I got a memory error. 



Kiko:  I think the difference may be that when you read the data with 
netcdf4-python, it tries to unpack the short integers to a float32 
array, thereby using much more memory (more than you have available).  
scipy.io.netcdf is just returning you a numpy array of short integers.  
I bet if you do


gebco4.set_automaskandscale(False)

before reading the data from the gebco4 variable, it will work, since 
this turns off the auto conversion to float32.


You'll have to do the conversion manually then, at which point you may 
run out of memory anyway.
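
For reference, a minimal sketch of the manual route (names are
illustrative; assumes set_automaskandscale(False) has been called as
above, so slicing returns the raw short integers):

import numpy as np
z = gebco4.variables['z']
scale = getattr(z, 'scale_factor', 1.0)
offset = getattr(z, 'add_offset', 0.0)
out = np.empty(z.shape, dtype='f4')   # ~0.9 GB for 233e6 float32 values
step = 10000000
for i in range(0, z.shape[0], step):
    # read and convert one chunk at a time to bound peak memory
    out[i:i+step] = z[i:i+step].astype('f4') * scale + offset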



But if I select a smaller array I get:

In [16]: z4 = gebco4.variables['z'][:1000]
In [17]: type(z4); z4.shape; z4.size
Out[17]: <type 'numpy.ndarray'>
Out[17]: (1000,)
Out[17]: 1000

What's the difference between z4 as a netCDF4.Variable and as a 
numpy.ndarray?


the netcdf variable object just refers to the data in the file - only 
when you slice the object is the data read in and converted to a numpy 
array.


-Jeff


Now, if I use scipy.io.netcdf:

In [18]: zS = gebcoS.variables['z']
In [20]: type(zS); zS.shape
Out[20]: <class 'scipy.io.netcdf.netcdf_variable'>
Out[20]: (233312401,)

In [21]: zS = gebcoS.variables['z'][:]
In [22]: type(zS); zS.shape
Out[22]: <type 'numpy.ndarray'>
Out[22]: (233312401,)

What's the difference between zS as a scipy.io.netcdf.netcdf_variable 
and as a numpy.ndarray?

Why do I not get a MemoryError with scipy.io.netcdf?

Finally, if I do the following (maybe it's a silly thing to do), using 
Eric's suggestion to clear the cache:


In [32]: zS = gebcoS.variables['z']
In [38]: timeit -n1 -r1 zSS = np.array(zS[:100000000]) # 100.000.000 
out of 233.312.401 because I've got a MemoryError

1 loops, best of 1: 73.1 s per loop

(If I use a copy, timeit -n1 -r1 zSS = np.array(zS[:50000000], 
copy=True), I get a MemoryError and I have to set the size to 
50.000.000 but it's quite fast).


Than you very much for your replies and excuse me if some questions 
are very basic.


Best regards.

***
The results of ncdump -h
netcdf GridOne {
dimensions:
side = 2 ;
xysize = 233312401 ;
variables:
double x_range(side) ;
x_range:units = "user_x_unit" ;
double y_range(side) ;
y_range:units = "user_y_unit" ;
short z_range(side) ;
z_range:units = "user_z_unit" ;
double spacing(side) ;
short dimension(side) ;
short z(xysize) ;
z:scale_factor = 1. ;
z:add_offset = 0. ;
z:node_offset = 0 ;

// global attributes:
:title = "GEBCO One Minute Grid" ;
:source = "1.02" ;
}

The file is publicly available from: 
http://www.gebco.net/data_and_products/gridded_bathymetry_data/




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Copy netcdf attributes between different files

2011-11-14 Thread Jeff Whitaker
On 11/14/11 10:04 AM, Giovanni Plantageneto wrote:
> Hi everybody,
> I am using netCDF4 library to read and write from netcdf files. I
> would like to copy all the attributes of one file to another one, in a
> way like this:
>
> ---
>
> from netCDF4 import Dataset as ncdf
>
> file1 = ncdf('file1.nc', mode='r', format='NETCDF4_CLASSIC')
> ...
> file2 = ncdf('file2.nc', mode='w', format='NETCDF4_CLASSIC')
> for att in file1.ncattrs():
>     file2.att = file1.getncattr(att)
> ...
> file1.close()
> file2.close()
>
> ---
>
> But this will not work as only one attribute named "att" in file2 will
> be created. How should I do this?
> Thanks.
>
Try this:

for att in file1.ncattrs():
    setattr(file2, att, getattr(file1, att))
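
Equivalently, a sketch using the explicit accessors (assuming a
netCDF4-python version that provides getncattr/setncattr):

for att in file1.ncattrs():
    file2.setncattr(att, file1.getncattr(att))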

-Jeff


-- 
Jeffrey S. Whitaker Phone  : (303)497-6313
Meteorologist   FAX: (303)497-6449
NOAA/OAR/PSD  R/PSD1Email  : jeffrey.s.whita...@noaa.gov
325 BroadwayOffice : Skaggs Research Cntr 1D-113
Boulder, CO, USA 80303-3328 Web: http://tinyurl.com/5telg

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [pydata] ANN: pandas 0.13.1 released

2014-02-09 Thread Jeff Reback
> Hello,
> 
> This is a minor release from 0.13.0 and includes a small number of API 
> changes, several new features, enhancements, and 
> performance improvements along with a large number of bug fixes. 
> 
> We recommend that all users upgrade to this version.
> 
> Highlights include:
> 
> - Added infer_datetime_format keyword to read_csv/to_datetime to allow 
> speedups for homogeneously formatted datetimes.
> - Will intelligently limit display precision for datetime/timedelta formats.
> - Enhanced Panel apply() method.
> - Suggested tutorials in new Tutorials section.
> - Our pandas ecosystem is growing, We now feature related projects in a new 
> Pandas Ecosystem section.
> - Much work has been taking place on improving the docs, and a new 
> Contributing section has been added.
> 
> v0.13.1 Whatsnew Page
> http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#v0-13-1-february-3-2014
> 
> v0.13.1 Documentation Page
> http://pandas.pydata.org/pandas-docs/stable/
> 
> Please visit here for the source tarball:
> https://github.com/pydata/pandas/releases/tag/v0.13.1
> 
> Windows binaries are available from Christoph Gohlke's collection:
> http://www.lfd.uci.edu/~gohlke/pythonlibs/#pandas
> 
> tarballs and windows binaries are available on PyPi:
> https://pypi.python.org/pypi/pandas
> 
> We are looking forward to a next planned release of v0.14.0 in about three 
> months time.
> 
> Some things that we would like to include:
> 
> - A big upgrade to SQL to/from interop with support for all major DBs, 
> leveraging SQLAlchemy.
> - Template-based displays for dataframes, with conditional formatting and 
> roll-your-own output generation.
> - Reduced memory dataframe construction from known-length iterators.
> - Your PRs.
> 
> Thanks 
> 
> The Pandas Team
> 
> 
> Contributors to the 0.13.1 release
> 
> $ git log v0.12.1..v0.13.1 --pretty='%aN##%s' | grep -v 'Merge pull' | grep 
> -Po '^[^#]+' | sort | uniq -c | sort -rn 
> 
>   146 y-p
> 97 jreback
> 14 Joris Van den Bossche
>  8 Phillip Cloud
>  8 Andy Hayden
>  6 unutbu
>  4 Skipper Seabold
>  3 TomAugspurger
>  3 Jeff Tratner
>  3 DSM
>  3 Douglas McNeil
>  3 Dan Birken
>  3 Chapman Siu
>  2 Tom Augspurger
>  2 Naveen Michaud-Agrawal
>  2 Michael Schatzow
>  2 Kieran O'Mahony
>  2 Jacob Schaer
>  2 Doran Deluz
>  2 danielballan
>  2 Clark Fitzgerald
>  2 chapman siu
>  2 Caleb Epstein
>  2 Brad Buran
>  2 Andrew Burrows
>  2 Alex Rothberg
>  1 Spencer Lyon
>  1 Roman Pekar
>  1 Patrick O'Keeffe
>  1 mwaskom
>  1 lexual
>  1 Julia Evans
>  1 John McNamara
>  1 Jan Wagner
>  1 immerrr
>  1 Guillaume Gay
>  1 George Kuan
>  1 Felix Lawrence
>  1 Elliot S
>  1 Dražen Lužanin
>  1 Douglas Rudd
>  1 David Wolever
>  1 davidshinn
>  1 david
>  1 Daniel Waeber
>  1 Chase Albert
>  1 bwignall
>  1 bmu
>  1 Bjorn Arneson
>  1 Alok Singhal
>  1 akittredge
>  1 acorbe
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "PyData" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to pydata+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 1.8.1 release

2014-02-24 Thread Jeff Reback
I am pretty sure that you guys test pandas master

but 1.8.1 looks good to me

> On Feb 24, 2014, at 4:42 PM, Charles R Harris  
> wrote:
> 
> 
> 
> 
>> On Mon, Feb 24, 2014 at 1:54 PM, RayS  wrote:
>> Has anyone alerted C Gohlke?
>> http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
>> 
>> - Ray
> 
> Christoph seems to keep a pretty good eye on numpy, and we rely on him to 
> test it on Windows. In any case, there are enough fixes backported that I 
> think we'd better start with a 1.8.1rc.
> 
> Chuck 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dates and times and Datetime64 (again)

2014-03-19 Thread Jeff Reback
Dave,

your example is not a problem with numpy per se, rather that the default
generation is in local timezone (same as what python datetime does).
If you localize to UTC you get the results that you expect.

In [49]: dates = pd.date_range('01-Apr-2014', '04-Apr-2014', freq='H')[:-1]

In [50]: pd.TimeSeries(values, dates.tz_localize('UTC')).groupby(lambda d:
d.date()).mean()
Out[50]:
2014-04-01    1
2014-04-02    2
2014-04-03    3
dtype: int64

In [51]: records = zip(map(str, dates.tz_localize('UTC')), values)

In [52]: df = pd.DataFrame(np.array(records, dtype=[('dates',
'M8[h]'),('values', float)]))

In [53]: df.set_index('dates').groupby(lambda x: x.date()).mean()
Out[53]:
            values
2014-04-01   1
2014-04-02   2
2014-04-03   3

[3 rows x 1 columns]



On Wed, Mar 19, 2014 at 5:21 AM, Dave Hirschfeld  wrote:

> Sankarshan Mudkavi writes:
>
> >
> > Hey all,
> > It's been a while since the last datetime and timezones discussion thread
> was visited (linked below):
> >
> > http://thread.gmane.org/gmane.comp.python.numeric.general/53805
> >
> > It looks like the best approach to follow is the UTC only approach in the
> linked thread with an optional flag to indicate the timezone (to avoid
> confusing applications where they don't expect any timezone info). Since
> this is slightly more useful than having just a naive datetime64 package
> and
> would be open to extension if required, it's probably the best way to start
> improving the datetime64 library.
> >
> 
> > I would like to start writing a NEP for this followed by implementation,
> however I'm not sure what the format etc. is, could someone direct me to a
> page where this information is provided?
> >
> > Please let me know if there are any ideas, comments etc.
> >
> > Cheers,
> > Sankarshan
> >
>
> See: http://article.gmane.org/gmane.comp.python.numeric.general/55191
>
>
> You could use a current NEP as a template:
> https://github.com/numpy/numpy/tree/master/doc/neps
>
>
> I'm a huge +100 on the simplest UTC fix.
>
> As is, using numpy datetimes is likely to silently give incorrect results -
> something I've already seen several times in end-user data analysis code.
>
> Concrete Example:
>
> In [16]: dates = pd.date_range('01-Apr-2014', '04-Apr-2014', freq='H')[:-1]
> ...: values = np.array([1,2,3]).repeat(24)
> ...: records = zip(map(str, dates), values)
> ...: pd.TimeSeries(values, dates).groupby(lambda d: d.date()).mean()
> ...:
> Out[16]:
> 2014-04-01    1
> 2014-04-02    2
> 2014-04-03    3
> dtype: int32
>
> In [17]: df = pd.DataFrame(np.array(records, dtype=[('dates', 'M8[h]'),
> ('values', float)]))
> ...: df.set_index('dates', inplace=True)
> ...: df.groupby(lambda d: d.date()).mean()
> ...:
> Out[17]:
>               values
> 2014-03-31  1.000000
> 2014-04-01  1.041667
> 2014-04-02  2.041667
> 2014-04-03  3.000000
>
> [4 rows x 1 columns]
>
> Try it in your timezone and see what you get!
>
> -Dave
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dates and times and Datetime64 (again)

2014-03-28 Thread Jeff Reback
FYI

Here are the docs for pandas' timezone handling.

wesm worked through the various issues w.r.t. conversion, localization, and
ambiguous zone crossings.

http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-zone-handling

implementation is largely in here:

 (the underlying implementation is a datetime64[ns] dtype with a pytz timezone attached)

https://github.com/pydata/pandas/blob/master/pandas/tseries/index.py
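
A minimal sketch of that API (the zone names are just examples):

import pandas as pd
# build a naive DatetimeIndex, pin it to UTC, then view it in another zone
idx = pd.date_range('2014-03-28 09:00', periods=3, freq='H')
utc = idx.tz_localize('UTC')
est = utc.tz_convert('US/Eastern')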



On Fri, Mar 28, 2014 at 4:30 PM, Sankarshan Mudkavi
wrote:

>
> Hi Nathaniel,
>
> 1- You give as an example of "naive" datetime handling:
>
> >>> np.datetime64('2005-02-25T03:00Z')
> np.datetime64('2005-02-25T03:00')
>
> This IIUC is incorrect. The Z modifier is a timezone offset, and for
> normal "naive" datetimes would cause an error.
>
>
> From what I understand from reading:
> http://thread.gmane.org/gmane.comp.python.numeric.general/53805
>
> It looks like anything other than Z, 00:00 or UTC that has a TZ adjustment
> would raise an error, and those specific conditions would not (I'm guessing
> this is because we assume it's UTC (or the same timezone) internally,
> anything that explicitly tells us it is UTC is acceptable, although that
> may be just my misreading of it.)
>
> However on output we don't use the Z modifier (which is why it's different
> from the UTC datetime64).
>
> I will change it to return an error if what I thought is incorrect and
> also include examples of conversion from datetimes as you requested.
>
> Please let me know if there are any more changes that are required! I look
> forward to further comments/questions.
>
> Cheers,
> Sankarshan
>
> On Fri, Mar 28, 2014 at 5:17 AM, Nathaniel Smith  wrote:
>
> On 28 Mar 2014 05:00, "Sankarshan Mudkavi"  wrote:
> >
> > Hi all,
> >
> > Apologies for the delay in following up, here is an expanded version of
> the proposal, which hopefully clears up most of the details. I have not
> included specific implementation details for the code, such as which
> functions to modify etc. since I think those are not traditionally included
> in NEPs?
>
> The format seems fine to me. Really the point is just to have a document
> that we can use as reference when deciding on behaviour, and this does that
> :-).
>
> Three quick comments:
>
> 1- You give as an example of "naive" datetime handling:
>
> >>> np.datetime64('2005-02-25T03:00Z')
> np.datetime64('2005-02-25T03:00')
>
> This IIUC is incorrect. The Z modifier is a timezone offset, and for
> normal "naive" datetimes would cause an error.
>
> 2- It would be good to include explicitly examples of conversion to and
> from datetimes alongside the examples of conversions to and from strings.
>
> 3- It would be good to (eventually) include some discussion of the impact
> of the preferred proposal on existing code. E.g., will this break a lot of
> people's pipelines? (Are people currently *always* adding timezones to
> their numpy input to avoid the problem, and now will have to switch to the
> opposite behaviour depending on numpy version?) And we'll want to make sure
> to get feedback from the pydata@ (pandas) list explicitly, though that
> can wait until people here have had a chance to respond to the first draft.
>
> Thanks for pushing this forward!
> -n
>
> Hi all,
>
> Apologies for the delay in following up, here is an expanded version of
> the proposal, which hopefully clears up most of the details. I have not
> included specific implementation details for the code, such as which
> functions to modify etc. since I think those are not traditionally included
> in NEPs?
>
> Please find attached the expanded proposal, and the rendered version is
> available here:
>
> https://github.com/Sankarshan-Mudkavi/numpy/blob/Enhance-datetime64/doc/neps/datetime-improvement-proposal.rst
>
> 
>
> I look forward to comments, agreements/disagreements with this (and
> clarification if this needs even further expansion).
>
>
> On Mar 24, 2014, at 12:39 AM, Chris Barker  wrote:
>
> On Fri, Mar 21, 2014 at 3:43 PM, Nathaniel Smith  wrote:
>
>> On Thu, Mar 20, 2014 at 11:27 PM, Chris Barker 
>> wrote:
>> > * I think there are more or less three options:
>> >1)  a) don't have any timezone handling at all -- all datetime64s
>> are UTC. Always
>> >  b) don't have any timezone handling at all -- all datetime64s
>> are naive
>> >  (the only difference between these two is I/O of strings,
>> and maybe I/O of datetime objects with a time zone)
>> > 2) Have a time zone associated with the array -- defaulting to
>> either UTC or None, but don't provide any implementation other than the
>> tagging, with the ability to add in TZ handler if you want (can this be
>> done efficiently?)
>> > 3) Full on proper TZ handling.
>> >
>> > I think (3) is off the table for now.
>>
>> I think the first goal is to define what a plain vanilla datetime64
>> does, without any extra attributes. This is for two practical reasons:
>> First, our overriding #1 goal is to fix the nasty I/O problem

[Numpy-discussion] ANN: Pandas 0.14.0 Release Candidate 1

2014-05-17 Thread Jeff Reback
Hi,

I'm pleased to announce the availability of the first release candidate of
Pandas 0.14.0.
Please try this RC and report any issues here: Pandas
Issues<https://github.com/pydata/pandas/issues>
We will be releasing officially in about 2 weeks or so.

This is a major release from 0.13.1 and includes a small number of API
changes, several new features, enhancements, and
performance improvements along with a large number of bug fixes.

Highlights include:

   - Officially support Python 3.4
   - SQL interfaces updated to use sqlalchemy,
   - Display interface changes
   - MultiIndexing Using Slicers (see the sketch after this list)
   - Ability to join a singly-indexed DataFrame with a multi-indexed
   DataFrame
   - More consistency in groupby results and more flexible groupby
   specifications
   - Holiday calendars are now supported in CustomBusinessDay
   - Several improvements in plotting functions, including: hexbin, area
   and pie plots.
   - Performance doc section on I/O operations
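
A minimal sketch of the new MultiIndex slicer syntax (made-up data):

import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([['a', 'b'], [1, 2, 3]])
df = pd.DataFrame({'v': np.arange(6)}, index=idx)
# all outer-level labels, inner-level labels 1 and 3 only
df.loc[pd.IndexSlice[:, [1, 3]], :]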

Since there are some significant changes to the default way DataFrames are
displayed, I have put up an issue asking for feedback
here<https://github.com/pydata/pandas/issues/7146>

Here are the full whatsnew and documentation links:

v0.14.0 Whatsnew<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html>

v0.14.0 Documentation Page<http://pandas-docs.github.io/pandas-docs-travis/>

Source tarballs and Windows builds are available here:

Pandas v0.14rc1 Release <https://github.com/pydata/pandas/releases>

A big thank you to everyone who contributed to this release!

Jeff
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: Pandas 0.14.0 released

2014-05-30 Thread Jeff Reback
Hello,

We are proud to announce v0.14.0 of pandas, a major release from 0.13.1.

This release includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug
fixes.

This was 4 months of work with 1014 commits by 121 authors encompassing 757
issues.

We recommend that all users upgrade to this version.

*Highlights:*

   -   Officially support Python 3.4
   -   SQL interfaces updated to use sqlalchemy
   -   Display interface changes
   -   MultiIndexing Using Slicers
   -   Ability to join a singly-indexed DataFrame with a multi-indexed
   DataFrame
   -   More consistency in groupby results and more flexible groupby
   specifications
   -   Holiday calendars are now supported in CustomBusinessDay
   -   Several improvements in plotting functions, including: hexbin, area
   and pie plots
   -   Performance doc section on I/O operations

See a full description of Whatsnew for v0.14.0 here:
http://pandas.pydata.org/pandas-docs/stable/whatsnew.html


*What is it:*

*pandas* is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block
for
doing practical, real world data analysis in Python. Additionally, it has
the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.


Documentation:
http://pandas.pydata.org/pandas-docs/stable/

Source tarballs, windows binaries are available on PyPI:
https://pypi.python.org/pypi/pandas

windows binaries are courtesy of  Christoph Gohlke and are built on Numpy
1.8
macosx wheels will be available soon, courtesy of Matthew Brett

Please report any issues here:
https://github.com/pydata/pandas/issues


Thanks

The Pandas Development Team


Contributors to the 0.14.0 release

   - Acanthostega
   - Adam Marcus
   - agijsberts
   - akittredge
   - Alex Gaudio
   - Alex Rothberg
   - AllenDowney
   - Andrew Rosenfeld
   - Andy Hayden
   - ankostis
   - anomrake
   - Antoine Mazières
   - anton-d
   - bashtage
   - Benedikt Sauer
   - benjamin
   - Brad Buran
   - bwignall
   - cgohlke
   - chebee7i
   - Christopher Whelan
   - Clark Fitzgerald
   - clham
   - Dale Jung
   - Dan Allan
   - Dan Birken
   - danielballan
   - Daniel Waeber
   - David Jung
   - David Stephens
   - Douglas McNeil
   - DSM
   - Garrett Drapala
   - Gouthaman Balaraman
   - Guillaume Poulin
   - hshimizu77
   - hugo
   - immerrr
   - ischwabacher
   - Jacob Howard
   - Jacob Schaer
   - jaimefrio
   - Jason Sexauer
   - Jeff Reback
   - Jeffrey Starr
   - Jeff Tratner
   - John David Reaver
   - John McNamara
   - John W. O'Brien
   - Jonathan Chambers
   - Joris Van den Bossche
   - jreback
   - jsexauer
   - Julia Evans
   - Júlio
   - Katie Atkinson
   - kdiether
   - Kelsey Jordahl
   - Kevin Sheppard
   - K.-Michael Aye
   - Matthias Kuhn
   - Matt Wittmann
   - Max Grender-Jones
   - Michael E. Gruen
   - michaelws
   - mikebailey
   - Mike Kelly
   - Nipun Batra
   - Noah Spies
   - ojdo
   - onesandzeroes
   - Patrick O'Keeffe
   - phaebz
   - Phillip Cloud
   - Pietro Battiston
   - PKEuS
   - Randy Carnevale
   - ribonoous
   - Robert Gibboni
   - rockg
   - sinhrks
   - Skipper Seabold
   - SplashDance
   - Stephan Hoyer
   - Tim Cera
   - Tobias Brandt
   - Todd Jennings
   - TomAugspurger
   - Tom Augspurger
   - unutbu
   - westurner
   - Yaroslav Halchenko
   - y-p
   - zach powers
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: Pandas 0.14.0 released

2014-05-30 Thread Jeff Reback
the upgrade flag on pip is apparently recursive on all deps
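
A sketch of the usual workaround at the time (not from this thread):
upgrade the one package without touching its dependencies, then let a
plain install pull in anything genuinely missing:

pip install --user --upgrade --no-deps pandas
pip install --user pandas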


On May 30, 2014, at 6:16 PM, Neal Becker  wrote:
> 
> pip install --user --up pandas
> Downloading/unpacking pandas from 
> https://pypi.python.org/packages/source/p/pandas/pandas-0.14.0.tar.gz#md5=b775987c0ceebcc8d5ace4a1241c967a
> ...
> 
> Downloading/unpacking numpy>=1.6.1 from 
> https://pypi.python.org/packages/source/n/numpy/numpy-1.8.1.tar.gz#md5=be95babe263bfa3428363d6db5b64678
>  
> (from pandas)
>  Downloading numpy-1.8.1.tar.gz (3.8MB): 3.8MB downloaded
>  Running setup.py egg_info for package numpy
>Running from numpy source directory.
> 
>warning: no files found matching 'tools/py3tool.py'
>warning: no files found matching '*' under directory 'doc/f2py'
>warning: no previously-included files matching '*.pyc' found anywhere in 
> distribution
>warning: no previously-included files matching '*.pyo' found anywhere in 
> distribution
>warning: no previously-included files matching '*.pyd' found anywhere in 
> distribution
> Downloading/unpacking six from 
> https://pypi.python.org/packages/source/s/six/six-1.6.1.tar.gz#md5=07d606ac08595d795bf926cc9985674f
>  
> (from python-dateutil->pandas)
>  Downloading six-1.6.1.tar.gz
>  Running setup.py egg_info for package six
> 
>no previously-included directories found matching 'documentation/_build'
> Installing collected packages: pandas, pytz, numpy, six
> 
> 
> What?  I already have numpy-1.8.0 installed (also have six, pytz).
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: Pandas 0.14.0 released

2014-05-31 Thread Jeff Reback
Sure, would take a PR for that.

Anything to make setup easier!

> On May 31, 2014, at 1:50 AM, Ralf Gommers  wrote:
> 
> 
> 
> 
>> On Sat, May 31, 2014 at 12:30 AM, Jeff Reback  wrote:
>> the upgrade flag on pip is apparently recursive on all deps
> 
> Indeed. This is super annoying, and trips up a lot of users. As long as that 
> doesn't change in pip, you should be using something like 
> https://github.com/scipy/scipy/pull/3566 in pandas I think. I'd be happy to 
> send a PR for that if you want.
> 
> Ralf
> 
>  
>> 
>> 
>> On May 30, 2014, at 6:16 PM, Neal Becker  wrote:
>> >
>> > pip install --user --up pandas
>> > Downloading/unpacking pandas from
>> > https://pypi.python.org/packages/source/p/pandas/pandas-0.14.0.tar.gz#md5=b775987c0ceebcc8d5ace4a1241c967a
>> > ...
>> >
>> > Downloading/unpacking numpy>=1.6.1 from
>> > https://pypi.python.org/packages/source/n/numpy/numpy-1.8.1.tar.gz#md5=be95babe263bfa3428363d6db5b64678
>> > (from pandas)
>> >  Downloading numpy-1.8.1.tar.gz (3.8MB): 3.8MB downloaded
>> >  Running setup.py egg_info for package numpy
>> >Running from numpy source directory.
>> >
>> >warning: no files found matching 'tools/py3tool.py'
>> >warning: no files found matching '*' under directory 'doc/f2py'
>> >warning: no previously-included files matching '*.pyc' found anywhere in
>> > distribution
>> >warning: no previously-included files matching '*.pyo' found anywhere in
>> > distribution
>> >warning: no previously-included files matching '*.pyd' found anywhere in
>> > distribution
>> > Downloading/unpacking six from
>> > https://pypi.python.org/packages/source/s/six/six-1.6.1.tar.gz#md5=07d606ac08595d795bf926cc9985674f
>> > (from python-dateutil->pandas)
>> >  Downloading six-1.6.1.tar.gz
>> >  Running setup.py egg_info for package six
>> >
>> >no previously-included directories found matching 'documentation/_build'
>> > Installing collected packages: pandas, pytz, numpy, six
>> > 
>> >
>> > What?  I already have numpy-1.8.0 installed (also have six, pytz).
>> >
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.9.0 beta release

2014-06-09 Thread Jeff Reback
The one pandas test failure that is valid: ERROR:
test_interp_regression (pandas.tests.test_generic.TestSeries)

has been fixed in pandas master / 0.14.1 (probably releasing in 1 month).

(the other test failures are for clipboard / network issues)




On Mon, Jun 9, 2014 at 7:21 PM, Christoph Gohlke  wrote:

> On 6/8/2014 1:34 PM, Julian Taylor wrote:
>
>> Hello,
>>
>> I'm happy to announce the first beta release of Numpy 1.9.0.
>> 1.9.0 will be a new feature release supporting Python 2.6 - 2.7 and 3.2
>> - 3.4.
>> Due to low demand windows binaries for the beta are only available for
>> Python 2.7, 3.3 and 3.4.
>> Please try it and report any issues to the numpy-discussion mailing list
>> or on github.
>>
>> The 1.9 release will consist mainly of many small improvements and
>> bugfixes. The highlights are:
>>
>> * Addition of __numpy_ufunc__ to allow overriding ufuncs in ndarray
>> subclasses. Please note that there are still some known issues with this
>> mechanism which we hope to resolve before the final release (e.g. #4753)
>> * Numerous performance improvements in various areas, most notably
>> indexing and operations on small arrays are significantly faster.
>> Indexing operations now also release the GIL.
>> * Addition of nanmedian and nanpercentile rounds out the nanfunction set.
>>
>> The changes involve a lot of small changes that might affect some
>> applications, please read the release notes for the full details on all
>> changes:
>> https://github.com/numpy/numpy/blob/maintenance/1.9.x/
>> doc/release/1.9.0-notes.rst
>> Please also take special note of the future changes section which will
>> apply to the following release 1.10.0 and make sure to check if your
>> applications would be affected by them.
>>
>> Source tarballs, windows installers and release notes can be found at
>> https://sourceforge.net/projects/numpy/files/NumPy/1.9.0b1
>>
>> Cheers,
>> Julian Taylor
>>
>>
> Hello,
>
> I tested numpy-MKL-1.9.0b1 (msvc9, Intel MKL build) on win-amd64-py2.7
> against a few other packages that were built against numpy-MKL-1.8.x.
>
> While numpy and scipy pass all tests, some other packages (matplotlib,
> statsmodels, skimage, pandas, pytables, sklearn...) show a few new test
> failures (compared to testing with numpy-MKL-1.8.1). Many test errors are
> of kind:
>
> ValueError: shape mismatch: value array of shape (24,) could not be
> broadcast to indexing result of shape (8,3)
>
> I have attached a list of failing tests. The full test results are at <
> http://www.lfd.uci.edu/~gohlke/pythonlibs/tests/20140609-win-amd64-py2.7-
> numpy-1.9.0b1/> (compare to <http://www.lfd.uci.edu/~gohlke/pythonlibs/tests/20140609-win-amd64-py2.7/>)
>
> I have not investigated any further...
>
> Christoph
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genfromtxt universal newline support

2014-06-30 Thread Jeff Reback
In pandas 0.14.0, generic whitespace IS parsed via the c-parser, e.g. 
specifying '\s+' as a separator. Not sure when you were playing last with 
pandas, but the c-parser has been in place since late 2012 (version 0.8.0).

http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#text-parsing-api-changes
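
For reference, a minimal sketch of the whitespace-separator usage described
above (the inline data and the StringIO stand-in for a real file are
assumptions for illustration):

import io

import pandas as pd

# hypothetical whitespace-delimited data standing in for a real file
data = io.StringIO(u"a   b  c\n1 2  3\n4  5 6\n")

df = pd.read_csv(data, sep=r'\s+')   # '\s+' is handled by the fast c-parser in 0.14.0
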
> On Jun 30, 2014, at 4:58 PM, Derek Homeier 
>  wrote:
> 
> On 30 Jun 2014, at 04:56 pm, Nathaniel Smith  wrote:
> 
>>> A real need, which had also been discussed at length, is a truly performant 
>>> text IO
>>> function (i.e. one using a compiled ASCII number parser, and optimally also 
>>> a more
>>> memory-efficient one), but unfortunately all people interested in 
>>> implementing this
>>> seem to have drifted away (not excluding myself from this)…
>> 
>> It's possible we could steal some code from Pandas for this. IIRC they
>> have C/Cython text parsing routines. (It's also an interesting
>> question whether they've fixed the unicode/binary issues, might be
>> worth checking before rewriting from scratch...)
> 
> Good point, last time I was playing with Pandas it was not any faster, but 
> now a 10x
> speedup speaks for itself. Their C engine does not support generic whitespace 
> separators,
> but that could probably be addressed in a numpy implementation.
> 
>Derek
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Questions about fixes for 1.9.0rc2

2014-07-04 Thread Jeff Reback
OK from pandas.
We test with numpy master on Travis (which does pick up things!)

thanks

> On Jul 4, 2014, at 7:07 PM, Charles R Harris  
> wrote:
> 
> 
> 
> 
>> On Fri, Jul 4, 2014 at 3:33 PM, Nathaniel Smith  wrote:
>> On Fri, Jul 4, 2014 at 10:31 PM, Charles R Harris
>>  wrote:
>> >
>> > On Fri, Jul 4, 2014 at 3:15 PM, Nathaniel Smith  wrote:
>> >>
>> >> On Fri, Jul 4, 2014 at 9:48 PM, Charles R Harris
>> >>  wrote:
>> >> >
>> >> > On Fri, Jul 4, 2014 at 2:41 PM, Nathaniel Smith  wrote:
>> >> >>
>> >> >> On Fri, Jul 4, 2014 at 9:33 PM, Charles R Harris
>> >> >>  wrote:
>> >> >> >
>> >> >> > On Fri, Jul 4, 2014 at 2:09 PM, Nathaniel Smith 
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> On Fri, Jul 4, 2014 at 9:02 PM, Ralf Gommers
>> >> >> >> 
>> >> >> >> wrote:
>> >> >> >> >
>> >> >> >> > On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris
>> >> >> >> >  wrote:
>> >> >> >> >>
>> >> >> >> >> On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris
>> >> >> >> >>  wrote:
>> >> >> >> >>>
>> >> >> >> >>> Sebastian Seberg has fixed one class of test failures due to the
>> >> >> >> >>> indexing
>> >> >> >> >>> changes in numpy 1.9.0b1.  There are some remaining errors, and
>> >> >> >> >>> in
>> >> >> >> >>> the
>> >> >> >> >>> case
>> >> >> >> >>> of the Matplotlib failures, they look to me to be Matplotlib
>> >> >> >> >>> bugs.
>> >> >> >> >>> The
>> >> >> >> >>> 2-d
>> >> >> >> >>> arrays that cause the error are returned by the overloaded
>> >> >> >> >>> _interpolate_single_key function in CubicTriInterpolator that is
>> >> >> >> >>> documented
>> >> >> >> >>> in the base class to return a 1-d array, whereas the actual
>> >> >> >> >>> dimensions
>> >> >> >> >>> are
>> >> >> >> >>> of the form (n, 1). The question is, what is the best work
>> >> >> >> >>> around
>> >> >> >> >>> here
>> >> >> >> >>> for
>> >> >> >> >>> these sorts errors? Can we afford to break Matplotlib and other
>> >> >> >> >>> packages on
>> >> >> >> >>> account of a bug that was previously accepted by Numpy?
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > It depends how bad the break is, but in principle I'd say that
>> >> >> >> > breaking
>> >> >> >> > Matplotlib is not OK.
>> >> >> >>
>> >> >> >> I agree. If it's easy to hack around it and issue a warning for now,
>> >> >> >> and doesn't have other negative consequences, then IMO we should
>> >> >> >> give
>> >> >> >> matplotlib a release or so worth of grace period to fix things.
>> >> >> >
>> >> >> >
>> >> >> > Here is another example, from skimage.
>> >> >> >
>> >> >> >
>> >> >> > ==
>> >> >> > ERROR: test_join.test_relabel_sequential_offset1
>> >> >> >
>> >> >> > --
>> >> >> > Traceback (most recent call last):
>> >> >> >   File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in
>> >> >> > runTest
>> >> >> > self.test(*self.arg)
>> >> >> >   File
>> >> >> >
>> >> >> >
>> >> >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\tests\test_join.py",
>> >> >> > line 30, in test_relabel_sequential_offset1
>> >> >> > ar_relab, fw, inv = relabel_sequential(ar)
>> >> >> >   File
>> >> >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\_join.py",
>> >> >> > line 127, in relabel_sequential
>> >> >> > forward_map[labels0] = np.arange(offset, offset + len(labels0) +
>> >> >> > 1)
>> >> >> > ValueError: shape mismatch: value array of shape (6,) could not be
>> >> >> > broadcast
>> >> >> > to indexing result of shape (5,)
>> >> >> >
>> >> >> > Which is pretty clearly a coding error. Unfortunately, the error is
>> >> >> > in
>> >> >> > the
>> >> >> > package rather than the test.
>> >> >> >
>> >> >> > The only easy way to fix all of these sorts of things is to revert
>> >> >> > the
>> >> >> > indexing changes, and I'm loathe to do that. Grrr...
>> >> >>
>> >> >> Ugh, that's pretty bad :-/. Do you really think we can't use a
>> >> >> band-aid over the new indexing code, though?
>> >> >
>> >> >
>> >> > Yeah, we can. But Sebastian doesn't have time and I'm unfamiliar with
>> >> > the
>> >> > code, so it may take a while...
>> >>
>> >> Fair enough!
>> >>
>> >> I guess that if what are (arguably) bugs in matplotlib and
>> >> scikit-image are holding up the numpy release, then it's worth CC'ing
>> >> their mailing lists in case someone feels like volunteering to fix
>> >> it... ;-).
>> >
>> > I can do that ;) Doesn't help with the release though unless we want to
>> > document the errors in the release notes and tell folks to wait on the next
>> > release of the packages.
>> 
>> Oh, I meant, in case they want to fix numpy so that their packages
>> don't break :-).
> 
> I've filed issues with all the affected projects. Here is the current status.
> 
> matplotlib -- Reported, being fixed, should be in 1.4 in a few days.
> skimage -- Reported.
> scikit-learn -- Reported.
> tables -- Reported.
> statsmodels -- Reported, fixed in master.
> bottleneck --

Re: [Numpy-discussion] Questions about fixes for 1.9.0rc2

2014-07-04 Thread Jeff Reback
pandas 0.14.1 is scheduled for the end of next week (we were waiting to see the
schedule for numpy 1.9), but it works either way.

> On Jul 4, 2014, at 7:41 PM, Nathaniel Smith  wrote:
> 

[Numpy-discussion] ANN: pandas 0.14.1 released

2014-07-12 Thread Jeff Reback
Hello,

We are proud to announce v0.14.1 of pandas, a minor release from 0.14.0.

This release includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug
fixes.

This was 1.5 months of work with 244 commits by 45 authors encompassing 306
issues.

We recommend that all users upgrade to this version.

*Highlights:*


   - New method select_dtypes() to select columns based on the dtype (see the
   sketch after this list)
   - New method sem() to calculate the standard error of the mean
   - Support for dateutil timezones (see *docs*)
   - Support for ignoring full line comments in the read_csv() text parser
   - New documentation section on *Options and Settings*
   - Lots of bug fixes
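
As a quick illustration of the first two highlights (the toy frame below is
an assumption for illustration, not from the release notes):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [1.5, 2.5, 3.5],
                   'c': ['x', 'y', 'z']})

df.select_dtypes(include=[np.number])   # keeps only the numeric columns 'a' and 'b'
df['b'].sem()                           # standard error of the mean of column 'b'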


For a full description of the Whatsnew for v0.14.1, see here:
http://pandas.pydata.org/pandas-docs/stable/whatsnew.html


*What is it:*

*pandas* is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block
for
doing practical, real world data analysis in Python. Additionally, it has
the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.


Documentation:
http://pandas.pydata.org/pandas-docs/stable/

Source tarballs, windows binaries are available on PyPI:
https://pypi.python.org/pypi/pandas

windows binaries are courtesy of  Christoph Gohlke and are built on Numpy
1.8
macosx wheels will be available soon, courtesy of Matthew Brett

Please report any issues here:
https://github.com/pydata/pandas/issues


Thanks

The Pandas Development Team


Contributors to the 0.14.1 release

   - Andrew Rosenfeld
   - Andy Hayden
   - Benjamin Adams
   - Benjamin M. Gross
   - Brian Quistorff
   - Brian Wignall
   - bwignall
   - clham
   - Daniel Waeber
   - David Bew
   - David Stephens
   - DSM
   - dsm054
   - helger
   - immerrr
   - Jacob Schaer
   - jaimefrio
   - Jan Schulz
   - John David Reaver
   - John W. O’Brien
   - Joris Van den Bossche
   - jreback
   - Julien Danjou
   - Kevin Sheppard
   - K.-Michael Aye
   - Kyle Meyer
   - lexual
   - Matthew Brett
   - Matt Wittmann
   - Michael Mueller
   - Mortada Mehyar
   - onesandzeroes
   - Phillip Cloud
   - Rob Levy
   - rockg
   - sanguineturtle
   - Schaer, Jacob C
   - seth-p
   - sinhrks
   - Stephan Hoyer
   - Thomas Kluyver
   - Todd Jennings
   - TomAugspurger
   - unknown
   - yelite
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: Pandas 0.14.0 Release Candidate 1

2014-07-12 Thread Jeff Reback
Ray,

Matthew builds Mac OSX wheels for the scipy stack (those are Windows binaries).

Thanks anyhow


> On Jul 11, 2014, at 12:10 PM, RayS  wrote:
> 
> At 04:56 AM 7/11/2014, you wrote:
>> Matthew, we posted the release of 0.14.1 last night. Are these 
>> picked up and build here automatically? 
>> https://nipy.bic.berkeley.edu/scipy_installers/
> 
> I see it's at http://www.lfd.uci.edu/~gohlke/pythonlibs/#pandas
> 
> - Ray 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String type again.

2014-07-16 Thread Jeff Reback
In 0.15.0, pandas will have full-fledged support for categoricals, which in
effect let you map a smaller number of strings to integers.

This is now in pandas master:

http://pandas-docs.github.io/pandas-docs-travis/categorical.html

Feedback welcome!
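
A minimal sketch of the idea, assuming the 0.15-style API described in the
docs linked above:

import pandas as pd

s = pd.Series(['low', 'high', 'low', 'medium'], dtype='category')
s.cat.codes        # one small integer per value
s.cat.categories   # each distinct string is stored only once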

> On Jul 14, 2014, at 1:00 PM, Olivier Grisel  wrote:
> 
> 2014-07-13 19:05 GMT+02:00 Alexander Belopolsky :
>> 
>>> On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith  wrote:
>>> 
>>> I feel like for most purposes, what we *really* want is a variable length
>>> string dtype (I.e., where each element can be a different length.).
>> 
>> 
>> 
>> I've been toying with the idea of creating an array type for interned
>> strings.  In many applications dealing with large arrays of variable size
>> strings, the strings come from a relatively short set of names.  Arrays of
>> interned strings can be manipulated very efficiently because in many respects
>> they are just like arrays of integers.
> 
> +1 I think this is why pandas is using dtype=object to load string
> data: in many cases short string values are used to represent
> categorical variables with a comparatively small cardinality of
> possible values for a dataset with comparatively numerous records.
> 
> In that case the dtype=object is not that bad as it just stores
> pointer on string objects managed by Python. It's possible to intern
> the strings manually at load time (I don't know if pandas or python
> already do it automatically in that case). The integer semantics is
> good for that case. Having an explicit dtype might be even better.
> 
> -- 
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Jeff Reback
related recent issue: https://github.com/numpy/numpy/issues/4638
and pandas is now explicitly specifying the accumulator to avoid this
problem: https://github.com/pydata/pandas/pull/6954/files

pandas also implemented Welford's method for rolling_var in 0.14.0, see
here: https://github.com/pydata/pandas/pull/6817
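
A minimal sketch of both points (the array size 2**24 + 1 comes from the
thread below; the Welford helper is an illustrative pure-Python version, not
the pandas implementation):

import numpy as np

a = np.ones(16777217, dtype=np.float32)   # 2**24 + 1 elements
a.mean()                   # with a float32 accumulator this can drift below 1.0
a.mean(dtype=np.float64)   # an explicit wider accumulator sidesteps the issue

def welford_var(xs):
    # Welford's online algorithm: numerically stable single-pass variance
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1) if n > 1 else float('nan')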


On Thu, Jul 24, 2014 at 3:05 PM, RayS  wrote:

> Probably a number of scipy places as well
>
>
>
> import numpy
> import scipy.stats
> print numpy.__version__
> print scipy.__version__
> for s in range(16777214, 16777944):
>     if scipy.stats.nanmean(numpy.ones((s, 1), numpy.float32))[0] != 1:
>         print '\nbroke', s, scipy.stats.nanmean(numpy.ones((s, 1), numpy.float32))
>         break
>     else:
>         print '\r', s,
>
> c:\temp>python np_sum.py
> 1.8.0b2
> 0.11.0
> 16777216
> broke 16777217 [ 0.9994]
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-09-04 Thread Jeff Reback
FYI pandas DOES use a very performant hash table impl for unique (and
value_counts). Sorted state IS maintained
by the underlying Index implementation.
https://github.com/pydata/pandas/blob/master/pandas/hashtable.pyx

In [8]: a = np.random.randint(10, size=(1,))

In [9]: %timeit np.unique(a)
1000 loops, best of 3: 284 µs per loop

In [10]: %timeit Series(a).unique()
1 loops, best of 3: 161 µs per loop

In [11]: s = Series(a)

# without the creation overhead
In [12]: %timeit s.unique()
1 loops, best of 3: 75.3 µs per loop



On Thu, Sep 4, 2014 at 2:29 PM, Eelco Hoogendoorn <
hoogendoorn.ee...@gmail.com> wrote:

>
> On Thu, Sep 4, 2014 at 8:14 PM, Eelco Hoogendoorn <
> hoogendoorn.ee...@gmail.com> wrote:
>
>> I should clarify: I am speaking about my implementation, I havnt looked
>> at the numpy implementation for a while so im not sure what it is up to.
>> Note that by 'almost free', we are still talking about three passes over
>> the whole array plus temp allocations, but I am assuming a use-case where
>> the various sorts involved are the dominant cost, which I imagine they are,
>> for out-of-cache sorts. Perhaps this isn't too realistic an assumption
>> about the average use case though, I don't know. Though I suppose its a
>> reasonable guideline to assume that either the dataset is big, or
>> performance isn't that big a concern in the first place.
>>
>>
>> On Thu, Sep 4, 2014 at 7:55 PM, Jaime Fernández del Río <
>> jaime.f...@gmail.com> wrote:
>>
>>> On Thu, Sep 4, 2014 at 10:39 AM, Eelco Hoogendoorn <
>>> hoogendoorn.ee...@gmail.com> wrote:
>>>

 On Thu, Sep 4, 2014 at 10:31 AM, Eelco Hoogendoorn <
 hoogendoorn.ee...@gmail.com> wrote:

>
> On Wed, Sep 3, 2014 at 6:46 PM, Jaime Fernández del Río <
> jaime.f...@gmail.com> wrote:
>
>>  On Wed, Sep 3, 2014 at 9:33 AM, Jaime Fernández del Río <
>> jaime.f...@gmail.com> wrote:
>>
>>> On Wed, Sep 3, 2014 at 6:41 AM, Eelco Hoogendoorn <
>>> hoogendoorn.ee...@gmail.com> wrote:
>>>
  Not sure about the hashing. Indeed one can also build an index of
 a set by means of a hash table, but its questionable if this leads to
 improved performance over performing an argsort. Hashing may have 
 better
 asymptotic time complexity in theory, but many datasets used in 
 practice
 are very easy to sort (O(N)-ish), and the time-constant of hashing is
 higher. But more importantly, using a hash-table guarantees poor cache
 behavior for many operations using this index. By contrast, sorting may
 (but need not) make one random access pass to build the index, and may 
 (but
 need not) perform one random access to reorder values for grouping. But
 insofar as the keys are better behaved than pure random, this coherence
 will be exploited.

>>>
>>> If you want to give it a try, these branch of my numpy fork has hash
>>> table based implementations of unique (with no extra indices) and in1d:
>>>
>>>  https://github.com/jaimefrio/numpy/tree/hash-unique
>>>
>>> A use cases where the hash table is clearly better:
>>>
>>> In [1]: import numpy as np
>>> In [2]: from numpy.lib._compiled_base import _unique, _in1d
>>>
>>> In [3]: a = np.random.randint(10, size=(1,))
>>> In [4]: %timeit np.unique(a)
>>> 1000 loops, best of 3: 258 us per loop
>>> In [5]: %timeit _unique(a)
>>> 1 loops, best of 3: 143 us per loop
>>> In [6]: %timeit np.sort(_unique(a))
>>> 1 loops, best of 3: 149 us per loop
>>>
>>> It typically performs between 1.5x and 4x faster than sorting. I
>>> haven't profiled it properly to know, but there may be quite a bit of
>>> performance to dig out: have type specific comparison functions, 
>>> optimize
>>> the starting hash table size based on the size of the array to avoid
>>> reinsertions...
>>>
>>> If getting the elements sorted is a necessity, and the array
>>> contains very few or no repeated items, then the hash table approach may
>>> even perform worse,:
>>>
>>> In [8]: a = np.random.randint(1, size=(5000,))
>>> In [9]: %timeit np.unique(a)
>>> 1000 loops, best of 3: 277 us per loop
>>> In [10]: %timeit np.sort(_unique(a))
>>> 1000 loops, best of 3: 320 us per loop
>>>
>>> But the hash table still wins in extracting the unique items only:
>>>
>>> In [11]: %timeit _unique(a)
>>> 1 loops, best of 3: 187 us per loop
>>>
>>> Where the hash table shines is in more elaborate situations. If you
>>> keep the first index where it was found, and the number of repeats, in 
>>> the
>>> hash table, you can get return_index and return_counts almost for free,
>>> which means you are performing an extra 3x faster than with sorting.
>>> return_inverse requires a little more trickery, so I won't attem

Re: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type

2014-09-22 Thread Jeff Reback
Hopefully this is not TL;DR!

There are 3 'dtype'-likes that exist in pandas that could in theory mostly
be migrated back to numpy. These currently exist as the .values, in other words
the object to which pandas defers data storage and computation for
some/most operations.

1) SparseArray: This is the basis for SparseSeries. It is ndarray-like (it's
actually an ndarray sub-class) and optimized for the 1-d case. My guess is
that @wesm <https://github.com/wesm> created this because a) it didn't
exist in numpy, and b) he didn't want scipy as an explicit dependency (at the
time), late 2011.

2) datetime support: This is not a target dtype per se, but really a
reimplementation over the top of datetime64[ns], with the associated scalar
Timestamp which is a proper sub-class of datetime.datetime. I believe @wesm
<https://github.com/wesm> created this because numpy datetime support was
(and still is to some extent) just completely broken (though better in
1.7+). It doesn't support proper timezones, the display is always in the
local timezone, and the scalar type (np.datetime64) is not extensible at
all (e.g. it's not easy to have custom printing or parsing). These are
all well known by the numpy community and have seen some recent proposals
to remedy.

3) pd.Categorical: This was another class wesm wrote several years ago. It
actually *could* be a numpy sub-class, though it's a bit awkward as it's
really a numpy-like sub-class that contains 2 ndarray-like arrays, and is
more appropriately implemented as a container of multiple ndarrays.

So when we added support for Categoricals recently, why didn't we try
to push a categorical dtype? I think there are several reasons, in no
particular order:

   -

   pd.Categorical is really a container of multiple ndarrays, and is
   ndarray-like. Further its API is somewhat constrained. It was simpler to
   make a python container class rather than try to sub-class ndarray and
   basically override / throw out many methods (as a lot of computation
   methods simply don't make sense between 2 categoricals). You can make a
   case that this *should not* be in numpy for this reason.
   -

   The changes in pandas for the 3 cases outlined above were mostly about how
   to integrate these with the top-level containers (Series/DataFrame), rather
   than actually writing / re-writing a new dtype for a ndarray class. We
   always try to reuse, so we just try to extend the ndarray-like rather than
   create a new one from scratch.
   -

   Getting, for example, a Categorical dtype into numpy would probably take a
   pretty long cycle time. I think you need a champion for new features to
   really push them. It hasn't happened with datetime and that's been a while
   (of course it's possible that pandas diverted some of this need)
   -

   API design: I think this is a big issue actually. When I added
   Categorical container support, I didn't want to change the API of
   Categorical much (and it pretty much worked out that way, mainly adding
   to it). So, say we took the path of assuming that numpy would have a nice
   categorical data dtype. We would almost certainly have to wrap it in
   something to provide needed functionality that would necessarily be
   missing in an initial version. (of course eventually that may not be
   necessary).
   -

   So the 'nobody wants to write in C' argument is true for datetimes, but
   not for SparseArray/Categorical. In fact much of that code is just
   calling out to numpy (though some cython code too).
   -

   from a performance perspective, numpy needs a really good hashtable in
   order to support proper factorizing, which @wesm
   <https://github.com/wesm> co-opted klib to do (see this thread here
   <https://www.mail-archive.com/numpy-discussion@scipy.org/msg46024.html> for
   a discussion on this); a minimal factorize sketch follows this list.
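
A minimal factorize sketch (the outputs shown in comments are illustrative):

import pandas as pd

codes, uniques = pd.factorize(['b', 'a', 'b', 'c'])
# codes   -> array([0, 1, 0, 2]), one integer per element in first-occurrence order
# uniques -> array(['b', 'a', 'c'], dtype=object), each distinct value once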

So I know I am repeating myself, but it comes down to this. The
API/interface of the delegated methods needs to be defined. For ndarrays it
is long established and well-known, so it's easy to gear pandas to that. However
with a *newer* type that is not the case, so pandas can easily decide, hey
this is the most correct behavior, let's do it this way, nothing to break,
no back compat needed.


Jeff

On Sun, Sep 21, 2014 at 11:31 PM, Nathaniel Smith  wrote:

> On Sun, Sep 21, 2014 at 7:50 PM, Stephan Hoyer  wrote:
> > pandas has some hacks to support custom types of data for which numpy
> can't
> > handle well enough or at all. Examples include datetime and Categorical
> [1],
> > and others like GeoArray [2] that haven't make it into pandas yet.
> >
> > Most of these look like numpy arrays but with custom dtypes and type
> > specific methods/properties. But clearly nobody is particularly excited
> > about writing the the C necessary to implement custom dtypes [3]. Nor is
> do
> > we need t

[Numpy-discussion] Dataframe memory info printing

2014-09-23 Thread Jeff Reback
For the 0.15.0 release of pandas (coming the 2nd week of October), we are going to
include memory info printing:

see here: https://github.com/pydata/pandas/pull/7619

This will be controllable by an option display.memory_usage.

My question to the community: should this be True by default, i.e. show the
memory usage? (This only applies to df.info().)
There is really no performance impact here.

Pls let us know!

thanks

Jeff

>>> df.info(memory_usage=True)
Int64Index: 1000 entries, 0 to 999
Data columns (total 5 columns):
date        datetime64[ns]
float       float64
int         int64
smallint    int16
string      object
dtypes: datetime64[ns](1), float64(1), int16(1), int64(1), object(1)
memory usage: 324.2 MB
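
A minimal sketch of how the global option and the per-call argument would
interact (assuming the option name display.memory_usage from the PR above):

import pandas as pd

pd.set_option('display.memory_usage', True)   # the proposed global default
df = pd.DataFrame({'a': range(1000)})
df.info()                      # includes the "memory usage" line
df.info(memory_usage=False)    # per-call override suppresses it
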
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: Pandas 0.15.0 Release Candidate 1

2014-10-07 Thread Jeff Reback
Hi,

I'm pleased to announce the availability of the first release candidate of
Pandas 0.15.0.
Please try this RC and report any issues here: Pandas Issues
<https://github.com/pydata/pandas/issues>
We will be releasing officially in 1-2 weeks or so.

This is a major release from 0.14.1 and includes a number of API changes,
several new features, enhancements, and performance improvements along with
a large number of bug fixes.

Highlights include:

- Drop support for numpy < 1.7.0
- The Categorical type was integrated as a first-class pandas type
- New scalar type Timedelta, and a new index type TimedeltaIndex
- New DataFrame default display for df.info() to include memory usage
- New datetimelike properties accessor .dt for Series
- Split indexing documentation into Indexing and Selecting Data and
MultiIndex / Advanced Indexing
- Split out string methods documentation into Working with Text Data
- read_csv will now by default ignore blank lines when parsing
- API change in using Indexes in set operations
- Internal refactoring of the Index class to no longer sub-class ndarray
- dropping support for PyTables less than version 3.0.0, and numexpr less
than version 2.1

Here are the full whatsnew and documentation links:
v0.15.0 Whatsnew
<http://pandas.pydata.org/pandas-docs/version/0.15.0/whatsnew.html>

v0.15.0 Documentation Page
<http://pandas.pydata.org/pandas-docs/version/0.15.0/>

Source tarballs, and windows builds are available here:

Pandas v0.15.0rc1 Release <https://github.com/pydata/pandas/releases>

A big thank you to everyone who contributed to this release!

Jeff
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: Pandas 0.15.0 released

2014-10-19 Thread Jeff Reback
Hello,

We are proud to announce v0.15.0 of pandas, a major release from 0.14.1.

This release includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug
fixes.

This was 4 months of work with 420 commits by 79 authors encompassing 236
issues.

We recommend that all users upgrade to this version.

*Highlights:*

   - Drop support for numpy < 1.7.0
   - The Categorical type was integrated as a first-class pandas type
   - New scalar type Timedelta, and a new index type TimedeltaIndex
   - New DataFrame default display for df.info() to include memory usage
   - New datetimelike properties accessor .dt for Series (a minimal sketch of
   .dt and Timedelta follows this list)
   - Split indexing documentation into Indexing and Selecting Data and
   MultiIndex / Advanced Indexing
   - Split out string methods documentation into Working with Text Data
   - read_csv will now by default ignore blank lines when parsing
   - API change in using Indexes in set operations
   - Internal refactoring of the Index class to no longer sub-class ndarray
   - dropping support for PyTables less than version 3.0.0, and numexpr
   less than version 2.1
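
A minimal sketch of the .dt accessor and the Timedelta scalar (the dates are
illustrative):

import pandas as pd

s = pd.Series(pd.to_datetime(['2014-10-01', '2014-10-15']))
s.dt.day                         # datetimelike properties via the new .dt accessor
pd.Timedelta('1 days 2 hours')   # the new scalar type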

See a full description of Whatsnew for v0.15.0 here:
http://pandas.pydata.org/pandas-docs/stable/whatsnew.html


*What is it:*

*pandas* is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block
for
doing practical, real world data analysis in Python. Additionally, it has
the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.


Documentation:
http://pandas.pydata.org/pandas-docs/stable/

Source tarballs, windows binaries are available on PyPI:
https://pypi.python.org/pypi/pandas

windows binaries are courtesy of  Christoph Gohlke and are built on Numpy
1.8
macosx wheels are courtesy of Matthew Brett and are built on Numpy 1.7.1

Please report any issues here:
https://github.com/pydata/pandas/issues


Thanks

The Pandas Development Team


Contributors to the 0.15.0 release
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt

2014-10-26 Thread Jeff Reback
You should have a read here:
http://wesmckinney.com/blog/?p=543

Going below 2x memory usage on read-in is non-trivial and costly in terms
of performance.

> On Oct 26, 2014, at 4:46 AM, Saullo Castro  wrote:
> 
> I would like to start working on a memory efficient alternative for 
> np.loadtxt and np.genfromtxt that uses arrays instead of lists to store the 
> data while the file iterator is exhausted.
> 
> The motivation came from this SO question:
> 
> http://stackoverflow.com/q/26569852/832621
> 
> where for huge arrays the current NumPy ASCII readers are really slow and 
> require ~6 times more memory. This case I tested with Pandas' read_csv() and 
> it required 2 times more memory.
> 
> I would be glad if you could share your experience on this matter.
> 
> Greetings,
> Saullo
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt

2014-10-26 Thread Jeff Reback
You are describing a special case where you know the data size a priori (e.g. not
streaming), dtypes are readily apparent from a small sample,
and in general your data is not messy.

I would agree that if these can be satisfied, then you can achieve closer to a 1x
memory overhead.

Using bcolz is great, but probably not a realistic option as a numpy dependency
(you should probably just memory-map the file directly instead); though this has a
big perf impact, so you need to weigh these things.

Not all cases deserve the same treatment. Chunking is often the best option
IMHO: it provides constant memory usage (though ultimately still 2x), and
combined with memory mapping can provide fixed resource utilization.
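
A minimal sketch of the chunked approach (the file name, shape, and chunk
size are assumptions for illustration):

import numpy as np
import pandas as pd

n_rows, n_cols = 1000000, 3   # assumed known up front, e.g. from a prior pass
out = np.empty((n_rows, n_cols), dtype=np.float64)   # preallocate the target once

start = 0
for chunk in pd.read_csv('data.txt', sep=r'\s+', header=None, chunksize=100000):
    out[start:start + len(chunk)] = chunk.values     # constant per-chunk memory
    start += len(chunk)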

> On Oct 26, 2014, at 9:41 AM, Daπid  wrote:
> 
> 
>> On 26 October 2014 12:54, Jeff Reback  wrote:
>> you should have a read here/
>> http://wesmckinney.com/blog/?p=543
>> 
>> going below the 2x memory usage on read in is non trivial and costly in 
>> terms of performance 
> 
> 
> If you know in advance the number of rows (because it is in the header, 
> counted with wc -l, or any other prior information) you can preallocate the 
> array and fill in the numbers as you read, with virtually no overhead.
> 
> If the number of rows is unknown, an alternative is to use a chunked data 
> container like Bcolz [1] (former carray) instead of Python structures. It may 
> be used as such, or copied back to a ndarray if we want the memory to be 
> aligned. Including a bit of compression we can get the memory overhead to 
> somewhere under 2x (depending on the dataset), at the cost of not so much CPU 
> time, and this could be very useful for large data and slow filesystems. 
> 
> 
> /David.
> 
> [1] http://bcolz.blosc.org/
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: pandas v0.15.2

2014-12-12 Thread Jeff Reback
Hello,

We are proud to announce v0.15.2 of pandas, a minor release from 0.15.1.

This release includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug
fixes.

This was a short release of 4 weeks with 137 commits by 49 authors
encompassing 75 issues.

We recommend that all users upgrade to this version.

For a more complete description of the Whatsnew for v0.15.2, see here:
http://pandas.pydata.org/pandas-docs/stable/whatsnew.html

*What is it:*

*pandas* is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block
for
doing practical, real world data analysis in Python. Additionally, it has
the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.


Documentation:
http://pandas.pydata.org/pandas-docs/stable/

Source tarballs, windows binaries are available on PyPI:
https://pypi.python.org/pypi/pandas

windows binaries are courtesy of  Christoph Gohlke and are built on Numpy
1.8
macosx wheels are courtesy of Matthew Brett

Please report any issues here:
https://github.com/pydata/pandas/issues


Thanks

The Pandas Development Team


Contributors to the 0.15.2 release


   - Aaron Staple
   - Angelos Evripiotis
   - Artemy Kolchinsky
   - Benoit Pointet
   - Brian Jacobowski
   - Charalampos Papaloizou
   - Chris Warth
   - David Stephens
   - Fabio Zanini
   - Francesc Via
   - Henry Kleynhans
   - Jake VanderPlas
   - Jan Schulz
   - Jeff Reback
   - Jeff Tratner
   - Joris Van den Bossche
   - Kevin Sheppard
   - Matt Suggit
   - Matthew Brett
   - Phillip Cloud
   - Rupert Thompson
   - Scott E Lasley
   - Stephan Hoyer
   - Stephen Simmons
   - Sylvain Corlay
   - Thomas Grainger
   - Tiago Antao
   - Trent Hauck
   - Victor Chaves
   - Victor Salgado
   - Vikram Bhandoh
   - WANG Aiyong
   - Will Holmgren
   - behzad nouri
   - broessli
   - charalampos papaloizou
   - immerrr
   - jnmclarty
   - jreback
   - mgilbert
   - onesandzeroes
   - peadarcoyle
   - rockg
   - seth-p
   - sinhrks
   - unutbu
   - wavedatalab
   - Åsmund Hjulstad
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Pandas v0.16.0 release candidate 1

2015-03-13 Thread Jeff Reback
Hi,

I'm pleased to announce the availability of the first release candidate of
Pandas 0.16.0.
Please try this RC and report any issues here: Pandas Issues
<https://github.com/pydata/pandas/issues>
We will be releasing officially in 1 week or so.

This is a major release from 0.15.2 and includes a small number of API
changes, several new features, enhancements, and performance improvements
along with a large number of bug fixes. We recommend that all users upgrade
to this version.

   - Highlights include:
  - DataFrame.assign method, see *here*
  <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0160-enhancements-assign>
  - Series.to_coo/from_coo methods to interact with scipy.sparse, see *here*
  <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0160-enhancements-sparse>
  - Backwards incompatible change to Timedelta to conform the .seconds
  attribute with datetime.timedelta, see *here*
  <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0160-api-breaking-timedelta>
  - Changes to the .loc slicing API to conform with the behavior of .ix,
  see *here*
  <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#indexing-changes>
  - Changes to the default for ordering in the Categorical constructor,
  see *here*
  <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0160-api-breaking-categorical>


Here are the full whatsnew and documentation links:
v0.16.0 Whatsnew
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html>

Source tarballs, windows builds, and mac wheels are available here:

Pandas v0.16.0rc1 Release <https://github.com/pydata/pandas/releases>

A big thank you to everyone who contributed to this release!

Jeff
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: pandas 0.16.0 released

2015-03-23 Thread Jeff Reback
Hello,

We are proud to announce v0.16.0 of pandas, a major release from 0.15.2.

This release includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug
fixes.

This was 4 months of work by 60 authors encompassing 204 issues.

We recommend that all users upgrade to this version.

*Highlights:*

   - *DataFrame.assign* method, see here
   - *Series.to_coo/from_coo* methods to interact with *scipy.sparse*, see here
   - Backwards incompatible change to *Timedelta* to conform the *.seconds*
   attribute with *datetime.timedelta*, see here
   - Changes to the *.loc* slicing API to conform with the behavior of *.ix*,
   see here
   - Changes to the default for ordering in the *Categorical* constructor,
   see here
   - Enhancement to the *.str* accessor to make string operations easier,
   see here
   - The *pandas.tools.rplot*, *pandas.sandbox.qtpandas* and *pandas.rpy*
   modules are deprecated. We refer users to external packages like seaborn,
   pandas-qt and rpy2 for similar or equivalent functionality, see here for
   more detail

See a full description of the Whatsnew for v0.16.0



*What is it:*

*pandas* is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block
for
doing practical, real world data analysis in Python. Additionally, it has
the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.


Documentation:
http://pandas.pydata.org/pandas-docs/stable/

Source tarballs, windows wheels, macosx wheels are available on PyPI:
https://pypi.python.org/pypi/pandas

windows binaries are courtesy of  Christoph Gohlke and are built on Numpy
1.9
macosx wheels are courtesy of Matthew Brett and are built on Numpy 1.7.1

Please report any issues here:
https://github.com/pydata/pandas/issues


Thanks

The Pandas Development Team
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: pandas 0.16.1 released

2015-05-11 Thread Jeff Reback
Hello,

We are proud to announce v0.16.1 of pandas, a minor release from 0.16.0.

This release includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug
fixes.

This was a release of 7 weeks with 222 commits by 57 authors encompassing
85 issues.

We recommend that all users upgrade to this version.

*What is it:*

*pandas* is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block
for
doing practical, real world data analysis in Python. Additionally, it has
the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.

Highlights of this release include:

   - Support for *CategoricalIndex*, a category based index, see here
   <http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0161-enhancements-categoricalindex>
   - New section on how-to-contribute to *pandas*, see here
   <http://pandas.pydata.org/pandas-docs/stable/contributing.html>
   - Revised "Merge, join, and concatenate" documentation, including
   graphical examples to make it easier to understand each operation, see
   here <http://pandas.pydata.org/pandas-docs/stable/merging.html>
   - New method *sample* for drawing random samples from Series, DataFrames
   and Panels (a minimal sketch follows this list), see here
   <http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0161-enhancements-sample>
   - The default *Index* printing has changed to a more uniform format, see
   here
   <http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0161-index-repr>
   - *BusinessHour* datetime-offset is now supported, see here
   <http://pandas.pydata.org/pandas-docs/stable/timeseries.html#business-hour>
   - Further enhancement to the *.str* accessor to make string operations
   easier, see here
   <http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0161-enhancements-string>
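
A minimal sketch of the new sample method (the frame is an illustrative
assumption):

import pandas as pd

df = pd.DataFrame({'a': range(10)})
df.sample(n=3, random_state=42)   # three rows drawn at random, reproducibly
df['a'].sample(frac=0.5)          # half of the Series, randomly chosen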


See the Whatsnew in v0.16.1
<http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#v0-16-1-may-11-2015>

Documentation:
http://pandas.pydata.org/pandas-docs/stable/

Source tarballs, windows binaries are available on PyPI:
https://pypi.python.org/pypi/pandas

windows binaries are courtesy of  Christoph Gohlke and are built on Numpy
1.8
macosx wheels are courtesy of Matthew Brett

Please report any issues here:
https://github.com/pydata/pandas/issues


Thanks

The Pandas Development Team


Contributors to the 0.16.1 release

   - Alfonso MHC
   - Andy Hayden
   - Artemy Kolchinsky
   - Chris Gilmer
   - Chris Grinolds
   - Dan Birken
   - David BROCHART
   - David Hirschfeld
   - David Stephens
   - Dr. Leo
   - Evan Wright
   - Frans van Dunné
   - Hatem Nassrat
   - Henning Sperr
   - Hugo Herter
   - Jan Schulz
   - Jeff Blackburne
   - Jeff Reback
   - Jim Crist
   - Jonas Abernot
   - Joris Van den Bossche
   - Kerby Shedden
   - Leo Razoumov
   - Manuel Riel
   - Mortada Mehyar
   - Nick Burns
   - Nick Eubank
   - Olivier Grisel
   - Phillip Cloud
   - Pietro Battiston
   - Roy Hyunjin Han
   - Sam Zhang
   - Scott Sanderson
   - Stephan Hoyer
   - Tiago Antao
   - Tom Ajamian
   - Tom Augspurger
   - Tomaz Berisa
   - Vikram Shirgur
   - Vladimir Filimonov
   - William Hogman
   - Yasin A
   - Younggun Kim
   - behzad nouri
   - dsm054
   - floydsoft
   - flying-sheep
   - gfr
   - jnmclarty
   - jreback
   - ksanghai
   - lucas
   - mschmohl
   - ptype
   - rockg
   - scls19fr
   - sinhrks
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: pandas v0.16.2 released

2015-06-13 Thread Jeff Reback
Hello,

We are proud to announce v0.16.2 of pandas, a minor release from 0.16.1.

This release includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug
fixes.

This was a release of 4 weeks with 105 commits by 32 authors encompassing
48 issues and 71 pull-requests.

We recommend that all users upgrade to this version.

*What is it:*

*pandas* is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block
for
doing practical, real world data analysis in Python. Additionally, it has
the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.

Highlights of this release include:

   - A new *pipe* method, see here
   <http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0162-enhancements-pipe>
   (a minimal sketch follows this list)
   - Documentation on how to use numba <http://numba.pydata.org> with
   *pandas*, see here
   <http://pandas.pydata.org/pandas-docs/stable/enhancingperf.html#enhancingperf-numba>
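
A minimal sketch of the new pipe method (the helper function is an
illustrative assumption):

import pandas as pd

def add_constant(df, value):
    # toy transformation: return a copy with one extra column
    out = df.copy()
    out['const'] = value
    return out

df = pd.DataFrame({'a': [1, 2, 3]})
df.pipe(add_constant, 10)   # same as add_constant(df, 10), but chains like a method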

See the Whatsnew in v0.16.2
<http://pandas.pydata.org/pandas-docs/stable/whatsnew.html>

Documentation:
http://pandas.pydata.org/pandas-docs/stable/

Source tarballs, windows binaries are available on PyPI:
https://pypi.python.org/pypi/pandas

windows binaries are courtesy of  Christoph Gohlke and are built on Numpy
1.9
macosx wheels are courtesy of Matthew Brett

Please report any issues here:
https://github.com/pydata/pandas/issues


Thanks

The Pandas Development Team


Contributors to the 0.16.2 release


   - Andrew Rosenfeld
   - Artemy Kolchinsky
   - Bernard Willers
   - Christer van der Meeren
   - Christian Hudon
   - Constantine Glen Evans
   - Daniel Julius Lasiman
   - Evan Wright
   - Francesco Brundu
   - Gaëtan de Menten
   - Jake VanderPlas
   - James Hiebert
   - Jeff Reback
   - Joris Van den Bossche
   - Justin Lecher
   - Ka Wo Chen
   - Kevin Sheppard
   - Mortada Mehyar
   - Morton Fox
   - Robin Wilson
   - Thomas Grainger
   - Tom Ajamian
   - Tom Augspurger
   - Yoshiki Vázquez Baeza
   - Younggun Kim
   - austinc
   - behzad nouri
   - jreback
   - lexual
   - rekcahpassyla
   - scls19fr
   - sinhrks
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Video meeting this week

2015-06-30 Thread Jeff Reback
Do you guys have an agenda?

I can be reached on my cell 917-971-6387

> On Jun 30, 2015, at 12:58 AM, Nathaniel Smith  wrote:
> 
>> On Fri, Jun 26, 2015 at 2:32 AM, Nathaniel Smith  wrote:
>> Hi all,
>> 
>> In a week and a half, this is happening:
>> 
>>https://github.com/numpy/numpy/wiki/SciPy-2015-developer-meeting
>> 
>> It's somewhat short notice (my bad :-/), but I think it would be good
>> to have a short video meeting sometime this week as a kind of
>> "pre-meeting" -- to at least briefly go over the main issues we see
>> facing the project to prime the pump, get a better idea about what we
>> want to accomplish at the meeting itself, and gather some early
>> feedback from anyone who won't be able to make it to SciPy (we'll miss
>> you).
>> 
>> The obligatory doodle:
>>http://doodle.com/6b4s6thqt9xt4vnh
> 
> Okay, let's aim for:
> 
>   Thursday July 2 at 20:00 UTC.
> 
> I believe that's 1pm California / 4 pm New York / 9pm London / 10pm
> western Europe
> 
> And so far it looks like we'll be under the 10 person Google Hangouts
> limit, which I'm assuming is simpler for everybody, so let's assume
> we're doing that unless otherwise specified. (This does mean that I'd
> appreciate a quick email if you're planning on dialling in but haven't
> otherwise responded to the poll, though!)
> 
> -n
> 
> -- 
> Nathaniel J. Smith -- http://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] floats for indexing, reshape - too strict ?

2015-07-02 Thread Jeff Reback
FYI pandas followed the same pattern to deprecate float indexers (except for 
indexing in a Float64Index) about a year ago

see here: 
http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0140-deprecations
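
A minimal sketch of what that deprecation looks like in practice (behavior
hedged to the 0.14-era API):

import pandas as pd

s = pd.Series([10, 20, 30])
s[1]     # fine: integer indexer
s[1.0]   # deprecated in 0.14.0: emits a FutureWarning (still allowed on a Float64Index)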

> On Jul 2, 2015, at 9:18 PM,   
> wrote:
> 
> 
> 
>> On Thu, Jul 2, 2015 at 8:51 PM, Chris Barker - NOAA Federal 
>>  wrote:
>> Sent from my iPhone
>> 
>> >
>> > The disadvantage I see is, that some weirder calculations would possible
>> > work most of the times, but not always,
>> 
>> 
>> >  not sure if you can define a "tolerance"
>> > reasonable here unless it is exact.
>> 
>> You could use a relative tolerance, but you'd still have to set that.
>> Better to put that decision squarely in the user's hands.
>> 
>> > Though I guess you are right that
>> > `//` will also just round silently already.
>> 
>> Yes, but if it's in the user's code, it should be obvious -- and then
>> the user can choose to round, or floor, or ceiling
> 
> round, floor, ceil don't produce integers.
> 
> I'm writing library code, and I don't have control over what everyone does.
> 
> round, floor, ceil, and // might hide bugs or user mistakes, if we are 
> supposed to get something that is "like an int" but it's 42.6 instead.
> 
> Josef
> https://en.wikipedia.org/wiki/Phrases_from_The_Hitchhiker%27s_Guide_to_the_Galaxy#Answer_to_the_Ultimate_Question_of_Life.2C_the_Universe.2C_and_Everything_.2842.29
> 
>  
>> 
>> -CHB
>> 
>> >
>> > - Sebastian
>> >
>> >>
>> >> for example
>> >>
>> >>
>> > 5.0 == 5
>> >> True
>> >>
>> >>
>> > np.ones(10 / 2)
>> >> array([ 1.,  1.,  1.,  1.,  1.])
>> > 10 / 2 == 5
>> >> True
>> >>
>> >>
>> >> or the python 2 version
>> >>
>> >>
>> > np.ones(10. / 2)
>> >> array([ 1.,  1.,  1.,  1.,  1.])
>> > 10. / 2 == 5
>> >> True
>> >>
>> >>
>> >> I'm using now 10 // 2, or int(10./2 + 1)   but this is unconditional
>> >> and doesn't raise if the numbers are not close or equal to an integer
>> >> (which would be a bug)
>> >>
>> >>
>> >>
>> >>
>> >> Josef
>> >>
>> >>
>> >>
>> >>
>> >> ___
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] testing numpy with downstream testsuites (was: Re: Notes from the numpy dev meeting at scipy 2015)

2015-08-26 Thread Jeff Reback
Pandas has had a Travis build for quite a while where we install numpy
master and then run our test suite.

e.g. here: https://travis-ci.org/pydata/pandas/jobs/77256007

Over the last year this has uncovered a couple of changes which affected
pandas (mainly using something deprecated which was turned off :)

This was pretty simple to set up. Note that this adds 2+ minutes to the
build (though our builds take a while anyhow, so it's not a big deal).
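
The mechanics are tiny - the job boils down to something like the following
two lines (a sketch only; the wheel index URL is the travis-wheels one
mentioned below, and the exact pandas setup differs):

pip install --pre --upgrade -f http://travis-wheels.scikit-image.org numpy
nosetests pandas   # or however the downstream suite is invoked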



On Wed, Aug 26, 2015 at 7:14 AM, Matthew Brett 
wrote:

> Hi,
>
> On Wed, Aug 26, 2015 at 7:59 AM, Nathaniel Smith  wrote:
> > [Popping this off to its own thread to try and keep things easier to
> follow]
> >
> > On Tue, Aug 25, 2015 at 9:52 AM, Nathan Goldbaum 
> wrote:
> >>>   - Lament: it would be really nice if we could get more people to
> >>> test our beta releases, because in practice right now 1.x.0 ends
> >>> up being where we actually the discover all the bugs, and 1.x.1 is
> >>> where it actually becomes usable. Which sucks, and makes it
> >>> difficult to have a solid policy about what counts as a
> >>> regression, etc. Is there anything we can do about this?
> >>
> >> Just a note in here - have you all thought about running the test
> suites for
> >> downstream projects as part of the numpy test suite?
> >
> > I don't think it came up, but it's not a bad idea! The main problems I
> > can foresee are:
> > 1) Since we don't know the downstream code, it can be hard to
> > interpret test suite failures. OTOH for changes we're uncertain of we
> > often already end up running some downstream test suites by hand,
> > so it can only be an improvement on that...
> > 2) Sometimes everyone including downstream agrees that breaking
> > something is actually a good idea and they should just deal, but what
> > do you do then?
> >
> > These both seem solvable though.
> >
> > I guess a good strategy would be to compile a travis-compatible wheel
> > of $PACKAGE version $latest-stable against numpy 1.x, and then in the
> > 1.(x+1) development period numpy would have an additional travis run
> > which, instead of running the numpy test suite, instead does:
> >   pip install .
> >   pip install $PACKAGE-$latest-stable.whl
> >   python -c 'import package; package.test()' # adjust as necessary
> > ? Where $PACKAGE is something like scipy / pandas / astropy / ...
> > matplotlib would be nice but maybe impractical...?
> >
> > Maybe someone else will have objections but it seems like a reasonable
> > idea to me. Want to put together a PR? Aside from fame and fortune
> > and our earnest appreciation, your reward is you get to make sure that
> > the packages you care about are included so that we break them less
> > often in the future ;-).
>
> One simple way to get going would be for the release manager to
> trigger a build from this repo:
>
> https://github.com/matthew-brett/travis-wheel-builder
>
> This build would then upload a wheel to:
>
> http://travis-wheels.scikit-image.org/
>
> The upstream packages would have a test grid which included an entry
> with something like:
>
> pip install -f http://travis-wheels.scikit-image.org --pre numpy
>
> Cheers,
>
> Matthew


[Numpy-discussion] ANN: pandas v0.17.0rc1 - RELEASE CANDIDATE

2015-09-11 Thread Jeff Reback
Hi,

I'm pleased to announce the availability of the first release candidate of
Pandas 0.17.0.
Please try this RC and report any issues here: Pandas Issues
<https://github.com/pydata/pandas/issues/10848>
We will be releasing officially in 1-2 weeks or so.

**RELEASE CANDIDATE 1**

This is a major release from 0.16.2 and includes a small number of API
changes, several new features, enhancements, and performance improvements
along with a large number of bug fixes. We recommend that all users upgrade
to this version.

Highlights include:


   - Release the Global Interpreter Lock (GIL) on some cython operations,
   see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-gil>
   - Plotting methods are now available as attributes of the .plot
   accessor, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-plot>
   - The sorting API has been revamped to remove some long-time
   inconsistencies, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-api-breaking-sorting>
   - Support for a datetime64[ns] with timezones as a first-class dtype,
   see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-tz>
   - The default for to_datetime will now be to raise when presented with
   unparseable formats; previously this would return the original input, see
   here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-api-breaking-to-datetime>
   - The default for dropna in HDFStore has changed to False, to store by
   default all rows even if they are all NaN, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-api-breaking-hdf-dropna>
   - Support for Series.dt.strftime to generate formatted strings for
   datetime-likes, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-strftime>
   - Development installed versions of pandas will now have PEP440
   compliant version strings GH9518
   <https://github.com/pydata/pandas/issues/9518>
   - Development support for benchmarking with the Air Speed Velocity
   library GH8316 <https://github.com/pydata/pandas/pull/8316>
   - Support for reading SAS xport files, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-enhancements-sas-xport>
   - Removal of the automatic TimeSeries broadcasting, deprecated since
   0.8.0, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-prior-deprecations>

See the Whatsnew
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html> for much
more information.

Best way to get this is to install via conda
<http://pandas-docs.github.io/pandas-docs-travis/install.html#installing-pandas-with-anaconda>
from
our development channel. Builds for osx-64,linux-64,win-64 for Python 2.7
and Python 3.4 are all available.

conda install pandas -c pandas

Thanks to all who made this release happen. It is a very large release!

Jeff


[Numpy-discussion] ANN: pandas v0.17.0rc2 - RELEASE CANDIDATE 2

2015-10-03 Thread Jeff Reback
Hi,

I'm pleased to announce the availability of the second release candidate of
Pandas 0.17.0.
Please try this RC and report any issues here: Pandas Issues
<https://github.com/pydata/pandas/issues/10848>
We will be releasing officially on October 9.

**RELEASE CANDIDATE 2**

From RC 1 we have:


   - compat for Python 3.5
   - compat for matplotlib 1.5.0
   - .convert_objects is now restored to the original, and is deprecated

This is a major release from 0.16.2 and includes a small number of API
changes, several new features, enhancements, and performance improvements
along with a large number of bug fixes. We recommend that all users upgrade
to this version.

Highlights include:


   - Release the Global Interpreter Lock (GIL) on some cython operations,
   see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-gil>
   - Plotting methods are now available as attributes of the .plot
   accessor, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-plot>
   - The sorting API has been revamped to remove some long-time
   inconsistencies, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-api-breaking-sorting>
   - Support for a datetime64[ns] with timezones as a first-class dtype,
   see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-tz>
   - The default for to_datetime will now be to raise when presented with
   unparseable formats; previously this would return the original input, see
   here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-api-breaking-to-datetime>
   - The default for dropna in HDFStore has changed to False, to store by
   default all rows even if they are all NaN, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-api-breaking-hdf-dropna>
   - Support for Series.dt.strftime to generate formatted strings for
   datetime-likes, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-strftime>
   - Development installed versions of pandas will now have PEP440
   compliant version strings GH9518
   <https://github.com/pydata/pandas/issues/9518>
   - Development support for benchmarking with the Air Speed Velocity
   library GH8316 <https://github.com/pydata/pandas/pull/8316>
   - Support for reading SAS xport files, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-enhancements-sas-xport>
   - Removal of the automatic TimeSeries broadcasting, deprecated since
   0.8.0, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-prior-deprecations>
   - Display format with plain text can optionally align with Unicode East
   Asian Width, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0170-east-asian-width>
   - Compatibility with Python 3.5 GH11097
   <https://github.com/pydata/pandas/issues/11097>
   - Compatibility with matplotlib 1.5.0 GH1
   <https://github.com/pydata/pandas/issues/1>


See the Whatsnew
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html> for much
more information.

Best way to get this is to install via conda
<http://pandas-docs.github.io/pandas-docs-travis/install.html#installing-pandas-with-anaconda>
from
our development channel. Builds for osx-64,linux-64,win-64 for Python 2.7,
Python 3.4, and Python 3.5 (for osx/linux) are all available.

conda install pandas -c pandas

Thanks to all who made this release happen. It is a very large release!

Jeff


Re: [Numpy-discussion] [pydata] ANN: pandas v0.17.0rc2 - RELEASE CANDIDATE 2

2015-10-05 Thread Jeff Reback
it should be exactly the same
(they are going to release soon as well, I think), with an updated version

> On Oct 5, 2015, at 2:25 PM, Big Stone  wrote:
> 
> hi,
> 
> on pypi, pandas_datareader (0.1.1) is dated April 10th.
> 
> Is it up-to-date with pandas 0.17rc2 ?
> 
>> On Sunday, October 4, 2015 at 7:36:26 AM UTC+2, Matthew Brett wrote:
>> Hi, 
>> 
>> On Sat, Oct 3, 2015 at 2:33 PM, Jeff Reback  wrote: 
>> > Hi, 
>> > 
>> > I'm pleased to announce the availability of the second release candidate 
>> > of 
>> > Pandas 0.17.0. 
>> > Please try this RC and report any issues here: Pandas Issues 
>> > We will be releasing officially on October 9. 
>> > 
>> > **RELEASE CANDIDATE 2** 
>> > 
>> > From RC 1 we have: 
>> > 
>> > compat for Python 3.5 
>> > compat for matplotlib 1.5.0 
>> > .convert_objects is now restored to the original, and is deprecated 
>> > 
>> > This is a major release from 0.16.2 and includes a small number of API 
>> > changes, several new features, enhancements, and performance improvements 
>> > along with a large number of bug fixes. We recommend that all users 
>> > upgrade 
>> > to this version. 
>> > 
>> > Highlights include: 
>> > 
>> > Release the Global Interpreter Lock (GIL) on some cython operations, see 
>> > here 
>> > Plotting methods are now available as attributes of the .plot accessor, 
>> > see 
>> > here 
>> > The sorting API has been revamped to remove some long-time 
>> > inconsistencies, 
>> > see here 
>> > Support for a datetime64[ns] with timezones as a first-class dtype, see 
>> > here 
>> > The default for to_datetime will now be to raise when presented with 
>> > unparseable formats, previously this would return the original input, see 
>> > here 
>> > The default for dropna in HDFStore has changed to False, to store by 
>> > default 
>> > all rows even if they are all NaN, see here 
>> > Support for Series.dt.strftime to generate formatted strings for 
>> > datetime-likes, see here 
>> > Development installed versions of pandas will now have PEP440 compliant 
>> > version strings GH9518 
>> > Development support for benchmarking with the Air Speed Velocity library 
>> > GH8316 
>> > Support for reading SAS xport files, see here 
>> > Removal of the automatic TimeSeries broadcasting, deprecated since 0.8.0, 
>> > see here 
>> > Display format with plain text can optionally align with Unicode East 
>> > Asian 
>> > Width, see here 
>> > Compatibility with Python 3.5 GH11097 
>> > Compatibility with matplotlib 1.5.0 GH1 
>> > 
>> > 
>> > See the Whatsnew for much more information. 
>> > 
>> > Best way to get this is to install via conda from our development channel. 
>> > Builds for osx-64,linux-64,win-64 for Python 2.7, Python 3.4, and Python 
>> > 3.5 
>> > (for osx/linux) are all available. 
>> > 
>> > conda install pandas -c pandas 
>> 
>> I built OSX wheels for Pythons 2.7, 3.4, 3.5. To test: 
>> 
>> pip install --pre -f http://wheels.scipy.org pandas 
>> 
>> There were some test failures for Python 3.3 - issue here: 
>> 
>> https://github.com/pydata/pandas/issues/11232 
>> 
>> Cheers, 
>> 
>> Matthew
> 


[Numpy-discussion] ANN: pandas v0.17.0 released

2015-10-09 Thread Jeff Reback
Hi,

We are proud to announce v0.17.0 of pandas.

This is a major release from 0.16.2 and includes a small number of API
changes, several new features, enhancements, and performance improvements
along with a large number of bug fixes. We recommend that all users upgrade
to this version.

This was a release of 4 months with 515 commits by 112 authors encompassing
233 issues and 362 pull-requests.


*What is it:*

*pandas* is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, real world data analysis in Python. Additionally, it has the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.

*Highlights*:


   - Release the Global Interpreter Lock (GIL) on some cython operations,
   see here
   
<http://pandas.pydata.org/pandas-docs/version/0.17.0/whatsnew.html#whatsnew-0170-gil>
   - Plotting methods are now available as attributes of the .plot
   accessor, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.17.0/whatsnew.html#whatsnew-0170-plot>
   - The sorting API has been revamped to remove some long-time
   inconsistencies, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.17.0/whatsnew.html#whatsnew-0170-api-breaking-sorting>
   - Support for a datetime64[ns] with timezones as a first-class dtype,
   see here
   
<http://pandas.pydata.org/pandas-docs/version/0.17.0/whatsnew.html#whatsnew-0170-tz>
   - The default for to_datetime will now be to raise when presented with
   unparseable formats; previously this would return the original input
   (quick sketch after this list), see here
   
<http://pandas.pydata.org/pandas-docs/version/0.17.0/whatsnew.html#whatsnew-0170-api-breaking-to-datetime>
   - The default for dropna in HDFStore has changed to False, to store by
   default all rows even if they are all NaN, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.17.0/whatsnew.html#whatsnew-0170-api-breaking-hdf-dropna>
   - Support for Series.dt.strftime to generate formatted strings for
   datetime-likes, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.17.0/whatsnew.html#whatsnew-0170-strftime>
   - Development installed versions of pandas will now have PEP440
   compliant version strings GH9518
   <https://github.com/pydata/pandas/issues/9518>
   - Development support for benchmarking with the Air Speed Velocity
   library GH8316 <https://github.com/pydata/pandas/pull/8316>
   - Support for reading SAS xport files, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.17.0/whatsnew.html#whatsnew-0170-enhancements-sas-xport>
   - Removal of the automatic TimeSeries broadcasting, deprecated since
   0.8.0, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.17.0/whatsnew.html#whatsnew-0170-prior-deprecations>
   - Display format with plain text can optionally align with Unicode East
   Asian Width, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.17.0/whatsnew.html#whatsnew-0170-east-asian-width>
   - Compatibility with Python 3.5 GH11097
   <https://github.com/pydata/pandas/issues/11097>
   - Compatibility with matplotlib 1.5.0 GH1
   <https://github.com/pydata/pandas/issues/1>
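
As a quick sketch of the new to_datetime default mentioned above (the errors=
values follow the 0.17.0 whatsnew; outputs elided):

import pandas as pd

pd.to_datetime(['2015-10-09', 'junk'])                   # now raises ValueError
pd.to_datetime(['2015-10-09', 'junk'], errors='coerce')  # NaT for the bad value
pd.to_datetime(['2015-10-09', 'junk'], errors='ignore')  # old default: return input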

See the Whatsnew
<http://pandas.pydata.org/pandas-docs/version/0.17.0/whatsnew.html> for
much more information and the full Documentation
<http://pandas.pydata.org/pandas-docs/stable/> link.

*How to get it:*

Source tarballs, windows wheels, and macosx wheels are available on PyPI
<https://pypi.python.org/pypi/pandas>

   - note that currently PyPI is not accepting 3.5 wheels.

Installation via conda is:

   - conda install pandas

windows wheels are courtesy of Christoph Gohlke and are built on Numpy 1.9
macosx wheels are courtesy of Matthew Brett

*Issues:*

Please report any issues on our issue tracker
<https://github.com/pydata/pandas/issues>:


Thanks to all who made this release happen. It is a very large release!

Jeff

*Thanks to all of the contributors*


   - Alex Rothberg
   - Andrea Bedini
   - Andrew Rosenfeld
   - Andy Li
   - Anthonios Partheniou
   - Artemy Kolchinsky
   - Bernard Willers
   - Charlie Clark
   - Chris
   - Chris Whelan
   - Christoph Gohlke
   - Christopher Whelan
   - Clark Fitzgerald
   - Clearfield Christopher
   - Dan Ringwalt
   - Daniel Ni
   - Data & Code Expert Experimenting with Code on Data
   - David Cottrell
   - David John Gagne
   - David Kelly
   - ETF
   - Eduardo Schettino
   - Egor
   - Egor Panfilov
   - Evan Wright
   - Frank Pinter
   - Gabriel Araujo
   - Garrett-R
   - Gianluca Rossi
   - Guillaume Gay
   - Guillaume Poulin
   - Harsh Nisar
   - Ian Henriksen
   - Ian Hoegen
   - Jaidev Deshpande
   - Jan 

Re: [Numpy-discussion] Make all comparisons with NaT false?

2015-10-13 Thread Jeff Reback
Here's another oddity to add to the list:

In [28]: issubclass(np.datetime64,np.integer)
Out[28]: False

In [29]: issubclass(np.timedelta64,np.integer)
Out[29]: True
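
For reference, this is the NaN behavior the proposal would have NaT mirror
(the NaN lines are standard IEEE semantics; the NaT lines show the proposed,
not the current, behavior):

In [30]: np.nan == np.nan
Out[30]: False

In [31]: np.nan < np.nan
Out[31]: False

# proposed: np.datetime64('NaT') == np.datetime64('NaT')  -> False
# proposed: np.datetime64('NaT') <  np.datetime64('2015') -> False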


On Tue, Oct 13, 2015 at 5:44 PM, Chris Barker  wrote:

> On Sun, Oct 11, 2015 at 8:38 PM, Stephan Hoyer  wrote:
>
>> Currently, NaT (not a time) does not have any special treatment when used
>> in comparison with datetime64/timedelta64 objects.
>>
>> To me, this seems a little crazy for a value meant to denote a
>> missing/invalid time -- NaT should really have the same comparison behavior
>> as NaN.
>>
>
> Yes, indeed.
>
>
>> Whether you call this an API change or a bug fix is somewhat of a
>> judgment call, but I believe this change is certainly consistent with the
>> goals of datetime64. It's also consistent with how NaT is used in pandas,
>> which uses its own wrappers around datetime64 precisely to fix these sorts
>> of issues.
>>
>
> Getting closer to Pandas is a Good Thing too...
>
>
>> So I'm raising this here to get some opinions on the right path forward:
>> 1. Is this a bug fix that we can backport to 1.10.x?
>> 2. Is this an API change that should wait until 1.11?
>> 3. Is this something where we need to start issuing warnings and
>> deprecate the existing behavior?
>>
>> My vote would be for option 2.
>>
>
> I agree.
>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R   (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>


Re: [Numpy-discussion] deprecate fromstring() for text reading?

2015-10-23 Thread Jeff Reback



> On Oct 23, 2015, at 6:13 PM, Charles R Harris  
> wrote:
> 
> 
> 
>> On Thu, Oct 22, 2015 at 5:47 PM, Chris Barker - NOAA Federal 
>>  wrote:
>> 
>>> I think it would be good to keep the usage to read binary data at least.
>> 
>> Agreed -- it's only the text file reading I'm proposing to deprecate. It was 
>> kind of weird to cram it in there in the first place.
>> 
>> Oh, fromfile() has the same issues.
>> 
>> Chris
>> 
>> 
>>> Or is there a good alternative to `np.fromstring(..., dtype=...)`?  -- 
>>> Marten
>>> 
>>>> On Thu, Oct 22, 2015 at 1:03 PM, Chris Barker  
>>>> wrote:
>>>> There was just a question about a bug/issue with scipy.fromstring (which 
>>>> is numpy.fromstring) when used to read integers from a text file.
>>>> 
>>>> https://mail.scipy.org/pipermail/scipy-user/2015-October/036746.html
>>>> 
>>>> fromstring() is buggy and inflexible for reading text files -- and it is 
>>>> a very, very ugly mess of code. I dug into it a while back, and gave up -- 
>>>> just too much of a mess!
>>>> 
>>>> So we really should completely re-implement it, or deprecate it. I doubt 
>>>> anyone is going to do a big refactor, so that means deprecating it.
>>>> 
>>>> Also -- if we do want a fast read numbers from text files function (which 
>>>> would be nice, actually), it really should get a new name anyway.
>>>> 
>>>> (and the hopefully coming new dtype system would make it easier to write 
>>>> cleanly)
>>>> 
>>>> I'm not sure what deprecating something means, though -- have it raise a 
>>>> deprecation warning in the next version?
> 
> There was discussion at SciPy 2015 of separating out the text reading 
> abilities of Pandas so that numpy could include it. We should contact Jeff 
> Reback and see about moving that forward.
> 
> Chuck 

IIRC Thomas Caswell was interested in doing this :)

Jeff
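
P.S. For anyone skimming, the proposed split is: keep the binary mode,
deprecate the text mode. A quick sketch of the two:

import numpy as np

buf = np.arange(4, dtype=np.int32).tobytes()
np.frombuffer(buf, dtype=np.int32)            # binary parsing -- staying
np.fromstring('1 2 3 4', dtype=int, sep=' ')  # text parsing -- proposed for deprecation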


Re: [Numpy-discussion] deprecate fromstring() for text reading?

2015-10-23 Thread Jeff Reback


> On Oct 23, 2015, at 6:49 PM, Nathaniel Smith  wrote:
> 
> On Oct 23, 2015 3:30 PM, "Jeff Reback"  wrote:
> >
> > On Oct 23, 2015, at 6:13 PM, Charles R Harris  
> > wrote:
> >
> >>
> >>
> >> On Thu, Oct 22, 2015 at 5:47 PM, Chris Barker - NOAA Federal 
> >>  wrote:
> >>>
> >>>
> >>>> I think it would be good to keep the usage to read binary data at least.
> >>>
> >>>
> >>> Agreed -- it's only the text file reading I'm proposing to deprecate. It 
> >>> was kind of weird to cram it in there in the first place.
> >>>
> >>> Oh, fromfile() has the same issues.
> >>>
> >>> Chris
> >>>
> >>>
> >>>> Or is there a good alternative to `np.fromstring(..., dtype=...)`?  
> >>>> -- Marten
> >>>>
> >>>> On Thu, Oct 22, 2015 at 1:03 PM, Chris Barker  
> >>>> wrote:
> >>>>>
> >>>>> There was just a question about a bug/issue with scipy.fromstring 
> >>>>> (which is numpy.fromstring) when used to read integers from a text file.
> >>>>>
> >>>>> https://mail.scipy.org/pipermail/scipy-user/2015-October/036746.html
> >>>>>
> >>>>> fromstring() is buggy and inflexible for reading text files -- and it 
> >>>>> is a very, very ugly mess of code. I dug into it a while back, and gave 
> >>>>> up -- just too much of a mess!
> >>>>>
> >>>>> So we really should completely re-implement it, or deprecate it. I 
> >>>>> doubt anyone is going to do a big refactor, so that means deprecating 
> >>>>> it.
> >>>>>
> >>>>> Also -- if we do want a fast read numbers from text files function 
> >>>>> (which would be nice, actually), it really should get a new name anyway.
> >>>>>
> >>>>> (and the hopefully coming new dtype system would make it easier to 
> >>>>> write cleanly)
> >>>>>
> >>>>> I'm not sure what deprecating something means, though -- have it raise 
> >>>>> a deprecation warning in the next version?
> >>>>>
> >>
> >> There was discussion at SciPy 2015 of separating out the text reading 
> >> abilities of Pandas so that numpy could include it. We should contact Jeff 
> >> Reback and see about moving that forward.
> >
> >
> > IIRC Thomas Caswell was interested in doing this :)
> 
> When he was in Berkeley a few weeks ago he assured me that every night since 
> SciPy he has dutifully been feeling guilty about not having done it yet. I 
> think this week his paltry excuse is that he's "on his honeymoon" or 
> something.
> 
> ...which is to say that if someone has some spare cycles to take this over 
> then I think that might be a nice wedding present for him :-).
> 
> (The basic idea is to take the text reading backend behind pandas.read_csv 
> and extract it into a standalone package that pandas could depend on, and 
> that could also be used by other packages like numpy (among others -- I think 
> dato's SFrame package has a fork of this code as well?))
> 
> -n
> 

I can certainly provide guidance on how/what to extract but don't have spare 
cycles myself for this :(


[Numpy-discussion] ANN: pandas v0.17.1 Released

2015-11-21 Thread Jeff Reback
Hi,

We are proud to announce that *pandas* has become a sponsored project of
the NumFOCUS organization
<http://numfocus.org/news/2015/10/09/numfocus-announces-new-fiscally-sponsored-project-pandas/>
This will help ensure the success of development of *pandas* as a
world-class open-source project.

This is a minor bug-fix release from 0.17.0 and includes a large number of
bug fixes along with several new features, enhancements, and performance
improvements.
We recommend that all users upgrade to this version.

This was a release of 5 weeks with 176 commits by 61 authors encompassing
84 issues and 128 pull-requests.


*What is it:*

*pandas* is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, real world data analysis in Python. Additionally, it has the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.

*Highlights*:


   - Support for Conditional HTML Formatting, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.17.1/whatsnew.html#whatsnew-style>
   - Releasing the GIL on the csv reader & other ops, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.17.1/whatsnew.html#whatsnew-performance>
   - Fixed regression in DataFrame.drop_duplicates from 0.16.2, causing
   incorrect results on integer values, see Issue 11376


See the Whatsnew
<http://pandas.pydata.org/pandas-docs/version/0.17.1/whatsnew.html> for
much more information and the full Documentation
<http://pandas.pydata.org/pandas-docs/stable/> link.

*How to get it:*

Source tarballs, windows wheels, and macosx wheels are available on PyPI
<https://pypi.python.org/pypi/pandas>

Installation via conda is:

   - conda install pandas

windows wheels are courtesy of Christoph Gohlke and are built on Numpy 1.9
macosx wheels are courtesy of Matthew Brett

*Issues:*

Please report any issues on our issue tracker
<https://github.com/pydata/pandas/issues>:

Jeff

*Thanks to all of the contributors*

   - Aleksandr Drozd
   - Alex Chase
   - Anthonios Partheniou
   - BrenBarn
   - Brian J. McGuirk
   - Chris
   - Christian Berendt
   - Christian Perez
   - Cody Piersall
   - Data & Code Expert Experimenting with Code on Data
   - DrIrv
   - Evan Wright
   - Guillaume Gay
   - Hamed Saljooghinejad
   - Iblis Lin
   - Jake VanderPlas
   - Jan Schulz
   - Jean-Mathieu Deschenes
   - Jeff Reback
   - Jimmy Callin
   - Joris Van den Bossche
   - K.-Michael Aye
   - Ka Wo Chen
   - Loïc Séguin-C
   - Luo Yicheng
   - Magnus Jöud
   - Manuel Leonhardt
   - Matthew Gilbert
   - Maximilian Roos
   - Michael
   - Nicholas Stahl
   - Nicolas Bonnotte
   - Pastafarianist
   - Petra Chong
   - Phil Schaf
   - Philipp A
   - Rob deCarvalho
   - Roman Khomenko
   - Rémy Léone
   - Sebastian Bank
   - Thierry Moisan
   - Tom Augspurger
   - Tux1
   - Varun
   - Wieland Hoffmann
   - Winterflower
   - Yoav Ram
   - Younggun Kim
   - Zeke
   - ajcr
   - azuranski
   - behzad nouri
   - cel4
   - emilydolson
   - hironow
   - lexual
   - ll
   - rockg
   - silentquasar
   - sinhrks
   - taeold


Re: [Numpy-discussion] When to stop supporting Python 2.6?

2015-12-03 Thread Jeff Reback
pandas is going to drop
2.6 and 3.3 in the next release, at the end of Jan

(3.2 dropped in 0.17, in October)



I can be reached on my cell 917-971-6387
> On Dec 3, 2015, at 6:00 PM, Bryan Van de Ven  wrote:
> 
> 
>> On Dec 3, 2015, at 4:59 PM, Eric Firing  wrote:
>> 
>> Chuck,
>> 
>> I would support dropping the old versions now.  As a related data point, 
>> matplotlib is testing master on 2.7, 3.4, and 3.5--no more 2.6 and 3.3.
> 
> Ditto for Bokeh. 


Re: [Numpy-discussion] Numpy 1.11.0b2 released

2016-01-30 Thread Jeff Reback
just my 2c

it's fairly straightforward to add a test to the Travis matrix that grabs
built numpy wheels (works for conda or pip installs).

so in pandas we are testing 2.7/3.5 against numpy master continuously

https://github.com/pydata/pandas/blob/master/ci/install-3.5_NUMPY_DEV.sh

> On Jan 30, 2016, at 1:16 PM, Nathaniel Smith  wrote:
> 
> On Jan 30, 2016 9:27 AM, "Ralf Gommers"  wrote:
> >
> >
> >
> > On Fri, Jan 29, 2016 at 11:39 PM, Nathaniel Smith  wrote:
> >>
> >> It occurs to me that the best solution might be to put together a 
> >> .travis.yml for the release branches that does: "for pkg in 
> >> IMPORTANT_PACKAGES: pip install $pkg; python -c 'import pkg; pkg.test()'"
> >> This might not be viable right now, but will be made more viable if pypi 
> >> starts allowing official Linux wheels, which looks likely to happen before 
> >> 1.12... (see PEP 513)
> >>
> >> On Jan 29, 2016 9:46 AM, "Andreas Mueller"  wrote:
> >> >
> >> > Is this the point when scikit-learn should build against it?
> >>
> >> Yes please!
> >>
> >> > Or do we wait for an RC?
> >>
> >> This is still all in flux, but I think we might actually want a rule that 
> >> says it can't become an RC until after we've tested scikit-learn (and a 
> >> list of similarly prominent packages). On the theory that RC means "we 
> >> think this is actually good enough to release" :-). OTOH I'm not sure the 
> >> alpha/beta/RC distinction is very helpful; maybe they should all just be 
> >> betas.
> >>
> >> > Also, we need a scipy build against it. Who does that?
> >>
> >> Like Julian says, it shouldn't be necessary. In fact using old builds of 
> >> scipy and scikit-learn is even better than rebuilding them, because it 
> >> tests numpy's ABI compatibility -- if you find you *have* to rebuild 
> >> something then we *definitely* want to know that.
> >>
> >> > Our continuous integration doesn't usually build scipy or numpy, so it 
> >> > will be a bit tricky to add to our config.
> >> > Would you run our master tests? [did we ever finish this discussion?]
> >>
> >> We didn't, and probably should... :-)
> >
> > Why would that be necessary if scikit-learn simply tests pre-releases of 
> > numpy as you suggested earlier in the thread (with --pre)?
> >
> > There's also https://github.com/MacPython/scipy-stack-osx-testing by the 
> > way, which could have scikit-learn and scikit-image added to it. 
> >
> > That's two options that are imho both better than adding more workload for 
> > the numpy release manager. Also from a principled point of view, packages 
> > should test with new versions of their dependencies, not the other way 
> > around.
> 
> Sorry, that was unclear. I meant that we should finish the discussion, not 
> that we should necessarily be the ones running the tests. "The discussion" 
> being this one:
> 
> https://github.com/numpy/numpy/issues/6462#issuecomment-148094591
> https://github.com/numpy/numpy/issues/6494
> 
> I'm not saying that the release manager necessarily should be running the 
> tests (though it's one option). But the 1.10 experience seems to indicate 
> that we need *some* process for the release manager to make sure that some 
> basic downstream testing has happened. Another option would be keeping a 
> checklist of downstream projects and making sure they've all checked in and 
> confirmed that they've run tests before making the release.
> 
> -n
> 


Re: [Numpy-discussion] Numpy pull requests getting out of hand.

2016-01-31 Thread Jeff Reback
FYI it's also useful to simply close by time - say, older than 6 months - with
a message for the author to reopen if they want to keep working on it

then you don't get too many stale ones

my 2c

> On Jan 31, 2016, at 2:10 PM, Charles R Harris  
> wrote:
> 
> Hi All,
> 
> There are now 130 open numpy pull requests and it seems almost impossible to 
> keep that number down. My personal decision is that I am going to ignore any 
> new enhancements for the next couple of months and only merge bug fixes, 
> tests, house keeping (style, docs, deprecations), and old PRs. I would also 
> request that other maintainers start looking at taking care of older PRs, 
> either cleaning them up and merging, or closing them.
> 
> Chuck


Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-13 Thread Jeff Reback
In [10]: pd.options.display.max_rows=10

In [13]: np.random.seed(1234)

In [14]: c = np.random.randint(0,32,size=100000)

In [15]: v = np.arange(100000)

In [16]: df = DataFrame({'v' : v, 'c' : c})

In [17]: df
Out[17]:
        c      v
0      15      0
1      19      1
2       6      2
3      21      3
4      12      4
...    ..    ...
99995   7  99995
99996   2  99996
99997  27  99997
99998  28  99998
99999   7  99999

[100000 rows x 2 columns]

In [19]: df.groupby('c').count()
Out[19]:
   v
c
0   3136
1   3229
2   3093
3   3121
4   3041
..   ...
27  3128
28  3063
29  3147
30  3073
31  3090

[32 rows x 1 columns]

In [20]: %timeit df.groupby('c').count()
100 loops, best of 3: 2 ms per loop

In [21]: %timeit df.groupby('c').mean()
100 loops, best of 3: 2.39 ms per loop

In [22]: df.groupby('c').mean()
Out[22]:
   v
c
0   49883.384885
1   50233.692165
2   48634.116069
3   50811.743992
4   50505.368629
..   ...
27  49715.349425
28  50363.501469
29  50485.395933
30  50190.155223
31  50691.041748

[32 rows x 1 columns]
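
(For a numpy-only comparison: the per-label count/mean above can be sketched
with bincount - same c and v as above, labels 0..31 - though it is nowhere
near as general as groupby:)

counts = np.bincount(c, minlength=32)                     # group sizes
means = np.bincount(c, weights=v, minlength=32) / counts  # group means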


On Sat, Feb 13, 2016 at 1:29 PM,  wrote:

>
>
> On Sat, Feb 13, 2016 at 1:01 PM, Allan Haldane 
> wrote:
>
>> Sorry to reply to myself here, but looking at it with fresh eyes maybe
>> the performance of the naive version isn't too bad. Here's a comparison of
>> the naive vs a better implementation:
>>
>> def split_classes_naive(c, v):
>>     return [v[c == u] for u in unique(c)]
>>
>> def split_classes(c, v):
>>     perm = c.argsort()
>>     csrt = c[perm]
>>     div = where(csrt[1:] != csrt[:-1])[0] + 1
>>     return [v[x] for x in split(perm, div)]
>>
>> >>> c = randint(0,32,size=100000)
>> >>> v = arange(100000)
>> >>> %timeit split_classes_naive(c,v)
>> 100 loops, best of 3: 8.4 ms per loop
>> >>> %timeit split_classes(c,v)
>> 100 loops, best of 3: 4.79 ms per loop
>>
>
> The use cases I recently started to target for similar things are 1 million
> or more rows and 10000 uniques in the labels.
> The second version should be faster for large number of uniques, I guess.
>
> Overall numpy is falling far behind pandas in terms of simple groupby
> operations. bincount and histogram (IIRC) worked for some cases but are
> rather limited.
>
> reduce_at looks nice for cases where it applies.
>
> In contrast to the full sized labels in the original post, I only know of
> applications where the labels are 1-D corresponding to rows or columns.
>
> Josef
>
>
>
>>
>> In any case, maybe it is useful to Sergio or others.
>>
>> Allan
>>
>>
>> On 02/13/2016 12:11 PM, Allan Haldane wrote:
>>
>>> I've had a pretty similar idea for a new indexing function
>>> 'split_classes' which would help in your case, which essentially does
>>>
>>>  def split_classes(c, v):
>>>      return [v[c == u] for u in unique(c)]
>>>
>>> Your example could be coded as
>>>
>>>  >>> [sum(c) for c in split_classes(label, data)]
>>>  [9, 12, 15]
>>>
>>> I feel I've come across the need for such a function often enough that
>>> it might be generally useful to people as part of numpy. The
>>> implementation of split_classes above has pretty poor performance
>>> because it creates many temporary boolean arrays, so my plan for a PR
>>> was to have a speedy version of it that uses a single pass through v.
>>> (I often wanted to use this function on large datasets).
>>>
>>> If anyone has any comments on the idea (good idea. bad idea?) I'd love
>>> to hear.
>>>
>>> I have some further notes and examples here:
>>> https://gist.github.com/ahaldane/1e673d2fe6ffe0be4f21
>>>
>>> Allan
>>>
>>> On 02/12/2016 09:40 AM, Sérgio wrote:
>>>
 Hello,

 This is my first e-mail, I will try to make the idea simple.

 Similar to masked array it would be interesting to use a label array to
 guide operations.

 Ex.:
  >>> x
 labelled_array(data =
   [[0 1 2]
   [3 4 5]
   [6 7 8]],
  label =
   [[0 1 2]
   [0 1 2]
   [0 1 2]])

  >>> sum(x)
 array([9, 12, 15])

 The operations would create a new axis for label indexing.

 You could think of it as a collection of masks, one for each label.

 I don't know a way to make something like this efficiently without a
 loop. Just wondering...

 Sérgio.



Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-13 Thread Jeff Reback
These operations get slower as the number of groups increases, but with a
faster function (e.g. the standard ones, which are cythonized), the constant
on
the increase is pretty low.

In [23]: c = np.random.randint(0,10000,size=100000)

In [24]: df = DataFrame({'v' : v, 'c' : c})

In [25]: %timeit df.groupby('c').count()
100 loops, best of 3: 3.18 ms per loop

In [26]: len(df.groupby('c').count())
Out[26]: 10000

In [27]: df.groupby('c').count()
Out[27]:
       v
c
0      9
1     11
2      7
3      8
4     16
...   ..
9995  11
9996  13
9997  13
9998   7
9999  10

[10000 rows x 1 columns]


On Sat, Feb 13, 2016 at 1:39 PM, Jeff Reback  wrote:

> In [10]: pd.options.display.max_rows=10
>
> In [13]: np.random.seed(1234)
>
> In [14]: c = np.random.randint(0,32,size=100000)
>
> In [15]: v = np.arange(100000)
>
> In [16]: df = DataFrame({'v' : v, 'c' : c})
>
> In [17]: df
> Out[17]:
>         c      v
> 0      15      0
> 1      19      1
> 2       6      2
> 3      21      3
> 4      12      4
> ...    ..    ...
> 99995   7  99995
> 99996   2  99996
> 99997  27  99997
> 99998  28  99998
> 99999   7  99999
>
> [100000 rows x 2 columns]
>
> In [19]: df.groupby('c').count()
> Out[19]:
>v
> c
> 0   3136
> 1   3229
> 2   3093
> 3   3121
> 4   3041
> ..   ...
> 27  3128
> 28  3063
> 29  3147
> 30  3073
> 31  3090
>
> [32 rows x 1 columns]
>
> In [20]: %timeit df.groupby('c').count()
> 100 loops, best of 3: 2 ms per loop
>
> In [21]: %timeit df.groupby('c').mean()
> 100 loops, best of 3: 2.39 ms per loop
>
> In [22]: df.groupby('c').mean()
> Out[22]:
>v
> c
> 0   49883.384885
> 1   50233.692165
> 2   48634.116069
> 3   50811.743992
> 4   50505.368629
> ..   ...
> 27  49715.349425
> 28  50363.501469
> 29  50485.395933
> 30  50190.155223
> 31  50691.041748
>
> [32 rows x 1 columns]
>
>
> On Sat, Feb 13, 2016 at 1:29 PM,  wrote:
>
>>
>>
>> On Sat, Feb 13, 2016 at 1:01 PM, Allan Haldane 
>> wrote:
>>
>>> Sorry to reply to myself here, but looking at it with fresh eyes maybe
>>> the performance of the naive version isn't too bad. Here's a comparison of
>>> the naive vs a better implementation:
>>>
>>> def split_classes_naive(c, v):
>>>     return [v[c == u] for u in unique(c)]
>>>
>>> def split_classes(c, v):
>>>     perm = c.argsort()
>>>     csrt = c[perm]
>>>     div = where(csrt[1:] != csrt[:-1])[0] + 1
>>>     return [v[x] for x in split(perm, div)]
>>>
>>> >>> c = randint(0,32,size=100000)
>>> >>> v = arange(100000)
>>> >>> %timeit split_classes_naive(c,v)
>>> 100 loops, best of 3: 8.4 ms per loop
>>> >>> %timeit split_classes(c,v)
>>> 100 loops, best of 3: 4.79 ms per loop
>>>
>>
>> The use cases I recently started to target for similar things are 1 million
>> or more rows and 10000 uniques in the labels.
>> The second version should be faster for large number of uniques, I guess.
>>
>> Overall numpy is falling far behind pandas in terms of simple groupby
>> operations. bincount and histogram (IIRC) worked for some cases but are
>> rather limited.
>>
>> reduce_at looks nice for cases where it applies.
>>
>> In contrast to the full sized labels in the original post, I only know of
>> applications where the labels are 1-D corresponding to rows or columns.
>>
>> Josef
>>
>>
>>
>>>
>>> In any case, maybe it is useful to Sergio or others.
>>>
>>> Allan
>>>
>>>
>>> On 02/13/2016 12:11 PM, Allan Haldane wrote:
>>>
>>>> I've had a pretty similar idea for a new indexing function
>>>> 'split_classes' which would help in your case, which essentially does
>>>>
>>>>  def split_classes(c, v):
>>>>      return [v[c == u] for u in unique(c)]
>>>>
>>>> Your example could be coded as
>>>>
>>>>  >>> [sum(c) for c in split_classes(label, data)]
>>>>  [9, 12, 15]
>>>>
>>>> I feel I've come across the need for such a function often enough that
>>>> it might be generally useful to people as part of numpy. The
>>>> implementation of split_classes above has pretty poor performance
>>>> because it creates many temporary boolean arrays, so my plan for a PR
>>>> was to have a speedy version of it that uses a single pass through v.

[Numpy-discussion] ANN: pandas v0.18.0rc1 - RELEASE CANDIDATE

2016-02-13 Thread Jeff Reback
Hi,

I'm pleased to announce the availability of the first release candidate of
Pandas 0.18.0.
Please try this RC and report any issues here: Pandas Issues
<https://github.com/pydata/pandas/issues>
We will be releasing officially in 1-2 weeks or so.

**RELEASE CANDIDATE 1**

This is a major release from 0.17.1 and includes a small number of API
changes, several new features,
enhancements, and performance improvements along with a large number of bug
fixes. We recommend that all
users upgrade to this version.

Highlights include:

   - pandas >= 0.18.0 will no longer support compatibility with Python
   version 2.6 GH7718 <https://github.com/pydata/pandas/issues/7718> or
   version 3.3 GH11273 <https://github.com/pydata/pandas/issues/11273>
   - Moving and expanding window functions are now methods on Series and
   DataFrame, similar to .groupby-like objects (quick sketch after this list), see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-moments>
   .
   - Adding support for a RangeIndex as a specialized form of the
Int64Index for
   memory savings, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-rangeindex>
   .
   - API breaking .resample changes to make it more .groupby like, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-breaking-resample>
   - Removal of support for positional indexing with floats, which was
   deprecated since 0.14.0. This will now raise a TypeError, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-float-indexers>
   - The .to_xarray() function has been added for compatibility with the xarray
   package <http://xarray.pydata.org/en/stable/> see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-xarray>
   .
   - Addition of the .str.extractall() method
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-extractall>,
   and API changes to the .str.extract() method
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-extract>,
   and the .str.cat() method
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-strcat>
   - pd.test() top-level nose test runner is available GH4327
   <https://github.com/pydata/pandas/issues/4327>
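
As a quick sketch of the new window-method syntax from the first highlight
(replacing the old pd.rolling_*/pd.expanding_* functions):

import pandas as pd

s = pd.Series(range(100))
s.rolling(window=10).mean()   # was: pd.rolling_mean(s, 10)
s.expanding().sum()           # was: pd.expanding_sum(s)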

See the Whatsnew
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html> for much
more information.

Best way to get this is to install via conda
<http://pandas-docs.github.io/pandas-docs-travis/install.html#installing-pandas-with-anaconda>
from
our development channel. Builds for osx-64,linux-64,win-64 for Python 2.7
and Python 3.5 are all available.

conda install pandas=v0.18.0rc1 -c pandas

Thanks to all who made this release happen. It is a very large release!

Jeff


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-15 Thread Jeff Reback
just an FYI.

pandas implemented a RangeIndex in upcoming 0.18.0, mainly for memory
savings,
see here
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#range-index>,
similar to how python range/xrange work.

there are also substantial perf benefits, mainly with set operations, see
here
<https://github.com/pydata/pandas/blob/master/pandas/indexes/range.py#L274>
though we didn't officially benchmark these.

Jeff
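
A tiny sketch of the idea (names per the 0.18.0 whatsnew; a RangeIndex stores
just start/stop/step):

import numpy as np
import pandas as pd

ri = pd.RangeIndex(0, 10**6)      # O(1) memory, like a lazy range
ii = pd.Index(np.arange(10**6))   # materializes 10**6 int64 labels
ri.equals(ii)                     # True -- same labels, tiny footprint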


On Mon, Feb 15, 2016 at 11:13 AM, Antony Lee 
wrote:

> Indeed:
>
> In [1]: class C:
>     def __getitem__(self, i):
>         if i < 10: return i
>         else: raise IndexError
>     def __len__(self):
>         return 10
>    ...:
>
> In [2]: np.array(C())
> Out[2]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>
>
> (omitting __len__ results in the creation of an object array, consistent
> with the fact that the sequence protocol requires __len__).
> Meanwhile, I found a new way to segfault numpy :-)
>
> In [3]: class C:
>     def __getitem__(self, i):
>         if i < 10: return i
>         else: raise IndexError
>     def __len__(self):
>         return 42
>    ...:
>
> In [4]: np.array(C())
> Fatal Python error: Segmentation fault
>
>
> 2016-02-15 0:10 GMT-08:00 Nathaniel Smith :
>
>> On Sun, Feb 14, 2016 at 11:41 PM, Antony Lee 
>> wrote:
>> > I wonder whether numpy is using the "old" iteration protocol (repeatedly
>> > calling x[i] for increasing i until StopIteration is reached?)  A quick
>> > timing shows that it is indeed slower.
>>
>> Yeah, I'm pretty sure that np.array doesn't know anything about
>> "iterable", just about "sequence" (calling x[i] for 0 <= i <
>> x.__len__()).
>>
>> (See Sequence vs Iterable:
>> https://docs.python.org/3/library/collections.abc.html)
>>
>> Personally I'd like it if we could eventually make it so np.array
>> specifically looks for lists and only lists, because the way it has so
>> many different fallbacks right now creates all this confusion about which
>> objects are elements. Compare:
>>
>> In [5]: np.array([(1, 2), (3, 4)]).shape
>> Out[5]: (2, 2)
>>
>> In [6]: np.array([(1, 2), (3, 4)], dtype="i4,i4").shape
>> Out[6]: (2,)
>>
>> -n
>>
>> --
>> Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] ANN: pandas v0.18.0rc1 - RELEASE CANDIDATE

2016-02-15 Thread Jeff Reback
https://github.com/pydata/pandas/releases/tag/v0.18.0rc1

On Mon, Feb 15, 2016 at 12:51 PM, Derek Homeier <
de...@astro.physik.uni-goettingen.de> wrote:

> On 14 Feb 2016, at 1:53 am, Jeff Reback  wrote:
> >
> > I'm pleased to announce the availability of the first release candidate
> of Pandas 0.18.0.
> > Please try this RC and report any issues here: Pandas Issues
> > We will be releasing officially in 1-2 weeks or so.
> >
> Thanks, looking forward to give this a try!
> Do you have a download link to the source for non-Conda users and
> packagers?
> Finding anything in the github source tarball repositories without having
> the exact
> path seems hopeless.
>
> Derek
>
>
>


[Numpy-discussion] ANN: pandas v0.18.0rc2 - RELEASE CANDIDATE

2016-03-09 Thread Jeff Reback
Hi,

I'm pleased to announce the availability of the second release candidate of
Pandas 0.18.0.
Please try this RC and report any issues here: Pandas Issues
<https://github.com/pydata/pandas/issues>. Compared to RC1, we have added an
updated read_sas and fixed float indexing. We will be releasing officially
very shortly.

THIS IS NOT A PRODUCTION RELEASE

This is a major release from 0.17.1 and includes a small number of API
changes, several new features, enhancements,
and performance improvements along with a large number of bug fixes. We
recommend that all users upgrade to this version.

Highlights include:

   - pandas >= 0.18.0 will no longer support compatibility with Python
   version 2.6 GH7718 <https://github.com/pydata/pandas/issues/7718> or
   version 3.3 GH11273 <https://github.com/pydata/pandas/issues/11273>
   - Moving and expanding window functions are now methods on Series and
   DataFrame similar to .groupby like objects, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-moments>
   .
   - Adding support for a RangeIndex as a specialized form of the
Int64Index for
   memory savings, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-rangeindex>
   .
   - API breaking .resample changes to make it more .groupby like, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-breaking-resample>
   - Removal of support for positional indexing with floats, which was
   deprecated since 0.14.0. This will now raise a TypeError, see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-float-indexers>
   - The .to_xarray() function has been added for compatibility with the xarray
   package <http://xarray.pydata.org/en/stable/> see here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-xarray>
   .
   - The read_sas() function has been enhanced to read sas7bdat files, see
   here
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-sas>
   - Addition of the .str.extractall() method
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-extractall>,
   and API changes to the .str.extract() method
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-extract>,
   and the .str.cat() method
   
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0180-enhancements-strcat>
   - pd.test() top-level nose test runner is available GH4327
   <https://github.com/pydata/pandas/issues/4327>

See the Whatsnew
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html> for much
more information.

Best way to get this is to install via conda
<http://pandas-docs.github.io/pandas-docs-travis/install.html#installing-pandas-with-anaconda>
from
our development channel. Builds for osx-64,linux-64,win-64 for Python 2.7
and Python 3.5 are all available.

conda install pandas=v0.18.0rc2 -c pandas

Thanks to all who made this release happen. It is a very large release!

Jeff


[Numpy-discussion] ANN: pandas v0.18.0 Final released

2016-03-12 Thread Jeff Reback
Hi,

This is a major release from 0.17.1 and includes a small number of API
changes,
several new features, enhancements, and performance improvements along with
a
large number of bug fixes. We recommend that all users upgrade to this
version.

This was a release of 3.5 months with 381 commits by 100 authors
encompassing 465 issues and 290 pull-requests.

*What is it:*

*pandas* is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, real world data analysis in Python. Additionally, it has the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.

*Highlights*:

   - pandas >= 0.18.0 will no longer support compatibility with Python
   version 2.6 GH7718 <https://github.com/pydata/pandas/issues/7718> or
   version 3.3 GH11273 <https://github.com/pydata/pandas/issues/11273>
   - Moving and expanding window functions are now methods on Series and
   DataFrame similar to .groupby like objects, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#whatsnew-0180-enhancements-moments>
   .
   - Adding support for a RangeIndex as a specialized form of the
Int64Index for
   memory savings, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#whatsnew-0180-enhancements-rangeindex>
   .
   - API breaking .resample changes to make it more .groupby like, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#whatsnew-0180-breaking-resample>
   - Removal of support for positional indexing with floats, which was
   deprecated since 0.14.0. This will now raise a TypeError, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#whatsnew-0180-float-indexers>
   - The .to_xarray() function has been added for compatibility with the xarray
   package <http://xarray.pydata.org/en/stable/> see here
   
<http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#whatsnew-0180-enhancements-xarray>
   .
   - The read_sas() function has been enhanced to read sas7bdat files, see
   here
   
<http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#whatsnew-0180-enhancements-sas>
   - Addition of the .str.extractall() method
   
<http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#whatsnew-0180-enhancements-extractall>,
   and API changes to the .str.extract() method
   
<http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#whatsnew-0180-enhancements-extract>,
   and the .str.cat() method
   
<http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#whatsnew-0180-enhancements-strcat>
   - pd.test() top-level nose test runner is available GH4327
   <https://github.com/pydata/pandas/issues/4327>

See the Whatsnew
<http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html> for
much more information and the full Documentation
<http://pandas.pydata.org/pandas-docs/version/0.18.0/> link.

*How to get it:*

Source tarballs, windows wheels, and macosx wheels are available on PyPI
<https://pypi.python.org/pypi/pandas>

Installation via conda is:

   - conda install pandas

Windows wheels are courtesy of Christoph Gohlke and are built against
NumPy 1.10; Mac OS X wheels are courtesy of Matthew Brett.

*Issues:*

Please report any issues on our issue tracker
<https://github.com/pydata/pandas/issues>.

Jeff

*Thanks to all of the contributors*


   - ARF
   - Alex Alekseyev
   - Andrew McPherson
   - Andrew Rosenfeld
   - Anthonios Partheniou
   - Anton I. Sipos
   - Ben
   - Ben North
   - Bran Yang
   - Chris
   - Chris Carroux
   - Christopher C. Aycock
   - Christopher Scanlin
   - Cody
   - Da Wang
   - Daniel Grady
   - Dorozhko Anton
   - Dr-Irv
   - Erik M. Bray
   - Evan Wright
   - Francis T. O'Donovan
   - Frank Cleary
   - Gianluca Rossi
   - Graham Jeffries
   - Guillaume Horel
   - Henry Hammond
   - Isaac Schwabacher
   - Jean-Mathieu Deschenes
   - Jeff Reback
   - Joe Jevnik
   - John Freeman
   - John Fremlin
   - Jonas Hoersch
   - Joris Van den Bossche
   - Joris Vankerschaver
   - Justin Lecher
   - Justin Lin
   - Ka Wo Chen
   - Keming Zhang
   - Kerby Shedden
   - Kyle
   - Marco Farrugia
   - MasonGallo
   - MattRijk
   - Matthew Lurie
   - Maximilian Roos
   - Mayank Asthana
   - Mortada Mehyar
   - Moussa Taifi
   - Navreet Gill
   - Nicolas Bonnotte
   - Paul Reiners
   - Philip Gura
   - Pietro Battiston
   - RahulHP
   - Randy Carnevale
   - Rinoc Johnson
   - Rishipuri
   - Sangmin Park
   - Scott E Lasley
   - Sereger13
   - Shannon Wang
   - Skipper Seabold
   - Thierry Moisan
   - Thomas A Caswell
   - Toby Dylan Hocking
   - Tom Augspurger
   - Travis
   - Trent Hauck
   - Tux1
   - Varun

[Numpy-discussion] ANN: v0.18.1 pandas Released

2016-05-04 Thread Jeff Reback
This is a minor bug-fix release from 0.18.0 and includes a large number of
bug fixes along with several new features, enhancements, and performance
improvements. We recommend that all users upgrade to this version.

This release represents 6 weeks of work, with 210 commits by 60 authors
encompassing 142 issues and 164 pull requests.


*What is it:*

*pandas* is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block
for doing practical, real world data analysis in Python. Additionally, it
has the broader goal of becoming the most powerful and flexible open source
data analysis / manipulation tool available in any language.


*Highlights*:

   - .groupby(...) has been enhanced to provide convenient syntax when working
   with .rolling(..), .expanding(..) and .resample(..) per group (see the
   sketch after this list), see here
   
<http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#whatsnew-0181-deferred-ops>
   - pd.to_datetime() has gained the ability to assemble dates from a
   DataFrame, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#whatsnew-0181-enhancements-assembling>
   - Method chaining improvements, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#whatsnew-0181-enhancements-method-chain>
   - Custom business hour offset, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#whatsnew-0181-enhancements-custombusinesshour>
   - Many bug fixes in the handling of sparse, see here
   
<http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#whatsnew-0181-sparse>
   - Expanded the Tutorials section
   
<http://pandas.pydata.org/pandas-docs/version/0.18.1/tutorials.html#tutorial-modern>
with
   a feature on modern pandas, courtesy of @TomAugspurger
   <https://twitter.com/TomAugspurger>.
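
A minimal sketch of the first two items (assuming pandas 0.18.1; the data
is illustrative):

import pandas as pd

df = pd.DataFrame({'group': list('aabb'), 'value': [1.0, 2.0, 3.0, 4.0]})

# Window methods now chain directly off a groupby:
print(df.groupby('group')['value'].rolling(2).mean())

# pd.to_datetime() can now assemble dates from DataFrame columns:
parts = pd.DataFrame({'year': [2015, 2016], 'month': [2, 3], 'day': [4, 5]})
print(pd.to_datetime(parts))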

See the Whatsnew
<http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html> for
much more information, and the full Documentation
<http://pandas.pydata.org/pandas-docs/stable/> link.


*How to get it:*

Source tarballs, windows wheels, and macosx wheels are available on PyPI
<https://pypi.python.org/pypi/pandas>. Windows wheels are courtesy of Christoph
Gohlke, and are built on Numpy 1.10. Macosx wheels are courtesy of Matthew
Brett.

Installation via conda is: conda install pandas
Currently it's available via the conda-forge channel: conda install pandas
-c conda-forge
It will be available on the main channel shortly.

Please report any issues on our issue tracker
<https://github.com/pydata/pandas/issues>.

Jeff Reback


*Thanks to all of the contributors*


   - Andrew Fiore-Gartland
   - Bastiaan
   - Benoît Vinot
   - Brandon Rhodes
   - DaCoEx
   - Drew Fustin
   - Ernesto Freitas
   - Filip Ter
   - Gregory Livschitz
   - Gábor Lipták
   - Hassan Kibirige
   - Iblis Lin
   - Israel Saeta Pérez
   - Jason Wolosonovich
   - Jeff Reback
   - Joe Jevnik
   - Joris Van den Bossche
   - Joshua Storck
   - Ka Wo Chen
   - Kerby Shedden
   - Kieran O'Mahony
   - Leif Walsh
   - Mahmoud Lababidi
   - Maoyuan Liu
   - Mark Roth
   - Matt Wittmann
   - MaxU
   - Maximilian Roos
   - Michael Droettboom
   - Nick Eubank
   - Nicolas Bonnotte
   - OXPHOS
   - Pauli Virtanen
   - Peter Waller
   - Pietro Battiston
   - Prabhjot Singh
   - Robin Wilson
   - Roger Thomas
   - Sebastian Bank
   - Stephen Hoover
   - Tim Hopper
   - Tom Augspurger
   - WANG Aiyong
   - Wes Turner
   - Winand
   - Xbar
   - Yan Facai
   - adneu
   - ajenkins-cargometrics
   - behzad nouri
   - chinskiy
   - gfyoung
   - jeps-journal
   - jonaslb
   - kotrfa
   - nileracecrew
   - onesandzeroes
   - rs2
   - sinhrks
   - tsdlovell


Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Jeff Reback
This is trivial in pandas: a simple groupby.

In [6]: data = [[ 'a', 27, 14.5 ],['b', 12, 99.0],['a', 17, 100.3], ['b',
12, -329.0]]

In [7]: df = DataFrame(data, columns=list('ABC'))

In [8]: df
Out[8]:
   A   B      C
0  a  27   14.5
1  b  12   99.0
2  a  17  100.3
3  b  12 -329.0

In [9]: df.groupby('A').first()
Out[9]:
    B     C
A
a  27  14.5
b  12  99.0

In [10]: df.groupby('A').last()
Out[10]:
    B      C
A
a  17  100.3
b  12 -329.0


On Mon, Jul 4, 2016 at 7:27 PM, Skip Montanaro wrote:

> > Any way that you can make your keys numeric? Then you can run np.diff on
> > that first column, and use the indices of nonzero entries
> (np.flatnonzero)
> > to know where values change. With a +1/-1 offset (that I am too lazy to
> > figure out right now ;) you can then index into the original rows to get
> > either the first or last occurrence of each run.
>
> I'll give it some thought, but one of the elements of the key is definitely
> a (short, < six characters) string.  Hashing it probably wouldn't work, too
> great a chance for collisions.
>
> S
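
A NumPy-only sketch of the run-boundary idea above, assuming the rows are
already sorted so that equal keys are contiguous; comparing adjacent keys
directly also avoids hashing string keys (the names here are illustrative):

import numpy as np

keys = np.array(['a', 'a', 'a', 'b', 'b', 'c'])

# Indices where the key changes; works for string keys, no hashing needed.
change = np.flatnonzero(keys[1:] != keys[:-1])

first_idx = np.concatenate(([0], change + 1))         # first row of each run
last_idx = np.concatenate((change, [len(keys) - 1]))  # last row of each run

print(first_idx)  # [0 3 5]
print(last_idx)   # [2 4 5]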


[Numpy-discussion] Problems with get_info installing scipy

2010-06-08 Thread Jeff Hsu
I tried to install scipy, but I get an error about not being able to find
get_info() in numpy.distutils.misc_util.  I read that you need the SVN
version of numpy to fix this.  I recompiled numpy and reinstalled from SVN,
which says it is version 1.3.0 (I was using version 1.4.1 before), and that
function is not found in either version.  What version of numpy should I
use? Or maybe I'm not removing numpy correctly.

Thanks!


Re: [Numpy-discussion] Problems with get_info installing scipy

2010-06-08 Thread Jeff Hsu
Thanks, that works.  Unfortunately it uncovered another problem.  When I
try to reinstall numpy, it keeps building with the Intel MKL libraries,
even from a fresh numpy source tree with site.cfg set to defaults or with
no site.cfg at all.

Giving me:
FOUND:
libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core',
'pthread']
library_dirs = ['/opt/intel/Compiler/11.1/072/mkl/lib/em64t']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['/opt/intel/Compiler/11.1/072/mkl/include']
...


On Tue, Jun 8, 2010 at 10:41 AM, Pauli Virtanen wrote:

> Tue, 08 Jun 2010 09:47:41 -0400, Jeff Hsu wrote:
> > I tried to install scipy, but I get the error with not being able to
> > find get_info() from numpy.distutils.misc_util.  I read that you need
> > the SVN version of numpy to fix this.  I recompiled numpy and
> > reinstalled from the SVN, which says is version 1.3.0 (was using 1.4.1
> > version before) and that function is not found within either versions.
> > What version of numpy should I use? Or maybe I'm not removing numpy
> > correctly.
>
> It's included in 1.4.1 and in SVN (which is 1.5.x).
>
> You almost certainly have an older version of numpy installed somewhere
> that overrides the new one. Check "import numpy; print numpy.__file__" to
> see which one is imported.
>
> --
> Pauli Virtanen
>


Re: [Numpy-discussion] updating NumPy in EPD

2010-06-08 Thread Jeff Hsu
Check which version of numpy Python is importing with "import numpy; print
numpy.__file__".  I had a similar question, and this worked after I removed
the shadowing installation of numpy.  I think the Enthought distro installs
it somewhere else that takes priority.
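
Concretely, something like this (Python 2 syntax, matching the era):

import numpy
print numpy.__file__     # shows which installation actually gets imported
print numpy.__version__  # PyCogent needs this to be 1.3 or higher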


On Tue, Jun 8, 2010 at 10:30 PM, Nick Matzke wrote:

> Hi NumPy gurus,
>
> I have a slightly weird question.  I would like to install
> the PyCogent python library.  However, this requires NumPy
> 1.3 or higher.  I only have NumPy 1.1.1, because I got it as
> part of the Enthought Python Distribution (4.1) back in 2008.
>
> Now, when I download & install a new version of NumPy, the
> install seems to work.  However, the PyCogent installer can
> still only see the NumPy 1.1.1 version.
>
> Any advice on what I might do to fix this?
>
> I would just update my version of EPD, which is
> where my NumPy came from -- however, Enthought only has
> available for academic download a version of EPD that works
> on OS X 10.5 or later, and my Mac is a 10.4.11 and I'd
> rather not completely reinstall the OS just to get one
> little library to work.
>
> Any help much appreciated!!
>
> Cheers!
> Nick
>
>
>
> --
> 
> Nicholas J. Matzke
> Ph.D. Candidate, Graduate Student Researcher
> Huelsenbeck Lab
> Center for Theoretical Evolutionary Genomics
> 4151 VLSB (Valley Life Sciences Building)
> Department of Integrative Biology
> University of California, Berkeley
>
> Graduate Student Instructor, IB200A
> Principles of Phylogenetics: Systematics
> http://ib.berkeley.edu/courses/ib200a/index.shtml
>
> Lab websites:
> http://ib.berkeley.edu/people/lab_detail.php?lab=54
> http://fisher.berkeley.edu/cteg/hlab.html
> Dept. personal page:
> http://ib.berkeley.edu/people/students/person_detail.php?person=370
> Lab personal page:
> http://fisher.berkeley.edu/cteg/members/matzke.html
> Lab phone: 510-643-6299
> Dept. fax: 510-643-6264
> Cell phone: 510-301-0179
> Email: mat...@berkeley.edu
>
> Mailing address:
> Department of Integrative Biology
> 3060 VLSB #3140
> Berkeley, CA 94720-3140
>
> -
> "[W]hen people thought the earth was flat, they were wrong.
> When people thought the earth was spherical, they were
> wrong. But if you think that thinking the earth is spherical
> is just as wrong as thinking the earth is flat, then your
> view is wronger than both of them put together."
>
> Isaac Asimov (1989). "The Relativity of Wrong." The
> Skeptical Inquirer, 14(1), 35-44. Fall 1989.
> http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
> 
>


Re: [Numpy-discussion] ragged array implimentation

2011-03-07 Thread Jeff Whitaker
On 3/7/11 10:28 AM, Christopher Barker wrote:
> Hi folks,
>
> I'm setting out to write some code to access and work with ragged arrays
> stored in netcdf files. It dawned on me that ragged arrays are not all
> that uncommon, so I'm wondering if any of you have any code you've
> developed that I could learn from, borrow from, etc.
>
> note that when I say a "ragged array", I mean a set of data where
> each row could be a different arbitrary length:
>
> 1, 2, 3, 4
> 5, 6
> 7, 8, 9, 10, 11, 12
> 13, 14, 15
> ...
>
> In my case, these will only be 2-d, though I suppose one could have an
> n-d version where the last dimension was ragged (or any dimension, I
> suppose, though I'm having trouble wrapping my brain around what that
> would look like...)
>
> I'm not getting more specific about what I think the API should look
> like -- that is part of what I'm looking for: suggestions, previous
> implementations, etc.
>
> Is there any "standard" way to work with such data?
>
> -Chris
>

Chris:  The netcdf4-python module reads netcdf vlen arrays and returns 
numpy object arrays, where the elements of the object arrays are 
themselves 1d numpy arrays. I don't think there is any other way to do 
it.  In your example, the 'ragged' array would be a 1d numpy array with 
dtype='O', and the individual elements would be 1d numpy arrays with 
dtype=int.  Of course, these arrays are very awkward to deal with and 
operations will be slow.
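
A minimal sketch of that representation, with illustrative data:

import numpy as np

# Ragged rows stored as a 1d object array of 1d int arrays.
ragged = np.empty(4, dtype=object)
ragged[0] = np.array([1, 2, 3, 4])
ragged[1] = np.array([5, 6])
ragged[2] = np.array([7, 8, 9, 10, 11, 12])
ragged[3] = np.array([13, 14, 15])

# Operations drop to a Python-level loop, hence the slowness:
print(np.array([row.sum() for row in ragged]))  # [10 11 57 42]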

-Jeff

-- 
Jeffrey S. Whitaker Phone  : (303)497-6313
Meteorologist   FAX: (303)497-6449
NOAA/OAR/PSD  R/PSD1Email  : jeffrey.s.whita...@noaa.gov
325 BroadwayOffice : Skaggs Research Cntr 1D-113
Boulder, CO, USA 80303-3328 Web: http://tinyurl.com/5telg



Re: [Numpy-discussion] ragged array implimentation

2011-03-07 Thread Jeff Whitaker
On 3/7/11 11:42 AM, Christopher Barker wrote:
> On 3/7/11 9:33 AM, Francesc Alted wrote:
>> On Monday 07 March 2011 18:28:11 Christopher Barker wrote:
>>> I'm setting out to write some code to access and work with ragged
>>> arrays stored in netcdf files. It dawned on me that ragged arrays
>>> are not all that uncommon, so I'm wondering if any of you have any
>>> code you've developed that I could learn-from borrow from, etc.
>> A list of numpy arrays would not be enough?  Or you want something more
>> specific?
> Maybe that would, but in mapping to the netcdf data model, I'm thinking
> more like a big 1-d numpy array, with a way to index into it. Also, that
> would allow you to do some math with the arrays, if the broadcasting
> made sense, anyway.
>
> But now that you've entered the conversation, does HDF and/or pytables
> have a standard way of dealing with this?
>
> On 3/7/11 9:37 AM, Jeff Whitaker wrote:
>> Chris:  The netcdf4-python module reads netcdf vlen arrays
> are those a netcdf4 feature?
Chris:

Yes, although I don't think many people are using it.

See section 10 in 
http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4-module.html.

> So far, I'm still working with netcdf3 --
> though this could be a compelling reason to move on!
>
> We've talked about this some on the CF list, and I don't think anyone
> brought that up.
>
>> and returns
>> numpy object arrays, where the elements of the object arrays are
>> themselves 1d numpy arrays. I don't think there is any other way to do
>> it.  In your example, the 'ragged' array would be a 1d numpy array with
>> dtype='O', and the individual elements would be 1d numpy arrays with
>> dtype=int.  Of course, these arrays are very awkward to deal with and
>> operations will be slow.
> Depends some of the operations, but yes.
>
> That still may be the best option, but I'm exploring others.
>
> is a "vlen array" stored contiguously in netcdf?
Probably, although I don't know for sure what the underlying HDF5 layer 
is doing.

-Jeff
> -Chris
>
>
>
>
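
A minimal sketch of the "big 1-d numpy array with a way to index into it"
layout discussed above (the offset bookkeeping here is illustrative, not an
existing API):

import numpy as np

# All rows stored contiguously, plus start offsets (one extra at the end).
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
row_starts = np.array([0, 4, 6, 12, 15])

def row(i):
    # The i-th ragged row, as a view into the flat array.
    return data[row_starts[i]:row_starts[i + 1]]

print(row(2))            # [ 7  8  9 10 11 12]
print((data * 2).sum())  # whole-array math stays fast: 240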


-- 
Jeffrey S. Whitaker Phone  : (303)497-6313
Meteorologist   FAX: (303)497-6449
NOAA/OAR/PSD  R/PSD1Email  : jeffrey.s.whita...@noaa.gov
325 BroadwayOffice : Skaggs Research Cntr 1D-113
Boulder, CO, USA 80303-3328 Web: http://tinyurl.com/5telg



Re: [Numpy-discussion] Having trouble installing Numpy on OS X

2009-01-24 Thread Jeff Whitaker
Nat Wilson wrote:
> Would anyone be willing to help me interpret an error while trying to  
> build and install Numpy? I've searched around, and haven't seen this  
> elsewhere.
>
> I've been running into this wall for about half the day now. I've  
> tried reinstalling Python, using numpy 1.2.0 and 1.2.1.
>
> I have Python 2.6.1, running on OS X 10.4.11, with a G4 PPC processor.
>
> Here's the print out:
>
> Ganymede:~/Desktop/numpy-1.2.1 username$ python setup.py build
> Running from numpy source directory.
> Traceback (most recent call last):
>    File "setup.py", line 96, in <module>
>      setup_package()
>    File "setup.py", line 68, in setup_package
>      from numpy.distutils.core import setup
>    File "/Users/username/Desktop/numpy/numpy/distutils/__init__.py", line 6, in <module>
>    File "/Users/username/Desktop/numpy/numpy/distutils/ccompiler.py", line 11, in <module>
>    File "/Users/username/Desktop/numpy/numpy/distutils/log.py", line 7, in <module>
>    File "/Users/username/Desktop/numpy/numpy/distutils/misc_util.py", line 8, in <module>
>    File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/tempfile.py", line 34, in <module>
>      from random import Random as _Random
>    File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/random.py", line 871, in <module>
>      _inst = Random()
>    File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/random.py", line 96, in __init__
>      self.seed(x)
>    File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/random.py", line 115, in seed
>      super(Random, self).seed(a)
> SystemError: error return without exception set
>
> Any ideas?
> I've had numpy/scipy installed in the past, but recently had to wipe  
> everything and start from scratch. Everything should be pretty clean  
> right now. Am I missing something obvious?
>
> Thanks,
> NJW
>   

Nat:

numpy 1.2.x doesn't work with python 2.6.  You'll either need to revert 
to python 2.5 or get the latest svn numpy (which still may have some 
python 2.6 glitches).

-Jeff


Re: [Numpy-discussion] Data file format choice.

2009-01-30 Thread Jeff Whitaker
Gary Pajer wrote:
> It's time for me to select a data format.
>
> My data are (more or less) spectra ( a couple of thousand samples), 
> six channels, each channel running around 10 Hz, collecting for a 
> minute or so. Plus all the settings on the instrument.
>
> I don't see any significant differences between netCDF4 and HDF5.   
Gary:  netCDF4 is just a thin wrapper on top of HDF5 1.8 - think of it 
as a higher level API.
> Similarly, I don't see significant differences between pytables and 
> h5py.  Does one play better with numpy?  
pytables has been around longer, is well-tested, and has nice pythonic 
features, but files you write with it may not be readable by C or 
fortran clients.  h5py works only with python 2.5/2.6, and writes 
'vanilla' hdf5 files readable by anybody.
> What are the best numpy solutions for netCDF4?

There's only one that I know of - http://code.google.com/p/netcdf4-python.
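
For data like yours, a minimal netcdf4-python sketch might look like this
(the dimension names, sizes, and attribute here are illustrative):

from netCDF4 import Dataset
import numpy as np

nc = Dataset('spectra.nc', 'w')
nc.createDimension('channel', 6)
nc.createDimension('sample', 2048)
nc.createDimension('time', None)    # unlimited: one entry per sweep
var = nc.createVariable('spectra', 'f8', ('time', 'channel', 'sample'))
var[0] = np.random.rand(6, 2048)    # first sweep
nc.instrument_gain = 1.0            # instrument settings as attributes
nc.close()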

-Jeff

-- 
Jeffrey S. Whitaker Phone  : (303)497-6313
Meteorologist   FAX: (303)497-6449
NOAA/OAR/PSD  R/PSD1Email  : jeffrey.s.whita...@noaa.gov
325 BroadwayOffice : Skaggs Research Cntr 1D-113
Boulder, CO, USA 80303-3328 Web: http://tinyurl.com/5telg



Re: [Numpy-discussion] Failure with 1.3.0b1 under Solaris 10 SPARC

2009-03-28 Thread Jeff Blaine
Same problem with 1.3.0rc1

Jeff Blaine wrote:
> Aside from this, the website for NumPy should have a link to the
> list subscription address, not a link to the list itself (which
> cannot be posted to unless one is a member).
> 
> Python 2.4.2 (#2, Dec  6 2006, 17:18:19)
> [GCC 3.3.5] on sunos5
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy
> Traceback (most recent call last):
>    File "<stdin>", line 1, in ?
>    File "/afs/.rcf.mitre.org/lang/python/sun4x_510/2.4.2/lib/python2.4/site-packages/numpy/__init__.py", line 130, in ?
>      import add_newdocs
>    File "/afs/.rcf.mitre.org/lang/python/sun4x_510/2.4.2/lib/python2.4/site-packages/numpy/add_newdocs.py", line 9, in ?
>      from lib import add_newdoc
>    File "/afs/.rcf.mitre.org/lang/python/sun4x_510/2.4.2/lib/python2.4/site-packages/numpy/lib/__init__.py", line 4, in ?
>      from type_check import *
>    File "/afs/.rcf.mitre.org/lang/python/sun4x_510/2.4.2/lib/python2.4/site-packages/numpy/lib/type_check.py", line 8, in ?
>      import numpy.core.numeric as _nx
>    File "/afs/.rcf.mitre.org/lang/python/sun4x_510/2.4.2/lib/python2.4/site-packages/numpy/core/__init__.py", line 5, in ?
>      import multiarray
> ImportError: ld.so.1: python: fatal: relocation error: file
> /afs/.rcf.mitre.org/lang/python/sun4x_510/2.4.2/lib/python2.4/site-packages/numpy/core/multiarray.so:
> symbol __builtin_isfinite: referenced symbol not found
> >>>
> 
> See build.log attached as well.
> 
> 
> 




Re: [Numpy-discussion] Failure with 1.3.0b1 under Solaris 10 SPARC

2009-03-30 Thread Jeff Blaine
> What version of glibc do you have?

None.  Solaris does not use GNU libc.



Re: [Numpy-discussion] Failure with 1.3.0b1 under Solaris 10 SPARC

2009-03-30 Thread Jeff Blaine
FWIW, I solved this just now by removing Sun Studio from
my PATH before building.  That's clearly a workaround,
though; the build process failed to detect something
properly.

Jeff Blaine wrote:
>> What version of glibc do you have?
> 
> None.  Solaris does not use GNU libc.


[Numpy-discussion] Question about mrecarray

2008-03-04 Thread Jeff Garrett
Hi,

I'm using an mrecarray in a situation where I need to replace the masked
values with default values which are not necessarily the same as the
fill value...   Something like:

for field, mask in zip(row, row._fieldmask):
    value = field if not mask else ...
    ...

Is there a better way to tell if the individual fields are masked than
accessing ._fieldmask?

Thanks,
Jeff Garrett