On Mar 19, 2014, at 10:01 AM, Dave Hirschfeld <novi...@gmail.com> wrote:
> Jeff Reback <jeffreback <at> gmail.com> writes:
>
>>
>> Dave,
>>
>> your example is not a problem with numpy per se, rather that the default
>> generation is in local timezone (same as what python datetime does).
>> If you localize to UTC you get the results that you expect.
>>
>
> The problem is that the default datetime generation in *numpy* is in local
> time.
>
> Note that this *is not* the case in Python - it doesn't try to guess the
> timezone info based on where in the world you run the code, if it's not
> provided it sets it to None.
>
> In [7]: pd.datetime?
> Type: type
> String Form:<type 'datetime.datetime'>
> Docstring:
> datetime(year, month, day[, hour[, minute[, second[,
> microsecond[,tzinfo]]]]])
>
> The year, month and day arguments are required. tzinfo may be None, or an
> instance of a tzinfo subclass. The remaining arguments may be ints or longs.
>
> In [8]: pd.datetime(2000,1,1).tzinfo is None
> Out[8]: True
>
>
> This may be the best solution but as others have pointed out this is more
> difficult to implement and may have other issues.
>
> I don't want to wait for the best solution - the assume UTC on input/output
> if not specified will solve the problem and this desperately needs to be
> fixed because it's completely broken as is IMHO.
>
>
>> If you localize to UTC you get the results that you expect.
>
> That's the whole point - *numpy* needs to localize to UTC, not to whatever
> timezone you happen to be in when running the code.
>
> In a real-world data analysis problem you don't start with the data in a
> DataFrame or a numpy array it comes from the web, a csv, Excel, a database
> and you want to convert it to a DataFrame or numpy array. So what you have
> from whatever source is a list of tuples of strings and you want to convert
> them into a typed array.
>
> Obviously you can't localize a string - you have to convert it to a date
> first and if you do that with numpy the date you have is wrong.
>
> In [108]: dst = np.array(['2014-03-30 00:00', '2014-03-30 01:00', '2014-03-30 02:00'], dtype='M8[h]')
>      ...: dst
>      ...:
> Out[108]: array(['2014-03-30T00+0000', '2014-03-30T00+0000', '2014-03-30T02+0100'], dtype='datetime64[h]')
>
> In [109]: dst.tolist()
> Out[109]:
> [datetime.datetime(2014, 3, 30, 0, 0),
>  datetime.datetime(2014, 3, 30, 0, 0),
>  datetime.datetime(2014, 3, 30, 1, 0)]
>
>
> AFAICS there's no way to get the original dates back once they've passed
> through numpy's parser!?
>
>
> -Dave
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

Hi all,

I've written a rather rudimentary NEP (lacking in technical details, which I
will hopefully add after some further discussion and receiving
clarification/help on this thread).

Please let me know how to proceed and what you think should be added to the
current proposal (attached to this mail).

Here is a rendered version of the same:
https://github.com/Sankarshan-Mudkavi/numpy/blob/Enhance-datetime64/doc/neps/datetime-improvement-proposal.rst

Cheers,
Sankarshan

-- 
Sankarshan Mudkavi
Undergraduate in Physics, University of Waterloo
www.smudkavi.com
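
For reference, the "assume UTC on input/output if not specified" behaviour
requested in the quoted message can be sketched with the Python standard
library alone. This is only an illustration of the idea, not the NEP's design
and not numpy's parser: the helper names (parse_as_utc, hours_since_epoch,
from_hours) are hypothetical, and Python 3 is assumed. Because no local
timezone is consulted, the three strings from Dave's example stay distinct and
round-trip unchanged on any machine.

```python
from datetime import datetime, timedelta, timezone

# Reference point for integer offsets (the Unix epoch, in UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_as_utc(text, fmt="%Y-%m-%d %H:%M"):
    """Hypothetical helper: parse an offset-less string and treat it as UTC."""
    naive = datetime.strptime(text, fmt)        # tzinfo is None here
    return naive.replace(tzinfo=timezone.utc)   # attach UTC; no local-time guess


def hours_since_epoch(dt):
    """Hypothetical helper: whole hours from the epoch to an aware datetime."""
    return (dt - EPOCH) // timedelta(hours=1)


def from_hours(hours):
    """Hypothetical helper: inverse of hours_since_epoch."""
    return EPOCH + timedelta(hours=hours)


strings = ["2014-03-30 00:00", "2014-03-30 01:00", "2014-03-30 02:00"]
parsed = [parse_as_utc(s) for s in strings]
hours = [hours_since_epoch(dt) for dt in parsed]

# Output side: format in UTC again, so the original strings come back.
round_tripped = [from_hours(h).strftime("%Y-%m-%d %H:%M") for h in hours]
```

The integer hour offsets are the kind of value an hour-resolution datetime64
array stores, so the same policy could in principle be applied before the data
reaches numpy.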