Jeff Reback <jeffreback <at> gmail.com> writes:

>
> Dave,
>
> your example is not a problem with numpy per se, rather that the default generation is in local timezone (same as what python datetime does).
> If you localize to UTC you get the results that you expect.
>
The problem is that the default datetime generation in *numpy* is in local time. Note that this *is not* the case in Python - it doesn't try to guess the timezone info based on where in the world you run the code; if it's not provided, it sets it to None.

In [7]: pd.datetime?
Type:       type
String Form:<type 'datetime.datetime'>
Docstring:
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints or longs.

In [8]: pd.datetime(2000,1,1).tzinfo is None
Out[8]: True

Python's approach may be the best solution, but as others have pointed out it is more difficult to implement and may have other issues. I don't want to wait for the best solution - the "assume UTC on input/output if not specified" approach will solve the problem, and this desperately needs to be fixed because it's completely broken as is IMHO.

> If you localize to UTC you get the results that you expect.

That's the whole point - *numpy* needs to localize to UTC, not to whatever timezone you happen to be in when running the code.

In a real-world data analysis problem you don't start with the data in a DataFrame or a numpy array: it comes from the web, a CSV, Excel or a database, and you want to convert it to a DataFrame or numpy array. So what you have from whatever source is a list of tuples of strings, and you want to convert them into a typed array.

Obviously you can't localize a string - you have to convert it to a date first, and if you do that with numpy the date you have is wrong.
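
For contrast, the standard library on its own parses such strings without touching the hours. A minimal sketch, assuming the input uses the '%Y-%m-%d %H:%M' layout (the names raw, parsed and round_trip are only illustrative, not from any library):

```python
from datetime import datetime

# Three timestamps around the 2014-03-30 DST change, as plain strings
# (illustrative input; any source of date strings would do)
raw = ['2014-03-30 00:00', '2014-03-30 01:00', '2014-03-30 02:00']

# strptime returns naive datetimes: no timezone is guessed from the
# machine the code happens to run on
parsed = [datetime.strptime(s, '%Y-%m-%d %H:%M') for s in raw]

for d in parsed:
    print(d.isoformat(), d.tzinfo)

# Formatting the naive values gives back the original strings
round_trip = [d.strftime('%Y-%m-%d %H:%M') for d in parsed]
print(round_trip == raw)
```

Every value comes back with tzinfo set to None and the hours stay 0, 1 and 2, so the original strings can be recovered regardless of the local timezone.
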
In [108]: dst = np.array(['2014-03-30 00:00', '2014-03-30 01:00', '2014-03-30 02:00'], dtype='M8[h]')
     ...: dst
     ...:
Out[108]: array(['2014-03-30T00+0000', '2014-03-30T00+0000', '2014-03-30T02+0100'], dtype='datetime64[h]')

In [109]: dst.tolist()
Out[109]:
[datetime.datetime(2014, 3, 30, 0, 0),
 datetime.datetime(2014, 3, 30, 0, 0),
 datetime.datetime(2014, 3, 30, 1, 0)]

AFAICS there's no way to get the original dates back once they've passed through numpy's parser!?

-Dave

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion