[Numpy-discussion] Documentation Team meeting - Monday November 9

2020-11-07 Thread Melissa Mendonça
Hi all!

This is a reminder that our next Documentation Team meeting will be on *Monday,
November 9* at 3PM UTC** (PLEASE MIND THE RECENT TIME CHANGES AND CHECK WHETHER
THEY APPLY TO YOUR AREA).

If you wish to join on Zoom, **you need to use this NEW link**

https://zoom.us/j/96219574921?pwd=VTRNeGwwOUlrYVNYSENpVVBRRjlkZz09


Here's the permanent hackmd document with the meeting notes (it will keep being
updated over the next few days!):

https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg


Hope to see you around!

** You can click this link to get the correct time in your time zone:
https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20201109T15&p1=1440&ah=1

*** You can add the NumPy community calendar to your Google Calendar by
clicking this link:
https://calendar.google.com/calendar/r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20

- Melissa
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] datetime64: Remove deprecation warning when constructing with timezone

2020-11-07 Thread Noam Yorav-Raphael
On Fri, Nov 6, 2020 at 5:58 PM Brock Mendel wrote:

> > I find the whole notion of a "timezone naive timestamp" to be nearly
> meaningless
>
> From the perspective of, say, the dateutil parser, what would you do with
> "2020-11-06 07:48"?  If you assume it's UTC you'll be wrong in this case.
> If you assume it is in your local timezone, you'll be wrong in Europe.
> Timezone-naive datetimes are an abstraction for exactly this case.

I'm not sure what you mean by "the perspective of the dateutil parser".
Indeed, "2020-11-06 07:48" is not a well-defined timestamp, since it
doesn't define a specific moment in time. If you ask what a timestamp type
should do when constructed from such a string, then I can think of two
reasonable alternatives. One is to just not allow it, and perhaps provide a
.from_local() method which makes it explicit. The other is to allow it, and
make it clear that when an offset is not defined, it uses the environment's
timezone to convert the string to a timestamp. I wouldn't use the third
alternative, which is to parse it as UTC: it adds little convenience, since
it's easy to append a "Z" to the string.
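
The three alternatives above can be sketched with the standard library's
datetime module. This is only an illustration: parse_strict is a hypothetical
helper name, not an existing or proposed NumPy function.

```python
from datetime import datetime, timezone

text = "2020-11-06 07:48"
naive = datetime.fromisoformat(text)  # no offset, so not yet a moment in time


def parse_strict(s):
    """Alternative 1 (hypothetical helper): refuse strings without an offset."""
    dt = datetime.fromisoformat(s)
    if dt.tzinfo is None:
        raise ValueError("no UTC offset given; this is not a moment in time")
    return dt


# Alternative 2: interpret the string in the environment's time zone.
# astimezone() on a naive datetime assumes it is local time.
as_local = naive.astimezone()

# Alternative 3: interpret the string as UTC.
as_utc = naive.replace(tzinfo=timezone.utc)

# With an explicit offset the string is unambiguous, so all readers agree.
explicit = parse_strict("2020-11-06 07:48+00:00")
```

Only as_local depends on the machine the code runs on; as_utc and explicit
denote the same moment everywhere.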


> >>> t0 = pd.Timestamp.now()
>
> You can use `pd.Timestamp.now("UTC")`.  See also
> https://mail.python.org/archives/list/datetime-...@python.org/thread/PT4JWJLYBE5R2QASVBPZLHH37ULJQR43/
> , https://github.com/pandas-dev/pandas/issues/22451

Thanks for pointing this out. However, this doesn't work:

>>> pd.Timestamp.fromtimestamp(time.time(), 'UTC')
Traceback (most recent call last):
...
TypeError: fromtimestamp() takes exactly 2 positional arguments (3 given)

Also, this doesn't work:

>>> t0 = pd.Timestamp.now('UTC')
... t1 = pd.Timestamp.now('Asia/Jerusalem')
... t1 - t0
Traceback (most recent call last):
...
TypeError: Timestamp subtraction must have the same timezones or no
timezones
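
For comparison, here is a small sketch of the same subtraction with the
standard library's aware datetime objects, which compare and subtract as
moments in time regardless of their offsets. The fixed +02:00 offset is an
assumption standing in for Asia/Jerusalem in November, and the times are made
up for the example.

```python
from datetime import datetime, timedelta, timezone

utc = timezone.utc
# Assumption: a fixed +02:00 offset stands in for Asia/Jerusalem winter time.
jerusalem_winter = timezone(timedelta(hours=2))

t0 = datetime(2020, 11, 7, 20, 18, 38, tzinfo=utc)
t1 = datetime(2020, 11, 7, 22, 18, 39, tzinfo=jerusalem_winter)

# Both operands are aware, so the difference is computed between instants.
delta = t1 - t0
print(delta)  # 0:00:01
```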

Also, this doesn't do what it probably should:

>>> pd.Timestamp.now('UTC'), pd.Timestamp.now().tz_localize('UTC')
(Timestamp('2020-11-07 20:18:38.719603+0000', tz='UTC'),
 Timestamp('2020-11-08 01:18:38.719701+0000', tz='UTC'))

(I have no idea how the second result was calculated, but it's wrong. It
should have been equal to the first.)
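
One possible reading of the five-hour gap, sketched with the standard library
rather than pandas: attaching a UTC label to a naive local wall-clock time
keeps the digits and changes the moment, while converting keeps the moment.
The +05:00 local offset is an assumption inferred from the output above, and
this is not a confirmed account of what pandas does internally.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the local clock is 5 hours ahead of UTC.
local_zone = timezone(timedelta(hours=5))
instant = datetime(2020, 11, 7, 20, 18, 38, tzinfo=timezone.utc)

# What a naive "now" would show on the local wall clock: 2020-11-08 01:18:38
wall_clock = instant.astimezone(local_zone).replace(tzinfo=None)

# Labeling: keep the digits and declare them to be UTC (a different moment).
labeled = wall_clock.replace(tzinfo=timezone.utc)

# Converting: say which zone the digits are in, then move to UTC (same moment).
converted = wall_clock.replace(tzinfo=local_zone).astimezone(timezone.utc)

print(labeled - instant)     # 5:00:00
print(converted == instant)  # True
```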

So, pd.Timestamp is crap. I think that adding np.timestamp64 may finally
bring a sane timestamp type to python.

Thanks,
Noam


>
>
>
> On Fri, Nov 6, 2020 at 2:48 AM Noam Yorav-Raphael 
> wrote:
>
>> Hi,
>>
>> I actually arrived at this by first trying to use pandas.Timestamp and
>> getting very frustrated about it. With pandas, I get:
>>
>> >>> pd.Timestamp.now()
>> Timestamp('2020-11-06 09:45:24.249851')
>>
>> I find the whole notion of a "timezone naive timestamp" to be nearly
>> meaningless. A timestamp should mean a moment in time (as the current numpy
>> documentation defines very well). A "naive timestamp" doesn't mean
>> anything. It's exactly like a "unit naive length". I can have a Length type
>> which just takes a number, and be very happy that it works both if my "unit
>> zone" is inches or centimeters. So "Length(3)" will mean 3 cm in most of
>> the world and 3 inches in the US. But then, if I get "Length(3)" from
>> someone, I can't be sure what length it refers to.
>>
>> So currently, this happens with pandas timestamps:
>>
>> >>> os.environ['TZ'] = 'UTC'; time.tzset()
>> ... t0 = pd.Timestamp.now()
>> ... time.sleep(1)
>> ... os.environ['TZ'] = 'EST-5'; time.tzset()
>> ... t1 = pd.Timestamp.now()
>> ... t1 - t0
>> Timedelta('0 days 05:00:01.001583')
>>
>> This is not just theoretical - I actually need to work with data from
>> several devices, each in its own time zone. And I need to know that I won't
>> get such meaningless results.
>>
>> And you can even get something like this:
>>
>> >>> t0 = pd.Timestamp.now()
>> ... time.sleep(10)
>> ... t1 = pd.Timestamp.now()
>> ... t1 - t0
>> Timedelta('0 days 01:00:10.001583')
>>
>> if the first measurement happened to be in winter time and the second
>> measurement happened to be in daylight saving time.
>>
>> The solution is simple, and is what datetime64 used to do before the
>> change - have a type that just represents a moment in time. It's not "in
>> UTC" - it just stores the number of seconds that passed since an agreed
>> moment in time (which is usually 1970-01-01 02:00+0200, which is more
>> commonly referred to as 1970-01-01 00:00Z - it's the exact same moment).
>>
>> I think it would make things clearer if I mention that there are
>> operations that do not deal with timestamps. For example, it's
>> meaningless to ask what the year of a timestamp is - it may depend on the
>> time zone. These are always *human* related questions, that depend on
>> certain human conventions. We can call them "calendar questions". For these
>> types of questions, a type that includes both a timestamp and a timezone
>> offset (in minutes from UTC) can be useful. Some questions even require
>> full timezone information, meaning a function that defines what's the
>> timezone offset for each mom

[Numpy-discussion] Proposal: add the timestamp64 type

2020-11-07 Thread Noam Yorav-Raphael
Hi,

(I'm repeating things I wrote under the "datetime64: Remove deprecation
warning..." thread, since I'm now proposing a new solution.)

I propose to add a new type called "timestamp64". It will be a pure
timestamp, meaning that it represents a moment in time (as seconds/ms/us/ns
since the epoch), without any timezone information. It will have the exact
same behavior as datetime64 had before version 1.11, except that its only
allowed units will be seconds, milliseconds, microseconds and nanoseconds.
Removing the longer units will make it clear that it doesn't deal with
calendar and dates. Also, all the business day functionality will not be
applicable to timestamp64. In order to get calendar information (such as
the year) from timestamp64, you will have to manually convert it to
python's datetime (or perhaps to np.datetime64) with an explicit timezone
(utc, local, an offset, or a timezone object).

This is needed because, since the change introduced in 1.11, datetime64 no
longer represents a timestamp, but rather a date and time of an abstract
calendar. So given a datetime64, it is not possible to get an actual
timestamp without knowing the timezone to which the datetime64 refers. If
the datetime64 is in a timezone with daylight saving time, it can even be
ambiguous, since the same written hour will occur twice on the transition
from DST to winter time.
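
A minimal sketch of that ambiguity, using fixed offsets from the standard
library. It assumes a transition from +0300 to +0200 in which 01:30 appears on
the wall clock twice (taken here to be 2020-10-25 in Asia/Jerusalem).

```python
from datetime import datetime, timedelta, timezone

summer = timezone(timedelta(hours=3))  # assumed DST offset
winter = timezone(timedelta(hours=2))  # assumed winter offset

first = datetime(2020, 10, 25, 1, 30, tzinfo=summer)   # before the clocks go back
second = datetime(2020, 10, 25, 1, 30, tzinfo=winter)  # after the clocks go back

# The written date and time are identical...
same_wall_clock = first.replace(tzinfo=None) == second.replace(tzinfo=None)

# ...but the two moments are one hour apart.
gap = second - first
print(same_wall_clock, gap)  # True 1:00:00
```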

I would like it to work like this:

>>> np.timestamp64.now()
numpy.timestamp64('2020-11-07 22:42:52.871159+0200')

>>> np.timestamp64.now('s')
numpy.timestamp64('2020-11-07 22:42:52+0200')

>>> np.timestamp64(1604781772, 's')
numpy.timestamp64('2020-11-07 22:42:52+0200')

>>> np.timestamp64('2020-11-07 20:42:52Z')
numpy.timestamp64('2020-11-07 22:42:52+0200')

* timestamp64.now() will accept an optional string argument with the base
units. If not given, I think 'us' is a good default.
* The repr will format the timestamp using the environment's timezone.
* I would like the repr not to include a 'T' between the date and the time. I
find it much easier to read.
* I tend to think that it should be allowed to construct a timestamp64 from
an ISO8601 string without a timezone offset, in which case the
environment's timezone will be used to convert it to a timestamp. So in the
Asia/Jerusalem timezone it will look like:

>>> np.timestamp64('2020-11-07 22:42:52')
numpy.timestamp64('2020-11-07 22:42:52+0200')

>>> np.timestamp64('2020-08-01 22:00:00')
numpy.timestamp64('2020-08-01 22:00:00+0300')


If I implement this, could it be added to numpy?


Thanks,
Noam