[Numpy-discussion] Re: NumPy-Discussion Digest, Vol 183, Issue 33

2021-12-29 Thread Stefano Miccoli
Lev, excuse me if I go into super-pedantic mode, but your answer and the current 
text of the article miss an important point.

1) The proleptic Gregorian calendar is about leap year rules. It tracks days 
without making any assumption on the length of days. If we agree on using this 
calendar, dates like -0099-07-12 and 2021-12-29 are defined without ambiguity, 
and we can easily compute the number of days between these two dates.
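
A minimal sketch of such a day count with numpy, cross-checked against Python's `datetime.date`, which also uses the proleptic Gregorian calendar. The start date 1582-10-14 is chosen here only for illustration, because `datetime.date` cannot represent years before 1:

```python
from datetime import date

import numpy as np

# Two dates in the proleptic Gregorian calendar (the first one predates
# the historical introduction of the Gregorian calendar on 1582-10-15).
d0 = np.datetime64("1582-10-14", "D")
d1 = np.datetime64("2021-12-29", "D")

# Number of whole days between the two dates according to numpy.
n_days = int((d1 - d0) / np.timedelta64(1, "D"))

# Independent cross-check with the standard library.
n_days_stdlib = date(2021, 12, 29).toordinal() - date(1582, 10, 14).toordinal()

print(n_days, n_days_stdlib)
```

Both counts agree because neither library makes any assumption about the length of a day at this resolution; only the leap year rules matter.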

2) Posix semantics is about the length of a day, and is based on the (utterly 
wrong) assumption that a mean solar day is constant and exactly 86400 SI 
seconds long. (For an authoritative estimate of historical length-of-day 
variations see [link missing] and the related papers [link missing].)

Knowing assumption 1) is important when coding dates before 1582-10-15: e.g. 
1582-10-04 Julian is 1582-10-14 proleptic Gregorian. Once we agree on the 
proleptic Gregorian calendar everything works as expected: time deltas 
expressed in days are correct.
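
As an illustration of the Julian to proleptic Gregorian mapping, here is a small sketch that goes through the Julian Day Number (JDN). The function names and the constant are made up for this example; the arithmetic is the standard integer JDN formula for Julian calendar dates, and it is only meant for years after year 1:

```python
from datetime import date

# Offset between the Julian Day Number and Python's proleptic Gregorian
# ordinal: date(1970, 1, 1).toordinal() == 719163 and its JDN is 2440588.
JDN_MINUS_ORDINAL = 1721425


def julian_calendar_to_jdn(year, month, day):
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12        # 1 for January/February, 0 otherwise
    y = year + 4800 - a
    m = month + 12 * a - 3        # March = 0, ..., February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_proleptic_gregorian(year, month, day):
    """Convert a Julian calendar date to a proleptic Gregorian date."""
    jdn = julian_calendar_to_jdn(year, month, day)
    return date.fromordinal(jdn - JDN_MINUS_ORDINAL)


print(julian_to_proleptic_gregorian(1582, 10, 4))  # 1582-10-14
```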

Knowing assumption 2) is important if we want to compute time deltas for 
date-time objects with high precision: e.g. how many SI seconds occur between 
1582-10-14T12:00:00 and 1582-10-15T12:00:00 with millisecond precision? Here we 
must first define what T12:00:00 means, say UT1, but most critically we need to 
know the length of day in 1582. With Posix semantics a day is always 86400.000 
SI seconds long; however, the real value of the length of day in 1582 could be 
about 5 ms less. The problem here is that small errors accumulate, and if we 
compute the difference between 0000-01-01T12:00:00 and 1900-01-01T12:00:00 the 
numpy answer may be off by about 10_000 seconds. 
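
To see how a few milliseconds per day can add up to this order of magnitude, here is a back-of-the-envelope sketch. The linear model is purely my assumption for illustration (length-of-day deficit equal to zero in 1900, growing linearly going back in time, and pinned to the 5 ms figure for 1582 quoted above); it is not a real earth rotation model:

```python
DAYS_PER_YEAR = 365.2425      # mean length of a Gregorian year, in days
span_years = 1900             # roughly nineteen centuries, ending in 1900

# Toy assumption: the day was 0 ms short in 1900 and 5 ms short in 1582,
# with a linear trend in between and before.
slope_ms_per_year = 5.0 / (1900 - 1582)

# For a linear trend the mean deficit over the span is half the deficit
# reached at the far end of the span.
mean_deficit_s = slope_ms_per_year * span_years / 2 / 1000

# Accumulated difference between "86400 s per day" and the toy model.
accumulated_error_s = mean_deficit_s * span_years * DAYS_PER_YEAR

print(f"{accumulated_error_s:.0f} s")  # on the order of 10_000 s
```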

Fast forward to current times: after 1972 T12:00:00 should be defined as UTC, 
and the posix assumption is correct for almost every day, bar when a leap 
second is added (86401 s) or removed (86399 s, but this has never occurred.) 
Now the numpy computed timedeltas are correct up to an integral number of 
seconds that can be derived from a leap second table, if both dates are in the 
past. If one or both of the dates are in the future, then we must rely on 
models of earth rotation, and estimate the future introduction of leap seconds. 
But earth rotation is quite “unpredictable”, so usually this is not very 
accurate.
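
For two dates in the past, the correction can be sketched as follows. The table lists the days on which each of the 27 leap seconds inserted so far took effect; the helper name `si_seconds_between` is made up for this example, and the sketch assumes both instants are on or after 1972-01-01 and not later than the table covers:

```python
import numpy as np

# First day after each inserted leap second (all were positive so far).
# This table has to be extended by hand whenever a new one is announced.
LEAP_EFFECTIVE = np.array([
    "1972-07-01", "1973-01-01", "1974-01-01", "1975-01-01", "1976-01-01",
    "1977-01-01", "1978-01-01", "1979-01-01", "1980-01-01", "1981-07-01",
    "1982-07-01", "1983-07-01", "1985-07-01", "1988-01-01", "1990-01-01",
    "1991-01-01", "1992-07-01", "1993-07-01", "1994-07-01", "1996-01-01",
    "1997-07-01", "1999-01-01", "2006-01-01", "2009-01-01", "2012-07-01",
    "2015-07-01", "2017-01-01",
], dtype="datetime64[s]")


def si_seconds_between(start, stop):
    """Elapsed SI seconds between two UTC instants (both after 1972)."""
    start = np.datetime64(start, "s")
    stop = np.datetime64(stop, "s")
    # Posix-style difference: every day counts as exactly 86400 s.
    posix_seconds = int((stop - start).astype(np.int64))
    # Leap seconds inserted in the half-open interval (start, stop].
    n_leaps = int(np.searchsorted(LEAP_EFFECTIVE, stop, side="right")
                  - np.searchsorted(LEAP_EFFECTIVE, start, side="right"))
    return posix_seconds + n_leaps


print(si_seconds_between("1980-01-06", "2000-01-01"))  # 630720013
```

The printed value is the GPS-time figure for midnight on 2000-01-01 mentioned in the message quoted below: the Posix difference plus the 13 leap seconds inserted in that interval.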

The main problem with numpy datetime64 is that by using np.int64 for datetimes 
it gives 1/2**63 precision (about 1e-19). But this apparently very high precision 
has to be compared with the relative accuracy of the Posix semantics, which 
lies at about 1e-7 to 1e-8 if we look at timespans of a couple of centuries. So 
I agree that the np.datetime64 precision is somewhat misleading. 
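
The gap between the two figures is easy to check with plain arithmetic. The 10 ms/day value below is only a typical order of magnitude for the Posix error, not a measured quantity:

```python
# Relative resolution of a signed 64-bit integer count.
resolution = 1 / 2**63

# Relative error of assuming 86400 s per day when the real day
# differs by about 10 ms.
posix_error = 10e-3 / 86400

print(f"{resolution:.2e}")   # 1.08e-19
print(f"{posix_error:.2e}")  # 1.16e-07
```

So the storage format resolves about twelve orders of magnitude more than the underlying time scale model can justify.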

This all said, proleptic Gregorian + Posix semantics is, in my opinion, the 
only sensible option in a numerical package like numpy, although the results 
can be inaccurate. However, errors are usually small on average (say 10 
ms/day, which is about 1e-7). Anything more sophisticated is in the realm of 
specialised packages, like AstroPy, but also Skyfield.

Stefano

> On 28 Dec 2021, at 21:35, numpy-discussion-requ...@python.org wrote:
> 
> It is not a matter of formal definitions. Leap seconds are uncompromisingly 
> practical.
> If you look at the wall clock on 1 Jan 1970 00:00 and then look at the same 
> clock today and measure the difference with atomic clock you won't get the 
> time delta that np.timedelta64 reports. There will be a difference of ~37 
> seconds.

Actually this should be 27s.

> One would expect that a library claiming to work with attoseconds would at 
> least count the seconds correctly )
> Astropy library calculates them properly: 
> "GPS Time. Seconds from 1980-01-06 00:00:00 UTC For example, 630720013.0 is 
> midnight on January 1, 2000."
> >>> np.datetime64('2000-01-01', 's') - np.datetime64('1980-01-06', 's')
> numpy.timedelta64(630720000,'s')
> 
> Everything should be made as simple as possible but not simpler. Leap seconds 
> are an inherent part of the world we live in.
> 
> Eg this is how people deal with them currently: they have to parse times like 
> 23:59:60.209215 manually
> https://stackoverflow.com/questions/21027639/python-datetime-not-accounting-for-leap-second-properly
>  
> 
> 
> - calendrical calculations are performed using a proleptic Gregorian calendar,
> - Posix semantics is followed, i.e. each day comprises exactly 86400 

[Numpy-discussion] representation of valid float type range

2021-12-29 Thread alejandro . giacometti
I am getting an interesting result, and I'm wondering if anyone would care to 
give me some intuition of why.

The example is simple enough, I want to get a range of values that are 
representable by a type:

```python
import numpy as np

f64_info = np.finfo(np.float64)
valid_range = np.linspace(
    start=f64_info.min, stop=f64_info.max, num=10
)
# valid_range => array([nan, inf, inf, inf, inf, inf, inf, inf, inf, 1.79769313e+308])
```

The minimum value is representable by the type, I can see it:

```python
f64_info.min  # => -1.7976931348623157e+308
```

I thought that maybe the valid range cannot start with the minimum value, so 
I've tried a few alternatives:

```python

valid_range = np.linspace(
    start=f64_info.min + f64_info.eps, stop=f64_info.max, num=10
)
# valid_range => array([nan, inf, inf, inf, inf, inf, inf, inf, inf, 1.79769313e+308])

valid_range = np.linspace(
    start=f64_info.min + f64_info.tiny, stop=f64_info.max, num=10
)
# valid_range => array([nan, inf, inf, inf, inf, inf, inf, inf, inf, 1.79769313e+308])
```

I thought maybe the range is too wide, but I can do this:

```python
valid_range = np.linspace(
    start=0, stop=f64_info.max, num=10
)
# valid_range => array([0.00000000e+000, 1.99743682e+307, 3.99487363e+307,
#                       5.99231045e+307, 7.98974727e+307, 9.98718408e+307,
#                       1.19846209e+308, 1.39820577e+308, 1.59794945e+308,
#                       1.79769313e+308])

# ...

valid_range = np.linspace(
    start=f64_info.tiny, stop=f64_info.max, num=10
)
# valid_range => array([2.22507386e-308, 1.99743682e+307, 3.99487363e+307,
#                       5.99231045e+307, 7.98974727e+307, 9.98718408e+307,
#                       1.19846209e+308, 1.39820577e+308, 1.59794945e+308,
#                       1.79769313e+308])

# ...

f32_info = np.finfo(np.float32)
valid_range = np.linspace(
    start=f32_info.tiny, stop=f32_info.max, num=10, dtype=np.float32,
)
# valid_range => array([1.1754944e-38, 3.7809150e+37, 7.5618299e+37,
#                       1.1342745e+38, 1.5123660e+38, 1.8904575e+38,
#                       2.2685490e+38, 2.6466405e+38, 3.0247320e+38,
#                       3.4028235e+38], dtype=float32)
```

I know that linear space is arbitrary, and perhaps not that useful. In fact 
this is valid:

```python
valid_range = np.logspace(
    start=f64_info.minexp, stop=f64_info.maxexp, num=10, base=2, endpoint=False
)
# valid_range => array([2.22507386e-308, 8.67124674e-247, 3.37923704e-185,
#                       1.31690901e-123, 5.13207368e-062, 2.00000000e+000,
#                       7.79412037e+061, 3.03741562e+123, 1.18369915e+185,
#                       4.61294681e+246])
```

But I'm still confused about why the linear space is invalid.

Thanks!
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: representation of valid float type range

2021-12-29 Thread Sebastian Gurovich
Could it be you need to get a handle on the "machine epsilon"?



[Numpy-discussion] Re: representation of valid float type range

2021-12-29 Thread Lev Maximov
• Short answer:

It's because
>>> f64_info.max - f64_info.min
inf

• Long answer:

linspace(a,b,n) computes the step as (b-a)/(n-1) (with the default endpoint=True), and the subtraction (b-a) already overflows to inf.
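
A minimal sketch that redoes this arithmetic by hand and reproduces the reported array. It mirrors what I understand linspace to do internally (difference, step, multiply, shift, then pin the last element to stop); treat that description as an assumption, not as the actual numpy source:

```python
import numpy as np

f64 = np.finfo(np.float64)
start, stop, num = f64.min, f64.max, 10

# Silence the overflow/invalid warnings that this arithmetic triggers.
with np.errstate(over="ignore", invalid="ignore"):
    delta = stop - start                # max - (-max) overflows to inf
    step = delta / (num - 1)            # inf / 9 is still inf
    y = np.arange(num) * step + start   # 0 * inf = nan, k * inf = inf

y[-1] = stop                            # the endpoint is set explicitly

print(delta)  # inf
print(y)      # nan first, inf in the middle, f64.max last
```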

You need to either
– split your range into two parts and then glue them back:
np.r_[np.linspace(f64_info.min, 0, 5), np.linspace(0, f64_info.max, 5)[1:]]

– or select a range that fits into float64:
np.linspace(f64_info.min/2, f64_info.max/2, 10)

– or select np.float128 as a dtype for linspace (only on platforms that provide it, e.g. x86-64 Linux/macOS):
np.linspace(np.float128(f64_info.min), np.float128(f64_info.max), 10)
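
One more variant, not from the messages above and offered only as a sketch: build the grid on [-1, 1], where nothing can overflow, and scale it afterwards. It is compared here with the split-and-glue approach (note that the glued version has 9 points, because the shared zero is dropped once):

```python
import numpy as np

f64 = np.finfo(np.float64)

# Split the range at zero and glue the halves back together.
split = np.r_[np.linspace(f64.min, 0, 5), np.linspace(0, f64.max, 5)[1:]]

# Alternative: a unit grid scaled by the largest finite value. Every
# grid point has magnitude <= 1, so the product stays finite.
scaled = np.linspace(-1.0, 1.0, 10) * f64.max

print(len(split), np.isfinite(split).all())
print(len(scaled), np.isfinite(scaled).all())
print(scaled[0] == f64.min, scaled[-1] == f64.max)
```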

Best regards,
Lev

