Re: [Numpy-discussion] round / set_printoptions discrepancy

2019-09-13 Thread Stefano Miccoli
In my opinion the problem is that numpy floats break the Liskov substitution
principle:

>>> pyfloat = 16.055
>>> npfloat = np.float64(pyfloat)
>>> isinstance(npfloat, float)
True
>>> round(pyfloat, 2)
16.05
>>> round(npfloat, 2)
16.06

Since numpy.float64 is a subclass of builtins.float I would expect that

>>> round(x, j) == round(np.float64(x), j)

is an invariant, but unfortunately this is not the case. 

Moreover, the python3 semantics of the round function requires that, when the
number of digits is None, the return value be of integral type:

>>> round(pyfloat)
16
>>> round(pyfloat, None)
16
>>> round(pyfloat, 0)
16.0
>>> round(npfloat)
16.0
>>> round(npfloat, None)
16.0
>>> round(npfloat, 0)
16.0

see also https://github.com/numpy/numpy/issues/11810

Stefano


Re: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?

2020-02-25 Thread Stefano Miccoli
The fact that `isinstance(np.float64(1), float)` holds true raises the problem
that the current implementation of np.float64 scalars breaks the Liskov
substitution principle: `sequence_or_array[round(x)]` works if `x` is a float,
but breaks down if `x` is an np.float64.
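
A minimal sketch of the failure mode (assuming the behaviour tracked in the
issue below, where `round` on an np.float64 returns an np.float64, which lists
do not accept as an index):

>>> import numpy as np
>>> seq = ['a', 'b', 'c']
>>> seq[round(1.2)]  # round(float) returns int, a valid index
'b'
>>> seq[round(np.float64(1.2))]  # round(np.float64) returns np.float64
Traceback (most recent call last):
  ...
TypeError: list indices must be integers or slices, not numpy.float64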

See https://github.com/numpy/numpy/issues/11810, where the issue is discussed 
in the
broader setting of the semantics of `np.round` vs. python3 `round`.

I do not have a strong opinion here, except that if np.float64’s are within the 
python
number hierarchy they should be PEP 3141 compliant (which currently they are 
not.)

Stefano

> On 25 Feb 2020, at 00:03, numpy-discussion-requ...@python.org wrote:
> 
> Also, something to think about is that currently numpy scalars satisfy
> the property `isinstance(np.float64(1), float)`, i.e they are within the
> python numerical type hierarchy. 0d arrays do not have this property. My
> proposal above would break this. I'm not sure what to think about
> whether this is a good property to maintain or not.
> 
> Cheers,
> Allan



Re: [Numpy-discussion] Output type of round is inconsistent with python built-in

2020-02-27 Thread Stefano Miccoli
There are several mixed issues here.

1. PEP 3141 compliance.

Numpy scalars are `numbers.Real` instances, and have to respect the 
`__round__` semantics defined by PEP 3141:

    def __round__(self, ndigits: Integral = None):
        """Rounds self to ndigits decimal places, defaulting to 0.

        If ndigits is omitted or None, returns an Integral,
        otherwise returns a Real, preferably of the same type as
        self. Types may choose which direction to round half. For
        example, float rounds half toward even.
        """

This means that if Real -> Real rounding is desired one should call 
`round(x, 0)` or `np.around(x)`. 

This semantics dictates only that the return type should be Integral, so for
`round(x)` and `round(x, None)` the mappings

np.float32 -> np.int32
np.float32 -> np.int64
np.float64 -> np.int64
np.floatXX -> int

are all OK.
I also think that it is perfectly OK to raise an OverflowError on `round(x)`.


2. Liskov substitution principle

`np.float64` floats are also `float` instances (but `np.float32` are not.)
This means that strictly respecting LSP requires that `np.float64` round to
python `int`, since `round(x)` never overflows for a python `float`.
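
For concreteness, a small sketch of the overflow at stake:

import numpy as np

i = round(1e300)  # a 301-digit python int: round() never overflows for float
np.int64(i)       # raises OverflowError: the value does not fit in 64 bits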

Here we have several options.

- round `np.float64` -> `int` and respect LSP.

- relax LSP, and round  `np.float64` -> `np.int64`. Who cares about 
`round(1e300)`?

- decide that there is no reason for having `np.float64` a subclass of `float`,
  so that LSP does not apply.


This all said, I think that these are the two most sensible choices for 
`round(x)`:

np.float32 -> np.int32
np.float64 -> np.int64
drop np.float64 subclassing python float

or

np.float32 -> int
np.float64 -> int
keep np.float64 subclassing python float


The second one seems to me the less disruptive one.

Bye

Stefano


Re: [Numpy-discussion] Proposal: add the timestamp64 type (Noam Yorav-Raphael)

2020-11-12 Thread Stefano Miccoli


> On 11 Nov 2020, at 18:00, numpy-discussion-requ...@python.org wrote:
>
> I propose to add a new type called "timestamp64". It will be a pure
> timestamp, meaning that it represents a moment in time (as seconds/ms/us/ns
> since the epoch), without any timezone information.

Sorry, but I really don't see the usefulness of another time-stamping format
based on POSIX time. Indeed POSIX time is based on a naive approximation of UTC
and is ambiguous across leap seconds. Quoting from Wikipedia:

> The Unix time number 1483142400 is thus ambiguous: it can refer either to
> start of the leap second (2016-12-31 23:59:60) or the end of it, one second
> later (2017-01-01 00:00:00). In the theoretical case when a negative leap
> second occurs, no ambiguity is caused, but instead there is a range of Unix
> time numbers that do not refer to any point in UTC time at all.

Precision time stamping is quite a complex task: you can use UTC, TAI, GPS, 
just to mention the most used timescales. And how do you deal with timestamps 
in the past, when timekeeping was based on earth rotation, and not atomic 
clocks ticking at (approximately) 1 SI-second frequency?

In my opinion time-stamping should be application-dependent, and I doubt that
the new “timestamp64” could be beneficial to the numpy community.

Best regards,

Stefano


Re: [Numpy-discussion] Proposal: add the timestamp64 type

2020-11-13 Thread Stefano Miccoli
Discussion on time is endless! (Sorry for the extra noise on the mailing list,
but I would like to clarify some points.)

If I got it right, np.datetime64 is defined by these points.

1) Internal representation: a 64-bit signed integer *plus* a time unit. The
time unit can be expressed as
- a SI valid unit (SI second and all decimal subunits up to the attosecond)
- a SI acceptable unit (minute, hour, day)
- a date unit (week, month, year)

2) Conversion routines: a bijective map from the internal representation to a 
proleptic Gregorian calendar [0] assuming a fixed epoch of 1970-01-01T00:00Z. 
The mapping neglects leap seconds and is not time-zone aware.
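
A quick sketch illustrating points 1) and 2) (the int64 payload is the count
of time units since the epoch):

>>> import numpy as np
>>> t = np.datetime64('1970-01-02T00:00')      # time unit inferred as minutes
>>> print(t.astype('M8[h]').astype(np.int64))  # hours since 1970-01-01T00:00Z
24
>>> print(t.astype('M8[s]').astype(np.int64))  # seconds since the epoch
86400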

I think that the current choice of 2) is a sensible one: I agree with Dan that 
it is useful to a wide audience, easy to compute, not ambiguous.

I would discourage any attempt to implement in numpy more complex mappings, 
which are aware of time-zones and leap seconds, and why not, of the wide array 
of other time scales and time representation in use: this is a very complex 
task, and a nightmare from the point of view of maintenance. Other specialised 
libraries exist, like astropy.time [1] or dateutil [2] to this purpose.

However the docs of numpy.datetime64 should be updated to explicitly mention
the use of the proleptic Gregorian calendar, and to better clarify how the
date units (month, year) are handled when cast to shorter units like seconds,
etc.

Stefano

[0] https://en.wikipedia.org/wiki/Proleptic_Gregorian_calendar
[1] https://docs.astropy.org/en/stable/time/
[2] https://dateutil.readthedocs.io/en/stable/



Re: [Numpy-discussion] bad CRC errors when using np.savez, only sometimes though!

2021-05-14 Thread Stefano Miccoli
If changing the on-disk format is an option, I would suggest h5py, which
allows saving numpy arrays in HDF5 format.
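
A minimal sketch (file and dataset names are hypothetical):

import h5py
import numpy as np

a = np.arange(10.0)
with h5py.File('data.h5', 'w') as f:   # write the array as an HDF5 dataset
    f.create_dataset('a', data=a)
with h5py.File('data.h5', 'r') as f:   # read it back
    b = f['a'][...]
assert (a == b).all()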

Stefano

> On 14 May 2021, at 16:22, numpy-discussion-requ...@python.org wrote:
>
> Aside from writing my own numpy storage file container, I am stumped as to
> how to fix this, or reproduce this in a consistent manner. Any suggestions
> would be greatly appreciated!
>
> Thank you,
> Isaac




[Numpy-discussion] Re: An article on numpy data types

2021-12-28 Thread Stefano Miccoli
Nice overview!

In my opinion section 5. Datetimes could benefit from some clarifications, and 
contains a couple of typos.

Let me start with the typos. 

I read 
"As in pure python, np.datetime64 is accompained by np.timedelta64 (stored as a 
single np.uint64) with the expectable arithmetic operators.”
but it should be “np.int64” since time delta values are signed.

"Leap seconds are supported:

Leap seconds are not (but proposed)
"
Of course the first sentence should be “leap years”, which leads to my main 
point. 

It makes no sense to claim “leap year support” without specifying the relevant
calendar. Thus I would suggest clearly stating, from the very beginning of
this section, that

- calendrical calculations are performed using a proleptic Gregorian calendar,
- Posix semantics is followed, i.e. each day comprises exactly 86400 SI
seconds, thus ignoring the existence of leap seconds.

I would also point out that this choice is consistent with python datetime.
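
A small check of this consistency (both ignore the leap second inserted at the
end of 2016, taking the day to be exactly 86400 s long):

import numpy as np
from datetime import datetime, timedelta

assert datetime(2017, 1, 1) - datetime(2016, 12, 31) == timedelta(seconds=86400)
d = np.datetime64('2017-01-01') - np.datetime64('2016-12-31')
assert d == np.timedelta64(86400, 's')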

As regards the promised future support for leap seconds, I would not mention
it, for now. In fact leap second support requires a leap second table, which
is not available on all platforms supported by numpy. Therefore the leap
second table would have to be bundled and updated with every numpy release,
with the very undesirable effect that older versions (with outdated tables)
would behave differently from newer ones.

Stefano





[Numpy-discussion] Re: NumPy-Discussion Digest, Vol 183, Issue 33

2021-12-29 Thread Stefano Miccoli
Lev, excuse me if I go into super-pedantic mode, but your answer and the
current text of the article fail to grasp an important point.

1) The proleptic Gregorian calendar is about leap year rules. It tracks days 
without making any assumption on the length of days. If we agree on using this 
calendar, dates like -0099-07-12 and 2021-12-29 are defined without ambiguity, 
and we can easily compute the number of days between these two dates.

2) Posix semantics is about the length of a day, and is based on the (utterly
wrong) assumption that a mean solar day is constant and exactly 86400 SI
seconds long. (For an authoritative estimate of historical length-of-day
variations see the related papers.)

Knowing assumption 1) is important when coding dates before 1582-10-15: e.g. 
1582-10-04 Julian is 1582-10-14 proleptic Gregorian. Once we agree on the 
proleptic Gregorian calendar everything works as expected: time deltas 
expressed in days are correct.
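
For instance:

>>> import numpy as np
>>> # 1582-10-04 (Julian) was followed by 1582-10-15 (Gregorian); in the
>>> # proleptic Gregorian calendar the two dates are simply 11 days apart
>>> np.datetime64('1582-10-15') - np.datetime64('1582-10-04')
numpy.timedelta64(11,'D')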

Knowing assumption 2) is important if we pretend to compute time deltas for
date-time objects with high precision: e.g. how many SI seconds occur between
1582-10-14T12:00:00 and 1582-10-15T12:00:00, with millisecond precision? Here
we must first define what T12:00:00 means, say UT1, but most critically we
need to know the length of day in 1582. With Posix semantics a day is always
86400.000 SI seconds long; however the real value of the length of day in 1582
could be about 5 ms less. The problem here is that small errors accumulate,
and if we compute the difference between 0001-01-01T12:00:00 and
1900-01-01T12:00:00 the numpy answer may be off by about 10_000 seconds.

Fast forward to current times: after 1972 T12:00:00 should be defined as UTC, 
and the posix assumption is correct for almost every day, bar when a leap 
second is added (86401 s) or removed (86399 s, but this has never occurred.) 
Now the numpy computed timedeltas are correct up to an integral number of 
seconds that can be derived from a leap second table, if both dates are in the 
past. If one or both of the dates are in the future, then we must rely on 
models of earth rotation, and estimate the future introduction of leap seconds. 
But earth rotation is quite “unpredictable”, so usually this is not very 
accurate.

The main problem with numpy datetime64 is that by using np.int64 for datetimes
it gives 1/2**63 relative precision (about 1e-19). But this apparently very
high precision has to be confronted with the relative accuracy of the Posix
semantics, which lies at about 1e-7 or 1e-8 if we look at timespans of a
couple of centuries. So I agree that the np.datetime64 precision is somehow
misleading.

This all said, proleptic Gregorian + Posix semantics is, in my opinion, the
only sensible option in a numerical package like numpy, although the results
can be inaccurate. However errors are usually small on average (say 10 ms/day,
which is about 1e-7). Everything more sophisticated is in the realm of
specialised packages, like AstroPy, but also Skyfield.

Stefano

> On 28 Dec 2021, at 21:35, numpy-discussion-requ...@python.org wrote:
> 
> It is not a matter of formal definitions. Leap seconds are uncompromisingly
> practical.
> If you look at the wall clock on 1 Jan 1970 00:00 and then look at the same 
> clock today and measure the difference with atomic clock you won't get the 
> time delta that np.timedelta64 reports. There will be a difference of ~37 
> seconds.

Actually this should be 27s.

> One would expect that a library claiming to work with attoseconds would at 
> least count the seconds correctly )
> Astropy library calculates them properly:
> "GPS Time. Seconds from 1980-01-06 00:00:00 UTC For example, 630720013.0 is 
> midnight on January 1, 2000."
> >>> np.datetime64('2000-01-01', 's') - np.datetime64('1980-01-06', 's')
> numpy.timedelta64(630720000,'s')
> 
> Everything should be made as simple as possible but not simpler. Leap seconds 
> are an inherent part of the world we live in.
> 
> Eg this is how people deal with them currently: they have to parse times like 
> 23:59:60.209215 manually
> https://stackoverflow.com/questions/21027639/python-datetime-not-accounting-for-leap-second-properly
> 
> - calendrical calculations are performed using a proleptic Gregorian
> calendar,
> - Posix semantics is followed, i.e. each day comprises exactly 86400 SI
> seconds, thus ignoring the existence of leap seconds.

[Numpy-discussion] Re: Proposal for new function to determine if a float contains an integer

2022-01-01 Thread Stefano Miccoli
I would rather suggest an `.is_integer(integer_dtype)` signature, because
knowing that 1e300 is an integer is not very useful in the numpy world, since
this integer number is not representable as any numpy integer dtype.

Note that in python

assert not f.is_integer() or int(f) == f

never fails, because python integers have unlimited precision; but this would
not map into

assert ( ~f_arr.is_integer() | (np.int64(f_arr) == f_arr) ).all()

because of possible OverflowErrors.
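
A concrete sketch of the failing translation:

import numpy as np

f = 1e300
assert f.is_integer()  # 1e300 stores a mathematically exact integer
assert int(f) == f     # never fails: python ints are unbounded

np.int64(int(f))       # raises OverflowError: 1e300 is integral, but not
                       # representable by any numpy integer dtype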

Stefano

> On 31 Dec 2021, at 04:46, numpy-discussion-requ...@python.org wrote:
> 
> Is adding arbitrary optional parameters a thing with ufuncs? I could easily 
> add upper and lower bounds checks.
> 
> On Thu, Dec 30, 2021, 20:56 Brock Mendel wrote:
> At least some of the commenters on that StackOverflow page need a slightly
> stronger check: not only is_integer(x), but also "np.iinfo(dtype).min <= x <=
> np.iinfo(dtype).max" for some particular dtype, i.e. "Can I losslessly set
> these values into the array I already have?"





[Numpy-discussion] Re: An article on numpy data types (Lev Maximov)

2022-01-01 Thread Stefano Miccoli
First of all, happy new 2022 UTC year!

Let me add just a very brief note to this discussion: I opened
https://github.com/numpy/numpy/issues/20675 which addresses the shortcomings
of the current documentation, which is in my opinion not sufficiently explicit
in stating the datetime64 semantics. It is true that these are largely
consistent with python ‘datetime.datetime’, but ‘explicit is better than
implicit’. If nobody objects I will open a doc-only PR adding a very short
paragraph to the docs trying to explain the points discussed here.

As regards how much time UTC gained from 1970-01-01 up to today, you are
right, it’s about 29 s. The UTC timescale was officially born in 1963 but it
can be traced back at least to 1956/1958; see
https://github.com/skyfielders/python-skyfield/issues/679 where this is
discussed with reference to the timescales implemented in python-skyfield.

Stefano

> On 31 Dec 2021, at 13:27, numpy-discussion-requ...@python.org wrote:
> 
> Hey, Stefano!
> 
> The level of being pedantic is absolutely acceptable.
> 
> I don't question any of your arguments. They are all perfectly valid. 
> 
> Except that I'd rather say it is ~29 seconds if measuring against 1970. Leap
> seconds were introduced in 1972 and there were a total of 27 seconds since
> then, but TAI time was ticking since 1958 and gained 10 seconds by 1970, so
> it is approximately 0.83 seconds per year, which gives approx 28.67 sec
> between today and 1970.
> So 1970 is a bad choice of epoch if you want to introduce a leap-second-aware 
> datetime.
> In GPS time they chose 1980. In TAI it is 1958, but that is somewhat worse 
> than 1980 because it is not immediately clear how to perform the conversion 
> timestamp<->timedelta between 1958 and 1970.
> 
> Something like 'proleptic gps time' would be needed to estimate the number of 
> leap seconds in the years before 1972 when they were introduced. Or maybe to 
> limit the leap-second timescale
> to start at 1972 and not to accept any timestamps before that date.
> 
> The system that ignores the existence of the leap seconds has a right to 
> exist.
> But it just has limited applicability.
> 
> np.datetime64 keeps time as a delta between the moment in time and a 
> predefined epoch.
> Which standard does it use to translate this delta to human-readable time in 
> years,
> months, and so on? 
> 
> If it is UTC, then it must handle times like 2016-12-31 23:59:60, because it 
> is a valid UTC timestamp.
> >>> np.datetime64('2016-12-31 12:59:60')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: Seconds out of range in datetime string "2016-12-31 12:59:60"
> 
> Datetime also fails (so far) to handle it:
> >>> dt(2016,12,31,23,59,60)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: second must be in 0..59
> 
> But `time` works. Well, at least it doesn't raise an exception:
> >>> t = time.struct_time((2016,12,31,12,59,60,0,0,0)); t 
> time.struct_time(tm_year=2016, tm_mon=12, tm_mday=31, tm_hour=12, tm_min=59, 
> tm_sec=60, tm_wday=0, tm_yday=0, tm_isdst=0)
> >>> time.asctime(t)
> 'Mon Dec 31 12:59:60 2016'
> >>> time.gmtime(calendar.timegm(t))
> time.struct_time(tm_year=2017, tm_mon=1, tm_mday=1, tm_hour=1, tm_min=0, 
> tm_sec=0, tm_wday=6, tm_yday=1, tm_isdst=0)
> 
> Imagine a user that decides which library to use to store some (life 
> critical!) measurements taken every 100 ms. He looks at NumPy datetime64, 
> reads that it is capable of handling attoseconds, and decides that it is a
> perfect fit. Now imagine that on 31 Dec 2022 the World Government decided to 
> inject a leap second. The system will receive the announcement from the NTP
> servers and will 
> prepare to replay this second twice. As soon as this moment chimes in he'll 
> run into a ValueError, which he won't notice because he's celebrating a New 
> Year :) And guess whom he'll blame? ;)
> 
> Actually the humanity has already got used to replaying timespans twice. It 
> happens every year in the countries that observe daylight saving time. And 
> the solution is to use a more linear scale than local time, namely, UTC. But 
> now turns out that UTC is not linear enough and it also has certain timespans 
> happening twice. 
> 
> The solution once again is use a _really_ linear time which is TAI. I think 
> python 'time' library did a right thing to introduce time.CLOCK_TAI, after 
> all.
> 
> Astropy handles the UTC scale properly though:
> >>> t = Time('2016-12-31 23:59:60')
> >>> t0 = Time('2016-12-31 23:59:59')
> >>> delta = t - t0
> >>> delta.sec
> 0.99969589
> >>> t0 + delta
> 
> So the solution for that particular person with regular intervals of time is 
> to use astropy. I mention it in the article.
> I made some corrections to the text. I'd be grateful if you had a look and 
> pointed me to the particular

[Numpy-discussion] Re: New feature: binary (arbitrary base) rounding

2022-11-10 Thread Stefano Miccoli


> On 8 Nov 2022, at 15:32, numpy-discussion-requ...@python.org wrote:
>
> Thanks for the proposal. I don't have much of an opinion on this and right
> now I am mainly wondering whether there is prior art which can inform us
> that this is relatively widely useful?

Base-2 (bit) rounding is implemented in Numcodecs in the context of data
compression. As pointed out by M. Klöwer et al., "Rounding bits without real
information to zero facilitates lossless compression algorithms and encodes
the uncertainty within the data itself."
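
For illustration, a minimal sketch of float64 bit rounding (round-to-nearest,
ties-to-even, in the spirit of the Numcodecs BitRound codec; the `bitround`
helper is hypothetical and assumes finite values and 1 <= keepbits <= 51):

import numpy as np

def bitround(x, keepbits):
    x = np.asarray(x, dtype=np.float64).copy()
    bits = x.view(np.uint64)                 # raw IEEE-754 bit pattern
    drop = 52 - keepbits                     # float64 has 52 mantissa bits
    half = np.uint64((1 << (drop - 1)) - 1)  # just below half of the last kept ulp
    tie = (bits >> np.uint64(drop)) & np.uint64(1)  # last kept bit, for ties
    bits += half + tie                       # round to nearest, ties to even
    bits &= ~np.uint64((1 << drop) - 1)      # zero out the dropped bits
    return x

print(bitround(np.pi, 8))  # pi kept to 8 mantissa bits: 3.140625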

I'm not an expert, but I never encountered rounding floating point numbers in 
bases different from 2 and 10.

Stefano


[Numpy-discussion] How is "round to N decimal places" defined for binary floating point numbers?

2023-12-28 Thread Stefano Miccoli via NumPy-Discussion
I have always been puzzled about how to correctly define the python built-in 
`round(number, ndigits)` when `number` is a binary float and `ndigits` is 
greater than zero.
Apparently CPython and numpy disagree:
>>> round(2.765, 2)
2.77
>>> np.round(2.765, 2)
2.76
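
The root of the discrepancy is the binary value actually stored, which a quick
check with the decimal module reveals (digits truncated here):

>>> from decimal import Decimal
>>> Decimal(2.765)   # the double nearest to 2.765 lies slightly above it
Decimal('2.76500000000000012434...')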

My questions for the numpy devs are:
- Is there an authoritative source that explains what `round(number, ndigits)`
means when the digits are counted in a base different from the one used in the
floating point representation?
- Which was the first programming language to implement an intrinsic function
`round(number, ndigits)` where ndigits is always decimal, irrespective of the
representation of the floating point number? (I’m not interested in algorithms
for printing a decimal representation, but in languages that allow storing
and computing with the rounded value.)
- Is `round(number, ndigits)` a useful function that deserves a rigorous
definition, or is its use limited to fuzzy situations, where accuracy can be
safely traded for speed?

Personally I cannot think of sensible uses of `round(number, ndigits)` for 
binary floats: whenever you positively need `round(number, ndigits)`, you 
should use a decimal floating point representation.

Stefano



[Numpy-discussion] Re: How is "round to N decimal places" defined for binary floating point numbers?

2023-12-29 Thread Stefano Miccoli via NumPy-Discussion
Oscar Gustafsson wrote:
> I would take it that round x to N radix-R digits means
> round_to_integer(x * R**N)/R**N
> (ignoring floating-point issues)

Yes, this is the tried-and-true way: first define the function in exact
arithmetic, then ask the floating point implementation to return an
approximate result within a given tolerance, say 1/2 ulp.
This is what CPython does, and it seems to be quite hard to get right, at
least judging from the code of the current implementation. BTW I think that
numpy uses the same definition, but simply accepts a bigger (not specified)
tolerance in its implementation.
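
For instance, evaluating the exact definition with the decimal module on the
example from my previous message (a sketch; Decimal(x) gives the exact binary
value of the double):

from decimal import Decimal, ROUND_HALF_EVEN
import numpy as np

x = 2.765
d = Decimal(x)  # exactly 2.76500000000000012434..., slightly above 2.765
# the exact-arithmetic definition: round_to_integer(x * 10**2) / 10**2
print(d.quantize(Decimal('0.01'), rounding=ROUND_HALF_EVEN))  # 2.77
print(round(x, 2))     # 2.77 -- CPython agrees with the exact definition
print(np.round(x, 2))  # 2.76 -- the scaling x*100 rounds to a float first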

Maybe I should have phrased my question differently: is this definition the
only accepted one, or are there different formulations which give rise to more
expedient implementations?


[Numpy-discussion] Re: Introducing quarterly date units to datetime64 and timedelta64

2024-02-24 Thread Stefano Miccoli via NumPy-Discussion
Actually quarters (3-month sub-year groupings) are already supported as
‘M8[3M]’ and ‘m8[3M]’:

>>> np.datetime64('2024-05').astype('M8[3M]') - np.datetime64('2020-03').astype('M8[3M]')
numpy.timedelta64(17,'3M')

So explicitly introducing a ‘Q’ time unit would only enable a more intuitive
representation/parsing of dates and durations.

I’m moderately negative on this proposal:
- there is no native support of quarters in Python
- ISO 8601-1 does not support sub-year groupings
- the ISO 8601-2 extension representing sub-year groupings is not sufficiently
widespread to be adopted by numpy. (E.g. '2001-34' expresses “second quarter
of 2001”, but I suppose nobody would guess this meaning)

In other words, without a clear normative reference, implementing quarters in
numpy would risk introducing a custom/arbitrary notation.

Stefano



[Numpy-discussion] Comparison between numpy scalars returns numpy bool class and not native python bool class

2024-06-27 Thread Stefano Miccoli via NumPy-Discussion
It is well known that ‘np.bool’ is not interchangeable with python ‘bool’, and
in fact ‘issubclass(np.bool, bool)’ is false.

On the contrary, numpy floats are subclasses of python floats
(‘issubclass(np.float64, float)’ is true), so I’m wondering if the fact that
scalar comparison returns an np.bool breaks the Liskov substitution principle.
In fact ‘(np.float64(1) > 0) is True’ is unexpectedly false.

I was hit by this behaviour because in python structural pattern matching, the
‘a > 1’ subject will match neither ‘True’ nor ‘False’ if ‘a’ is a numpy
scalar. In this short example

import numpy as np

a = np.float64(1)
assert isinstance(a, float)
match a > 1:
    case True | False:
        print('python float')
    case _:
        print('Huh?: numpy float')

the default clause is matched. If we set instead ‘a = float(1)’, the first
clause will be matched. The surprise factor is quite high here, in my opinion.
(Let me add that ‘True', ‘False', ‘None' are special in python structural 
pattern matching, because they are matched by identity and not by equality.)

I’m not sure if this behaviour can be avoided, or if we have to live with the 
fact that numpy floats are to be kept well contained and never mixed with 
python floats.

Stefano



[Numpy-discussion] Re: Comparison between numpy scalars returns numpy bool class and not native python bool class

2024-06-28 Thread Stefano Miccoli via NumPy-Discussion


> On 27 Jun 2024, at 23:48, Aaron Meurer wrote:
> 
> Apparently the reason this happens is that True, False, and None are
> compared using 'is' in structural pattern matching (see

Please let me stress that the ‘match/case’ snippet was only a concrete example
of a situation in which, say, ‘f(a)’ gives the correct result when ‘a’ is a
‘float’ instance and breaks down when ‘a’ is a ‘np.float64’ instance.
Now the fact that numpy floats are subclasses of python floats is quite a
strong promise that this should never be the case…
Realistically this can be solved in a couple of ways.
Realistically this can be solved in a couple of ways.

(i) Refactoring ‘f(a)’ so that it is aware of the numpy float quirks… not 
always possible, especially if ‘f(a)’ belongs to an external package.

(ii) Sanitizing numpy floats, let’s say by ‘f(a.item())’ in the calling code.

(iii) Ensuring that scalar comparisons always return python bools and not 
‘np.bool'


(i) and (ii) are quite simple user-side workarounds, but sometimes the
surprise factor is high, as in the given code snippet.

On the contrary (iii) is a radical solution on the library side, but I’m not
sure if it’s worth implementing for a few edge cases. In fact ‘b is True’ is
an anti-pattern in python, so the places in which this behaviour surfaces
should be rare.

Stefano



[Numpy-discussion] Re: Precision changes to sin/cos in the next release?

2023-05-31 Thread Stefano Miccoli via NumPy-Discussion


> On 31 May 2023, at 16:32, numpy-discussion-requ...@python.org wrote:
>
> It seems fairly clear that with this recent change, the feeling is that the
> tradeoff is bad and that too much accuracy was lost, for not enough
> real-world gain. However, we now had several years worth of performance work
> with few complaints about accuracy issues. So I wouldn't throw out the baby
> with the bath water now and say that we always want the best accuracy only.
> It seems to me like we need a better methodology for evaluating changes.
> Contributors have been pretty careful, but looking back at SIMD PRs, there
> were usually detailed benchmarks but not always detailed accuracy impact
> evaluations.
>
> Cheers,
> Ralf


If I can throw my 2 cents in, my feeling is that most users will notice
neither the decrease in accuracy nor the increase in speed.
(I failed to mention that I'm an engineer, so a few ULPs are almost nothing
for me; unless I have to solve a very ill-conditioned problem, but then I do
not blame numpy, but myself for formulating such a bad model ;-)

The only real problem is for code that relies on these assumptions:

assert np.sin(np.pi/2) == -np.cos(np.pi) == 1

which will fail in numpy==1.25.rc0 but should hold true for numpy~=1.24.3, at 
least on most runtime environments.

I do not have strong feelings on this issue: in an ideal world code should
have unit-testing modules and assertions scattered here and there in order to
make all implicit assumptions explicit. Adapting to the new routines should
then be fairly simple.
Of course we do not live in an ideal world, and there will definitely be a
number of users that will experience hard-to-debug failures linked to these
new trig routines.

But again I prefer to remain neutral.

Stefano




[Numpy-discussion] Re: np.ndenumerate doesn't obey mask?

2024-10-23 Thread Stefano Miccoli via NumPy-Discussion

> On 21 October 2024, at 18:52, Neal Becker wrote:
>
> I was using ndenumerate with a masked array, and it seems that the mask is
> ignored. Is this true? If so, isn't that a bug?

This is expected behaviour: you should use the function from the “.ma”
namespace, “numpy.ma.ndenumerate”, for skipping masked elements. By design
numpy.ndenumerate does not skip masked elements.
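
A minimal example contrasting the two:

>>> import numpy as np
>>> a = np.ma.masked_array([[1, 2], [3, 4]],
...                        mask=[[False, True], [False, False]])
>>> for idx, val in np.ndenumerate(a):    # mask dropped: all four elements
...     print(idx, val)
(0, 0) 1
(0, 1) 2
(1, 0) 3
(1, 1) 4
>>> for idx, val in np.ma.ndenumerate(a): # masked element at (0, 1) skipped
...     print(idx, val)
(0, 0) 1
(1, 0) 3
(1, 1) 4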

Stefano


[Numpy-discussion] Re: Extend ldexp to handle complex inputs

2024-09-27 Thread Stefano Miccoli via NumPy-Discussion
‘np.ldexp’ exists mainly for compatibility with the C/C++ functions ldexp, 
ldexpf, ldexpl, which are defined for float/double/long double.
Quoting the C refs:

> On binary systems (where FLT_RADIX is 2), ldexp is equivalent to scalbn.
> The function ldexp ("load exponent"), together with its dual, frexp, can be 
> used to manipulate the representation of a floating-point number without 
> direct bit manipulations.
> On many implementations, ldexp is less efficient than multiplication or 
> division by a power of two using arithmetic operators.

So in general I do not think that extending ‘np.ldexp’ to complex makes much
sense, since it would be peculiar to the numpy implementation and would
destroy the C/C++ equivalence.
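
For reference, a short sketch of the real-valued round trip that the C
equivalence is about:

>>> import numpy as np
>>> m, e = np.frexp(np.pi)  # decompose: pi == m * 2**e, with 0.5 <= |m| < 1
>>> print(m, e)
0.7853981633974483 2
>>> print(np.ldexp(m, e))   # "load exponent": reconstructs the value exactly
3.141592653589793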

Stefano


[Numpy-discussion] Why ufunc out arg has to be an ndarray?

2025-01-08 Thread Stefano Miccoli via NumPy-Discussion
From time to time I find myself overwriting a python buffer with the output of 
a ufunc, for example like this:

import array
import numpy as np

a = array.array('f', (1,1,1))
np.exp2(a, out=np.asarray(a))
assert a.tolist() == [2, 2, 2]

Here I have to wrap `out=np.asarray(a)` because the more natural `np.exp2(a,
out=a)` raises “TypeError: return arrays must be of ArrayType”.

In general, ufuncs are quite aggressive in utilizing the buffer protocol for 
input arguments. I was wondering, why are they so reluctant to do the same for 
the output argument?
Is this a design choice? Efficiency? Legacy? Would implementing `np.ufunc(a,
out=a)` be dangerous or cumbersome?

Stefano
