[Numpy-discussion] Introducing quarterly date units to datetime64 and timedelta64

2024-02-24 Thread Oyibo
Dear Numpy Team,

I am writing to propose the addition of quarterly date units (Q) to the 
datetime64 and timedelta64 data types in Numpy.

While Numpy currently supports various date units, including years, months, 
weeks, and days, the absence of quarterly units limits its usability in 
numerous applications. Users are often forced to use cumbersome workarounds 
such as writing custom code or relying on external libraries, which detracts 
from the simplicity and efficiency of using Numpy.

Quarters are a fundamental temporal unit, particularly used in finance, 
business, and economics (it is probably the most analyzed frequency in 
macroeconometrics). Many time series datasets are reported at quarterly 
intervals, making native support for quarterly units essential for simplifying 
data handling and manipulation within Numpy.

Although alternative methods exist for representing quarters, native support 
would improve consistency across codebases and enhance interoperability with 
other libraries and frameworks.
Ultimately, it would improve the overall user experience of working with date 
data types in Numpy.

I would like to discuss this proposal further and would be happy to open an 
issue on GitHub to initiate the conversation if the proposal sounds reasonable. 
Your feedback would be appreciated.

Best regards, Oyibo
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Introducing quarterly date units to datetime64 and timedelta64

2024-02-24 Thread Stefano Miccoli via NumPy-Discussion
Actually quarters (3 months sub-year groupings) are already supported as 
‘M8[3M]’ and ‘m8[3M]’:
>>> (np.datetime64('2024-05').astype('M8[3M]')
...  - np.datetime64('2020-03').astype('M8[3M]'))
numpy.timedelta64(17,'3M')
So explicitly introducing a ‘Q’ time unit is only to enable more intuitive 
representation/parsing of dates and durations.
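
As a minimal sketch of how the existing '3M' unit can stand in for quarters, 
the snippet below buckets daily dates into quarters and does arithmetic on 
them. The variable names are illustrative only and are not part of any NumPy 
API.

```python
import numpy as np

# Two daily dates, bucketed into 3-month units counted from 1970-01,
# which line up with calendar quarters (Jan, Apr, Jul, Oct).
dates = np.array(['2020-03-15', '2024-05-20'], dtype='M8[D]')
quarters = dates.astype('M8[3M]')

# Difference between the two buckets, expressed in quarters.
elapsed = quarters[1] - quarters[0]
elapsed_quarters = int(elapsed.astype('int64'))

# Shift every date forward by one quarter.
next_quarter = quarters + np.timedelta64(1, '3M')
```

Casting the result back with `.astype('M8[M]')` or `.astype('M8[D]')` gives 
the first month or first day of each quarter.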

I’m moderately negative on this proposal:
- there is no native support of quarters in Python
- ISO 8601-1 does not support sub-year groupings
- the ISO 8601-2 extension representing sub-year groupings is not sufficiently 
widespread to be adopted by numpy. (E.g. '2001-34' expresses "second quarter of 
2001", but I suppose nobody would guess this meaning)
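
To illustrate how opaque the ISO 8601-2 notation is, here is a rough sketch 
that maps the sub-year grouping codes 33 to 36 (quarters 1 to 4) onto the 
existing '3M' unit. The helper name `parse_iso8601_2_quarter` is hypothetical; 
NumPy provides no such parser.

```python
import numpy as np

# ISO 8601-2 sub-year grouping codes 33..36 denote quarters 1..4.
QUARTER_CODES = {33: 1, 34: 2, 35: 3, 36: 4}

def parse_iso8601_2_quarter(text):
    """Hypothetical helper: turn 'YYYY-NN' (NN in 33..36) into datetime64[3M]."""
    year_part, code_part = text.split('-')
    quarter = QUARTER_CODES[int(code_part)]   # KeyError for non-quarter codes
    first_month = 3 * (quarter - 1) + 1       # 1, 4, 7 or 10
    month = np.datetime64(f'{int(year_part):04d}-{first_month:02d}')
    return month.astype('M8[3M]')

second_quarter_2001 = parse_iso8601_2_quarter('2001-34')
```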

In other words, without a clear normative reference, implementing quarters in 
numpy would risk introducing a custom/arbitrary notation.

Stefano



[Numpy-discussion] Re: JSON format for multi-dimensional data

2024-02-24 Thread philippe
Thank you Matti for this response.

I added to issue 12481 because, in my opinion, the format proposal addresses 
that issue. However, if you think a separate issue is preferable, I can create 
one.

To clarify the proposed standard: it represents multidimensional data of any 
type. The only constraint is that each value can be represented in JSON. This 
is of course the case for all pandas types, but a value can also be one of the 
following types: a year, a polygon, a URI, a type defined in darwincore or in 
schemaorg...
This means that each library or framework must transform this JSON data into an 
internal value (e.g. a polygon can be translated into a shapely object).
The defined types are described in the NTV Internet-Draft [2].

> - How does it handle sharing data? NumPy can handle very large ndarrays, 
> and a read-only container with a shared memory location, like in DLPack 
> [0] seems more natural than a format that precludes sharing data.

Concerning the first question, the purpose of this standard is complementary to 
what is proposed by DLPack (DLPack offers standard access mechanisms to data in 
memory, which avoids duplication between frameworks):

 - the format is a neutral reversible exchange format built on JSON (and 
therefore with duplication) which can be used independently of any framework.
 - the data types are numerous and with a broader scope than that offered by 
DLPack (numeric types only).
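
To make the "neutral reversible exchange format" idea concrete, here is a 
generic sketch of a JSON round trip for an ndarray. It is NOT the NTV format 
from the Internet-Draft; the keys "dtype", "shape" and "data" and both function 
names are assumptions chosen only for illustration.

```python
import json
import numpy as np

def array_to_json(arr):
    """Illustrative only: serialise dtype, shape and flattened values."""
    return json.dumps({
        "dtype": str(arr.dtype),
        "shape": list(arr.shape),
        "data": arr.ravel().tolist(),   # copies the data into Python objects
    })

def array_from_json(text):
    """Inverse of array_to_json for simple numeric dtypes."""
    obj = json.loads(text)
    return np.array(obj["data"], dtype=obj["dtype"]).reshape(obj["shape"])

original = np.arange(6, dtype='int32').reshape(2, 3)
restored = array_from_json(array_to_json(original))
```

Unlike DLPack, every step here copies the data, which is the duplication 
mentioned above.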

> - Is there a size limitation either on the data or on the number of 
> dimensions? Could this format represent, for instance, data with more 
> than 100 dimensions, which could not be mapped back to NumPy.

Regarding the second question: no, there is no limitation on data size or 
dimensions linked to the format (JSON does not impose limits on array sizes).

> Perhaps, like the Pandas package, it should live outside NumPy for a 
> while until some wider consensus could emerge. 

Regarding this initial remark, this is indeed a possible option but it depends 
on the answer to the question:

  - does Numpy want to have a neutral JSON exchange format to exchange data 
with other frameworks (tabular, multidimensional or other)?

This is why I am interested in having a better understanding of the needs (see 
end of the initial email).

[2] https://www.ietf.org/archive/id/draft-thomy-json-ntv-02.html#appendix-A


[Numpy-discussion] Converting array to string

2024-02-24 Thread Dom Grigonis
Hello,

I am seeking a bit of help.

I need a fast way to transfer numpy arrays in json format.

Thus, converting array to list is not an option and I need something similar to:
a = np.ones(1000)
%timeit a.tobytes()
17.6 ms
This is quick, fast and portable. In other words I am very happy with this, 
but...

Json does not support bytes.

Any method of subsequent conversion from bytes to string is several times 
slower than the actual serialisation.

So my question is: Is there any way to serialise directly to string?

I remember there used to be 2 methods: tobytes and tostring. However, I see 
that tostring is deprecated and its functionality is equivalent to `tobytes`. 
Is it completely gone? Or is there a chance I can still find a performant 
way of converting to and from a non-human-readable `str` form?
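
One common workaround, sketched here rather than benchmarked, is to 
base64-encode the output of `tobytes` so that it becomes an ASCII string JSON 
can carry. This assumes the roughly one-third size overhead of base64 is 
acceptable, and that the dtype and shape travel alongside the data, since raw 
bytes carry neither. The JSON keys are illustrative.

```python
import base64
import json
import numpy as np

a = np.ones(1000)

# Encode: raw bytes -> base64 bytes -> ASCII str, plus the metadata
# needed to rebuild the array on the other side.
payload = json.dumps({
    "dtype": str(a.dtype),
    "shape": a.shape,
    "data": base64.b64encode(a.tobytes()).decode("ascii"),
})

# Decode: np.frombuffer returns a read-only view on the decoded bytes.
obj = json.loads(payload)
raw = base64.b64decode(obj["data"])
restored = np.frombuffer(raw, dtype=obj["dtype"]).reshape(obj["shape"])
```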

Regards,
DG



[Numpy-discussion] Re: Introducing quarterly date units to datetime64 and timedelta64

2024-02-24 Thread Oyibo
Dear Stefano,

Thank you for your feedback on the proposal regarding introducing quarterly 
date units.

I appreciate your insight into the existing capabilities already built into 
NumPy.
The support for quarters using the M8[3M] notation is fascinating and new to me.

You've raised good points against introducing a quarterly date type. There must 
be good reasons it has never been implemented.
I see that there are existing solutions within NumPy, such as using M8[3M], but 
they may not be intuitive or easy for users to discover.
Here are 4 typical scenarios and solutions a NumPy user could come up with:

1) ChatGPT: Return the start date of the quarter for a numpy array of datetime64

import numpy as np

# Assuming dates is your NumPy array of datetime64 objects
dates = np.array(['2024-01-15', '2024-04-20', '2024-07-05', '2024-10-12'], 
dtype='datetime64')

# Calculate the start date of the quarter for each date
quarters_start = dates.astype('datetime64[M]').astype('datetime64[Q]')

print(quarters_start)

=> Not so great...

2) stackoverflow: https://stackoverflow.com/search?q=numpy+datetime64+quarter

year_quarter = pd.Period(dates, freq='Q')

=> Just use Pandas...

3) Unfamiliar user (pure Numpy):

dates = np.asarray(dates, dtype=" Works, but ugly...

4) Advanced user:

dates = np.asarray(dates, dtype=" Really, so easy...
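
For reference, 'Q' is not a valid datetime64 unit, so the snippet in scenario 
1 cannot run as written. The sketch below is one possible pure-NumPy route 
based on the M8[3M] notation; it does not reconstruct the truncated snippets 
in scenarios 3 and 4, and the pandas-style "2024Q2" label format is an 
assumption made for illustration.

```python
import numpy as np

dates = np.array(['2024-01-15', '2024-04-20', '2024-07-05', '2024-10-12'],
                 dtype='datetime64[D]')

# Bucket into 3-month units, then cast back to days to get quarter starts.
quarters = dates.astype('M8[3M]')
quarter_start = quarters.astype('M8[D]')

# The underlying integer counts quarters since 1970Q1.
q_index = quarters.astype('int64')
labels = [f'{1970 + i // 4}Q{i % 4 + 1}' for i in q_index.tolist()]
```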

As the scenarios above demonstrate, it's quite possible that users will not 
find the best built-in path in NumPy.
First of all, I am very happy to know there is an alternative using the M8[3M] 
notation. Personally, I would prefer to have "Q" instead of "3M" as the 
quarterly unit because it is more intuitive and aligns with other libraries 
like Pandas.
You are moderately negative on this proposal. However, do you see a way to 
improve the user experience?
Thank you for your feedback.

Regards, Oyibo


[Numpy-discussion] Re: Introducing quarterly date units to datetime64 and timedelta64

2024-02-24 Thread Oyibo
For some reason scenarios 3 & 4 got butchered.

3) Unfamiliar user (pure Numpy):

dates = np.asarray(dates, dtype=' Works, but ugly...

4) Advanced user:

dates = np.asarray(dates, dtype=' Really, so easy...