[Numpy-discussion] Introducing quarterly date units to datetime64 and timedelta64
Dear NumPy Team,

I am writing to propose the addition of quarterly date units (Q) to the datetime64 and timedelta64 data types in NumPy. While NumPy currently supports various date units, including years, months, weeks, and days, the absence of quarterly units limits its usability in numerous applications. Users are often forced into cumbersome workarounds such as writing custom code or relying on external libraries, which detracts from the simplicity and efficiency of using NumPy.

Quarters are a fundamental temporal unit, particularly in finance, business, and economics (quarterly is probably the most analyzed frequency in macroeconometrics). Many time series datasets are reported at quarterly intervals, so native support for quarterly units would simplify data handling and manipulation within NumPy. Although alternative methods exist for representing quarters, native support would improve consistency across codebases and enhance interoperability with other libraries and frameworks. Ultimately, it would improve the overall user experience with date data types in NumPy.

I would like to discuss this proposal further and would be happy to open an issue on GitHub to initiate the conversation if the proposal sounds reasonable. Your feedback would be appreciated.

Best regards,
Oyibo
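[As context for the "external libraries" point above, a minimal sketch of the usual pandas workaround; this is an illustration only, not part of the proposal.]

import pandas as pd

# Quarterly periods are first-class in pandas.
q = pd.Period('2024Q2')                               # Period('2024Q2', 'Q-DEC')
q.start_time, q.end_time                              # 2024-04-01 ... 2024-06-30
rng = pd.period_range('2020Q1', '2021Q4', freq='Q')   # 8 quarterly periods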
[Numpy-discussion] Re: Introducing quarterly date units to datetime64 and timedelta64
Actually quarters (3-month sub-year groupings) are already supported as 'M8[3M]' and 'm8[3M]':

>>> np.datetime64('2024-05').astype('M8[3M]') - np.datetime64('2020-03').astype('M8[3M]')
numpy.timedelta64(17,'3M')

So explicitly introducing a 'Q' time unit would only enable a more intuitive representation/parsing of dates and durations.

I'm moderately negative on this proposal:
- there is no native support for quarters in Python
- ISO 8601-1 does not support sub-year groupings
- the ISO 8601-2 extension representing sub-year groupings is not sufficiently widespread to be adopted by numpy (e.g. '2001-34' expresses "second quarter of 2001", but I suppose nobody would guess this meaning)

In other words, without a clear normative reference, implementing quarters in numpy would risk introducing a custom/arbitrary notation.

Stefano
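[To spell out the arithmetic above as a runnable snippet, using the same dates as the example; the astype(np.int64) at the end is just one way to get a plain quarter count.]

import numpy as np

# Whole quarters between two months, using the existing '3M' (3-month) unit.
q = np.datetime64('2024-05').astype('M8[3M]') - np.datetime64('2020-03').astype('M8[3M]')
print(repr(q))             # numpy.timedelta64(17,'3M')
print(q.astype(np.int64))  # 17 -- the raw count of 3-month steps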
[Numpy-discussion] Re: JSON format for multi-dimensional data
Thank you Matti for this response.

I added to issue 12481 because, in my opinion, the format proposal responds to that issue. However, if you think a dedicated issue is preferable, I can create one.

To clarify the proposed standard: it represents multidimensional data of any type. The only constraint is that each data item can be represented in a JSON format. This is of course the case for all pandas types, but an item can also be one of the following: a year, a polygon, a URI, a type defined in Darwin Core or in schema.org... This means that each library or framework must transform this JSON data into an internal value (e.g. a polygon can be translated into a shapely object). The defined types are described in the NTV Internet-Draft [2].

> - How does it handle sharing data? NumPy can handle very large ndarrays,
> and a read-only container with a shared memory location, like in DLPack
> [0] seems more natural than a format that precludes sharing data.

Concerning the first question, the purpose of this standard is complementary to what DLPack proposes (DLPack offers standard access mechanisms to data in memory, which avoids duplication between frameworks):
- the format is a neutral, reversible exchange format built on JSON (and therefore with duplication) which can be used independently of any framework;
- the data types are numerous and have a broader scope than those offered by DLPack (numeric types only).

> - Is there a size limitation either on the data or on the number of
> dimensions? Could this format represent, for instance, data with more
> than 100 dimensions, which could not be mapped back to NumPy.

Regarding the second question, no, the format imposes no limitation on data size or dimensions (JSON does not impose limits on array sizes).

> Perhaps, like the Pandas package, it should live outside NumPy for a
> while until some wider consensus could emerge.

Regarding this initial remark, this is indeed a possible option, but it depends on the answer to the question: does NumPy want a neutral JSON exchange format to exchange data with other frameworks (tabular, multidimensional or other)? This is why I am interested in a better understanding of the needs (see the end of the initial email).

[2] https://www.ietf.org/archive/id/draft-thomy-json-ntv-02.html#appendix-A
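[For readers unfamiliar with the general idea, a minimal sketch of what a framework-neutral JSON exchange of an ndarray can look like. The envelope below is purely illustrative and is NOT the NTV notation from the draft; see [2] for the actual format.]

import json
import numpy as np

arr = np.arange(12, dtype=np.int64).reshape(3, 4)

# Illustrative self-describing envelope (not the NTV format): data is duplicated
# into nested lists, so the payload is independent of any framework.
payload = json.dumps({"dtype": str(arr.dtype), "shape": list(arr.shape), "data": arr.tolist()})

obj = json.loads(payload)
restored = np.array(obj["data"], dtype=obj["dtype"]).reshape(obj["shape"])
assert np.array_equal(restored, arr)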
[Numpy-discussion] Converting array to string
Hello,

I am seeking a bit of help. I need a fast way to transfer numpy arrays in JSON format, so converting the array to a list is not an option. I need something similar to:

a = np.ones(1000)
%timeit a.tobytes()
17.6 ms

This is quick, fast and portable. In other words I am very happy with this, but... JSON does not support bytes, and any method of subsequent conversion from bytes to string is several times slower than the actual serialisation.

So my question is: is there any way to serialise directly to a string? I remember there used to be two methods: tobytes and tostring. However, I see that tostring is deprecated and its functionality is equivalent to `tobytes`. Is it completely gone? Or is there a chance I can still find a performant way of converting to and back from a non-human-readable `str` form?

Regards,
DG
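[Not an answer to the tostring question, but for reference, one common route for getting raw bytes into JSON is base64 from the standard library; a rough sketch follows, and whether its overhead is acceptable depends on your own measurements.]

import base64
import json
import numpy as np

a = np.ones(1000)

# base64 turns raw bytes into an ASCII string that JSON accepts; the encoding runs
# in C but still costs extra time and roughly 33% extra payload size.
payload = json.dumps({"dtype": str(a.dtype), "shape": list(a.shape),
                      "data": base64.b64encode(a.tobytes()).decode("ascii")})

obj = json.loads(payload)
b = np.frombuffer(base64.b64decode(obj["data"]), dtype=obj["dtype"]).reshape(obj["shape"])
assert np.array_equal(a, b)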
[Numpy-discussion] Re: Introducing quarterly date units to datetime64 and timedelta64
Dear Stefano,

Thank you for your feedback on the proposal regarding quarterly date units. I appreciate your insight into the existing capabilities already built into NumPy. The support for quarters using the M8[3M] notation is fascinating and new to me.

You've raised good points against introducing a quarterly date type; there must be good reasons it has never been implemented. I see that there are existing solutions within NumPy, such as using M8[3M], but they may not be intuitive or easy to discover for users. Here are 4 typical scenarios and the solutions a NumPy user could come up with:

1) ChatGPT: "Return the start date of the quarter for a numpy array of datetime64"

import numpy as np

# Assuming dates is your NumPy array of datetime64 objects
dates = np.array(['2024-01-15', '2024-04-20', '2024-07-05', '2024-10-12'], dtype='datetime64')

# Calculate the start date of the quarter for each date
quarters_start = dates.astype('datetime64[M]').astype('datetime64[Q]')
print(quarters_start)

=> Not so great...

2) stackoverflow: https://stackoverflow.com/search?q=numpy+datetime64+quarter

year_quarter = pd.Period(dates, freq='Q')

=> Just use Pandas...

3) Unfamiliar user (pure Numpy):

dates = np.asarray(dates, dtype="

=> Works, but ugly...

4) Advanced user:

dates = np.asarray(dates, dtype="

=> Really, so easy...

As demonstrated in these scenarios, it's quite possible that users will not find the best built-in path using NumPy.

First of all, I am very happy to know there is an alternative using the M8[3M] notation. Personally, I would prefer to have "Q" instead of "3M" as the quarterly unit because it is more intuitive and aligns with other libraries like Pandas.

You are moderately negative on this proposal. However, do you see a way to improve the user experience?

Thank you for your feedback.

Regards,
Oyibo
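[For completeness, a pure-NumPy sketch of the quarter-start task from scenario 1, using the '3M' unit discussed earlier in the thread; the sample dates are arbitrary.]

import numpy as np

dates = np.array(['2024-01-15', '2024-04-20', '2024-07-05', '2024-10-12'],
                 dtype='datetime64[D]')

# Truncate each date to the first day of its quarter: day -> 3-month bin -> day.
quarter_start = dates.astype('M8[3M]').astype('M8[D]')
# -> ['2024-01-01', '2024-04-01', '2024-07-01', '2024-10-01']

This works because the 3-month bins are counted from the 1970-01 epoch, which falls on a calendar quarter boundary.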
[Numpy-discussion] Re: Introducing quarterly date units to datetime64 and timedelta64
For some reason scenarios 3 & 4 got butchered.

3) Unfamiliar user (pure Numpy):

dates = np.asarray(dates, dtype='

=> Works, but ugly...

4) Advanced user:

dates = np.asarray(dates, dtype='

=> Really, so easy...