[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-20 Thread Michael Siebert
Dear all,

another aspect to think about is that there is not only cumsum. There are
other cumulative aggregations as well (whether or not they have top-level np
functions; cummax, for example, is represented by np.maximum.accumulate):

1. cumprod: there, instead of starting with zero, one would need to start with one
2. cummax: start with -np.inf
3. cummin: start with np.inf
4. Maybe more? Those are the ones that came to my mind so far.

So introducing a parameter for cumsum and not one for the others would be some
sort of inconsistency. For cumsum and cumprod, all data types (int8, int16, …,
float, double) support 0 and 1; for cummax and cummin, one would need types
that support infinity and negative numbers (to make it meaningfully convertible
to other types). And how would such a cumprod be called if one wanted to give
it a new name? cumprod0 or cumprod1? (A sketch of the idea with existing NumPy
calls is at the end of this message.)

Just some thoughts.

Best, Michael

On 19. Aug 2023, at 19:02, Dom Grigonis wrote:

Unfortunately, I don't have a good answer. For now, I can only tell you what I
think might benefit from improvement.

1. Verbosity. I appreciate that bracket syntax such as the one in Julia or
MATLAB, `[A B C ...]`, is not possible, so functional is the only option. E.g.
Julia has functions named ‘cat’, ‘vcat’, ‘hcat’, ‘hvcat’. I myself have
recently redefined np.concatenate to `np_c`. For simple operations, it would
surely be nice to have methods, e.g. `arr.append(axis)`/`arr.prepend(axis)`.

2. Excessive number of functions. There seem to be very many functions for
concatenating and stacking. Many operations can be done using different
functions and approaches, and usually one of them is several times faster than
the rest. I will give an example. Stacking two 1d vectors as columns of a 2d
array:
arr = np.arange(100)
TIMER.repeat([
    lambda: np.array([arr, arr]).T,
    lambda: np.vstack([arr, arr]).T,
    lambda: np.stack([arr, arr]).T,
    lambda: np.c_[arr, arr],
    lambda: np.column_stack((arr, arr)),
    lambda: np.concatenate([arr[:, None], arr[:, None]], axis=1)
]).print(3)
# mean [[0.012 0.044 0.052 0.13  0.032 0.024]]

Instead, having fewer, but more intuitive/flexible and well optimised
functions would be a bit more convenient.

3. Flattening and reshaping API is not very intuitive. E.g. torch.flatten is
an example of a function with the desired level of flexibility, in contrast to
`np.ndarray.flatten` (sketch below):
https://pytorch.org/docs/stable/generated/torch.flatten.html. I had similar
issues with multidimensional searching, sorting, multi-dimensional overlaps
and custom unique functions. In other words, all the functionality is already
there, but in more custom multi-dimensional cases (although the requirement is
often very simple from the perspective of how it looks in my mind) there is no
easy API, and I end up writing my own numpy functions and benchmarking
numerous ways to achieve the same thing. By now, I have my own
multi-dimensional unique, sort, search, flatten, and a more flexible ix_,
which are not well tested, but already more convenient, flexible and often
several times faster than the numpy ones (although all they do is reuse
existing numpy functionality).

I think these are more along the lines of numpy 2.0 than a simple extension.
It feels that the API could generally be more flexible and intuitive, and
there is enough existing numpy material and external examples to draw from to
make a next-level API happen. Although I appreciate the required effort and
difficulties.

Having said all that, implementing Julia's equivalents ‘cat’, ‘vcat’, ‘hcat’,
‘hvcat’ together with `arr.append(others, axis)`/`arr.prepend(others, axis)`,
while ensuring that they use the most optimised approaches, could potentially
make life easier for the time being.
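A minimal sketch of the torch.flatten-style partial flattening mentioned in
point 3 above; `flatten_dims` is a hypothetical helper built on plain reshape,
not an existing NumPy API:

import numpy as np

def flatten_dims(a, start_dim=0, end_dim=-1):
    # Collapse dimensions start_dim..end_dim into one, torch.flatten-style.
    end_dim = a.ndim + end_dim if end_dim < 0 else end_dim
    return a.reshape(a.shape[:start_dim] + (-1,) + a.shape[end_dim + 1:])

x = np.zeros((2, 3, 4, 5))
flatten_dims(x, 1, 2).shape  # (2, 12, 5)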
—Nothing ever dies, just enters the state of deferred evaluation—
Dg

On 19 Aug 2023, at 17:39, Ronald van Elburg wrote:

I think ultimately the copy is unnecessary. That being said, introducing
prepend and append functions concentrates the complexity of the mapping in one
place. Trying to avoid the extra copy would probably lead to a more complex
implementation of accumulate. How would, in your view, the prepend interface
differ from concatenation or stacking?
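A minimal sketch of the identity-element idea from the top of this thread,
prepending the identity by hand with existing NumPy calls (none of these
functions currently take an initial value):

import numpy as np

a = np.array([3, 1, 4, 1, 5])

np.concatenate(([0], np.cumsum(a)))    # cumsum from 0 (additive identity)
np.concatenate(([1], np.cumprod(a)))   # cumprod from 1 (multiplicative identity)
# cummax from -inf needs a dtype that can represent it:
np.concatenate(([-np.inf], np.maximum.accumulate(a.astype(float))))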
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com

[Numpy-discussion] Re: Windows default integer now 64bit in main

2023-11-02 Thread Michael Siebert
Hi Sebastian,

great news! Does that mean that Windows Numpy 64 bit default integers are 
coming before Numpy 2.0, like in Numpy 1.27? Will there be another release 
before 2.0?

Best, Michael

> On 2. Nov 2023, at 16:25, Sebastian Berg  wrote:
> Hi all,
> 
> just a heads up, the PR to change the default integer is merged on
> main.  This may cause issues, especially with Cython code because
> `np.int_t` cannot be reasonably defined anymore.
> 
> Other code may also want to vet usage of "long" in any variation.  Much
> code (like SciPy) simply supports any integer input, although even
> there integer output may be relevant.  New NumPy defines
> `NPY_DEFAULT_INT` so that you can branch at runtime; for backward
> compatibility you could use:
> 
> #ifndef NPY_DEFAULT_INT
> #define NPY_DEFAULT_INT NPY_LONG
> #endif
> 
> Unfortunately, I expect this to be a bit painful; please let us know if
> it is too painful for some reason.
> 
> But OTOH it has been a recurring surprise and is a common reason for
> software written on Linux not running on Windows.
> 
> - Sebastian
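A quick illustrative check of the default integer from Python: it is the
dtype chosen when no explicit dtype is given.

import numpy as np

# The default integer is what you get without an explicit dtype; with this
# change it is 64-bit on 64-bit Windows as well (previously int32 there).
print(np.array([1, 2, 3]).dtype)
print(np.arange(3).dtype)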
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: NEP 55 Updates and call for testing

2023-12-08 Thread Michael Siebert
Hi Nathan,

thank you for your great work on UTF8 strings and their integration in Numpy. 
This is a very important dtype to support, especially with the widespread use 
of large language models (LLMs) nowadays.
However, I would like to comment on the serialization. I hope it's not too
late at this point (the last time I looked, the serialization was less
detailed), but the approach with sidecar_size has the disadvantage that Numpy
arrays would no longer be efficiently appendable. Also, while it's comfortable
not having to specify what the sidecar data looks like, it makes the format
more proprietary, less open and more difficult to debug. To be precise, I see
this part of NEP 1 violated:

Be reverse engineered. Datasets often live longer than the programs that 
created them. A competent developer should be able to create a solution in his 
preferred programming language to read most NPY files that he has been given 
without much documentation.

On top of that, what if the data format for the sidecar data changes? Is it 
then still possible to read old files with a newer Numpy version?

To overcome these issues, first and foremost I suggest not introducing a .npy
file format version 4.0 with this kind of fundamental change. Instead of
adding the sidecar data to the same .npy file, I suggest adding it to an
extra, standard .npy file. Here is an example: if the user wants to save to
"mystrings.npy", the files "mystrings.npy" and "mystrings.npy.idx" could be
created, where the latter is also just a regular .npy file and contains the
indices/offsets into mystrings.npy that would otherwise end up in the sidecar
data. When the file "mystrings.npy" is loaded, Numpy would check whether there
is a mystrings.npy.idx and try to load it as well. This is an approach I use
pretty regularly with video data: the single frames are all in one big
(appendable) array and the index array contains the begin indices/offsets and
(redundantly, for fast lookups, sorting etc.) the lengths and end offsets.
This concept is very generic, can be used for all sorts of ragged arrays
including text, satisfies the requirements of NEP 1 and is efficiently
appendable (or at least moves the burden from the programmer to the
filesystem).
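A minimal sketch of the mystrings.npy / mystrings.npy.idx idea described above
(the file names and the offsets/lengths layout are illustrative, not an
agreed-upon format):

import numpy as np

strings = ["hello", "world", "NumPy"]
data = np.frombuffer("".join(strings).encode("utf-8"), dtype=np.uint8)
lengths = np.array([len(s.encode("utf-8")) for s in strings], dtype=np.int64)
offsets = np.concatenate(([0], np.cumsum(lengths)[:-1]))

np.save("mystrings.npy", data)
with open("mystrings.npy.idx", "wb") as f:  # file handle, so np.save keeps the name
    np.save(f, np.column_stack((offsets, lengths)))

# Reading back, e.g. with memory mapping:
data = np.load("mystrings.npy", mmap_mode="r")
idx = np.load("mystrings.npy.idx")
off, n = idx[1]
data[off:off + n].tobytes().decode("utf-8")  # 'world'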

Best, Michael

PS Also, a dedicated index array can come with a custom shape, which would 
allow for multidimensional ragged arrays (in the future). Wouldn't that be 
great?

PPS this is a copy of what I have written in the pull request. As Nathan said, 
the serialization part is not defined and/or implemented yet and I’d like to 
hear some more opinions about this.

> On 8. Dec 2023, at 19:35, Nathan  wrote:
> 
> I just opened a draft PR to include stringdtype in numpy: 
> https://github.com/numpy/numpy/pull/25347
> 
> If you are interested in testing the new dtype but haven't had the chance 
> yet, hopefully this should be easier to test. From a clone of the NumPy repo, 
> doing:
> 
> $ git fetch https://github.com/ngoldbaum/numpy stringdtype:stringdtype
> $ git checkout stringdtype
> $ git submodule update --init
> $ python -m pip install .
> 
> should build and install a version of NumPy that includes stringdtype, 
> importable as `np.dtypes.StringDType`. Note that this is based on numpy 2.0 
> dev, so if you need to use another package that depends on NumPy's ABI to 
> test the dtype, you'll need to rebuild that project as well.
> 
> I'll be continuing to work on this PR to finish integrating stringdtype into 
> NumPy and write documentation. 
> 
> If anyone has any feedback on any aspect of the NEP or the stringdtype code 
> please reply here, on github, or reach out to me privately.
> 
> On Wed, Nov 22, 2023 at 1:22 PM Nathan  wrote:
>> Hi all,
>> 
>> This week I updated NEP 55 to reflect the changes I made to the prototype 
>> since
>> I initially sent out the NEP. The updated NEP is available on the NumPy 
>> website:
>> https://numpy.org/neps/nep-0055-string_dtype.html.
>> 
>> Updates to the NEP
>> ++
>> 
>> The changes since the original version of the NEP focus on fully defining
>> the C API surface we would like to add to the NumPy C API and an
>> implementation of a per-dtype-instance arena allocator to manage heap
>> allocations. This enabled major improvements to the prototype, including
>> implementing the small string optimization and locking all access to heap
>> memory behind a fine-grained mutex, which should prevent seg faults or
>> memory corruption in a multithreaded context. Thanks to Warren Weckesser
>> for his proof-of-concept code and help with the small string optimization
>> implementation; he has been added as an author to reflect his
>> contributions.
>> 
>> With these changes the stringdtype prototype is feature complete.
>> 
>> Call to Review NEP 55
>> +
>> 
>> I'm requesting another round of review on the NEP with an eye toward
>> acceptance before the NumPy 2.0 release.

[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Michael Siebert
Hi all,

in PyTorch they (kind of) recently introduced torch.compile:

https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html

In TensorFlow, eager execution needs to be activated manually, otherwise it 
creates a graph object which then acts like this kind of pipe.

Don't know whether that's useful info for an implementation in Numpy. I'm just
referring to what I think may be similar to pipes in other Numpy-like
frameworks.

Best, Michael
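For context, a minimal sketch of the kind of chaining a pipe method would
enable, written as a hypothetical free function, since ndarray has no such
method today:

import numpy as np

def pipe(arr, *funcs):
    # Apply funcs left to right, feeding each result into the next.
    for f in funcs:
        arr = f(arr)
    return arr

x = np.linspace(-1.0, 1.0, 5)
pipe(x, np.abs, np.sqrt)  # same as np.sqrt(np.abs(x)), but reads left to right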

> On 15. Feb 2024, at 22:13, Marten van Kerkwijk  wrote:
> 
> 
>> 
>> What were your conclusions after experimenting with chained ufuncs?
>> 
>> If the speed is comparable to numexpr, wouldn’t it be `nicer` to have
>> non-string input format?
>> 
>> It would feel a bit less like a black-box.
> 
> I haven't gotten further than it yet, it is just some toying around I've
> been doing.  But I'd indeed prefer not to go via strings -- possibly
> numexpr could use a similar mechanism to what I did to construct the
> function that is being evaluated.
> 
> Aside: your suggestion of the pipe led to some further discussion at
> https://github.com/numpy/numpy/issues/25826#issuecomment-1947342581
> -- as a more general way of passing arrays to functions.
> 
> -- Marten
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] ENH: add functionality NpyAppendArray to numpy.format

2021-11-07 Thread Michael Siebert
Dear all,

I'd like to add the NpyAppendArray functionality, compare

https://github.com/xor2k/npy-append-array (15 Stars so far)

and

https://stackoverflow.com/a/64403144/2755796 (10 Upvotes so far)

I have prepared a pull request and want to "test the waters" as suggested
by the message I have received when creating the pull request.

So what is NpyAppendArray about?

I love the .npy file format. It is really great! I cannot appreciate the
.npy capabilities mentioned in

https://numpy.org/devdocs/reference/generated/numpy.lib.format.html

enough, especially its simplicity. No comparison with the struggles we had
with HDF5. However, there is one feature Numpy currently does not provide:
a simple, efficient, easy-to-use and safe option to append to .npy files (here
is the text I've used in the GitHub repository above):

Appending to an array created by np.save might be possible under certain
circumstances, since the .npy total header byte count is required to be
evenly divisible by 64. Thus, there might be some spare space to grow the
shape entry in the array descriptor. However, this is not guaranteed and
might randomly fail. Initialize the array with NpyAppendArray(filename)
directly so the header will be created with 64 bytes of spare header space
for growth. Will this be enough? It allows for up to 10^64 >= 2^212 array
entries or data bits. Indeed, this is less than the number of atoms in the
universe. However, fully populating such an array, due to limits imposed by
quantum mechanics, would require more energy than would be needed to boil
the oceans, compare

https://hbfs.wordpress.com/2009/02/10/to-boil-the-oceans

Therefore, a wide range of use cases should be coverable with this approach.
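As an illustration of the 64-byte alignment mentioned above, the header of a
freshly written .npy file can be inspected by hand (version 1.0 layout: 6-byte
magic, 2-byte version, 2-byte little-endian header length):

import numpy as np

np.save("demo.npy", np.zeros(3))
with open("demo.npy", "rb") as f:
    assert f.read(6) == b"\x93NUMPY"            # magic
    major, minor = f.read(2)                    # format version, e.g. 1, 0
    header_len = int.from_bytes(f.read(2), "little")
print((10 + header_len) % 64)                   # 0: preamble + header is 64-aligned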

Who could use that?

I developed and use NpyAppendArray to efficiently create .npy arrays which
are larger than the main memory and can be loaded by memory mapping later,
e.g. for Deep Learning workflows. Another use case are binary log files,
which could be created on low end embedded devices and later be processed
without parsing, optionally again using memory maps.

What does the code look like?

Here is some demo code showing how this looks in practice (taken from the
test file):

def test_NpyAppendArray(tmpdir):
    arr1 = np.array([[1, 2], [3, 4]])
    arr2 = np.array([[1, 2], [3, 4], [5, 6]])

    fname = os.path.join(tmpdir, 'npaa.npy')

    with format.NpyAppendArray(fname) as npaa:
        npaa.append(arr1)
        npaa.append(arr2)
        npaa.append(arr2)

    arr = np.load(fname, mmap_mode="r")
    arr_ref = np.concatenate([arr1, arr2, arr2])

    assert_array_equal(arr, arr_ref)

Some more aspects:
1. Appending efficiently only works along axis=0, at least for C order
(probably different for Fortran order).
2. One could also add the 64 bytes of spare space right in np.save. However,
I cannot really judge how much of an issue that would be to the users of
np.save, and it is not really necessary, since users who want to append to
.npy files would create them with NpyAppendArray anyway.
3. Probably I have forgotten something here; some time has passed since the
initial GitHub commit.

So what do you think? Yes/No/Maybe? It would be really nice to get some
feedback on the mailing list here!

Although this might not be perfectly consistent with the protocol, I've
created the pull request already, just to force myself to finish this up,
and I'm prepared to fail if there is no interest in getting NpyAppendArray
directly into numpy ;)

Best from Berlin, Michael
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: ENH: add functionality NpyAppendArray to numpy.format

2021-11-08 Thread Michael Siebert
My memories reappeared:
3. One could think about allowing variable-sized .npy files without header
modification at all, e.g. by setting the variable-sized shape entry (axis 0)
to -1. The length of the array would then be inferred from the file size; see
the sketch below. However, what I personally dislike about that approach is
that, given a .npy file, it would be impossible to determine whether it was
actually completed or whether for some reason data got lost, e.g. by an
incomplete download. Indeed, the mere length is not as reliable as e.g. a
sha256 sum, but still better than nothing. Could this be a thing, or is this
maybe the preferable solution after all?
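A sketch of how a reader could infer the leading axis from the file size, per
the shape[0] = -1 idea; `load_growable` is hypothetical, and the fixed
128-byte header size is an assumption for illustration (real .npy files
written by np.save always store a complete shape):

import os
import numpy as np

def load_growable(fname, dtype, row_shape, header_size=128):
    # Infer the number of rows from the bytes that follow the header.
    rowbytes = np.dtype(dtype).itemsize * int(np.prod(row_shape, dtype=np.int64))
    nrows = (os.path.getsize(fname) - header_size) // rowbytes
    return np.memmap(fname, dtype=dtype, mode="r",
                     offset=header_size, shape=(nrows,) + tuple(row_shape))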

On Sun, Nov 7, 2021 at 6:11 PM Michael Siebert  wrote:

> Dear all,
>
> [...]
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: An article on numpy data types

2021-12-26 Thread Michael Siebert
Dear Lev,

thank you a lot! Something like this should be part of the Numpy
documentation. I like the diagram, it looks very nice! Also, I've opened an
issue regarding data types:

https://github.com/numpy/numpy/issues/20662


Some feedback from my side:

1. When calling numpy.array([1,2,3,4]), it gives me an int64 data type most of
the time (two x86_64 systems, one arm64 system). The only time I got int32 was
on a Raspberry Pi, which is a software limitation, since the CPU is 64-bit and
they have even replaced their so-far 32-bit-only Raspberry Pi Zero with a
64-bit version (yes, one day Raspberry Pi OS with 64 bit might actually become
the default!). I don't know what machine you are working on, but int64 should
be the default.
2. "x64" (mentioned once) is ambiguous and easily confused with IA-64, the
obsolete Intel Itanium architecture. It should be x86_64.
3. np.errstate looks nice; I could use that for my pull request as well.

Many thanks & best regards, Michael


> On 25. Dec 2021, at 10:02, Lev Maximov  wrote:
> 
> Hi everyone,
> 
> I'm almost done with the article about numpy types – something I haven't 
> covered in Numpy Illustrated. 
> 
> Would someone please have a look to confirm I haven't written anything 
> anti-climatic there?
> 
> https://axil.github.io/numpy-data-types.html 
> 
> 
> --
> Best regards,
> Lev
> 
> PS Earlier today I've mistakenly sent an email with the wrong link.

___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: An article on numpy data types

2021-12-26 Thread Michael Siebert
Hey Lev,

I've forgotten to mention my MacBook M1; it's also int64 there.

Python on Windows is, and is supposed to be, as far as I can tell, a dying
platform. A billion things are broken there (HDF comes to mind), and it seems
even Microsoft wants developers to move away from native Windows with their
introduction of WSL (Windows Subsystem for Linux). Its latest version, WSL2,
even comes with an actual Linux kernel, and since Windows 11 it has support
for graphical applications (Xorg) out of the box. With Visual Studio Code
(also Microsoft) and its remote capabilities, one does not even feel a
difference between developing in an Ubuntu in WSL on Windows and an actual
Ubuntu.

Considering the „traditional“ C data types versus fixed-width types, and
prioritizing the latter in the Numpy documentation: that's what my issue (see
below) is about. I think it is summarized nicely in

https://matt.sh/howto-c

Best regards, Michael

> On 26. Dec 2021, at 13:49, Lev Maximov  wrote:
> 
> 
> Dear Michael,
> 
> Thank you for your feedback! 
> 
> I've fixed the x86_64 typo. 
> 
> I'll think how to reformulate the int32 part. I work on debian x86_64 and 
> windows 10 64bit. Constructing an array with np.array([1,2,3]) as well as 
> np.array([1,2,3], dtype=np.int_) gives me int64 dtype on linux, and int32 on 
> windows.
> 
> As suggested by Matti, I've put the rst source (and images) into a separate 
> github repository
> 
> https://github.com/axil/numpy-data-types
> 
> PRs are welcome. My primary concern is to exclude serious typos/mistakes that 
> might mislead/harm the readers if used.
> 
> My personal preference is towards explicit width types like np.int32, but 
> from reading the docs I have a feeling there's a trend of migrating towards 
> the c-style notation. 
> 
> Best regards,
> Lev   
> 
>> On Sun, Dec 26, 2021 at 7:05 PM Michael Siebert  wrote:
>> Dear Lev,
>>
>> [...]
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: An article on numpy data types

2021-12-26 Thread Michael Siebert
Hi Matti, hi Lev,

that's cool that there are numbers on Python usage! According to those,
Windows might still be around quite a while. It would be interesting to
include options like „Windows (native)“ and „Windows (WSL)“ in future surveys.

Windows is the main operating system I'm working with most of the time,
because I need Office almost daily, and it is not too much of an impact thanks
to WSL.

Many might be locked into using Windows by company policies - my company is
fortunately not too strict. That might explain a good portion of the high
Windows usage.

So meanwhile we might need to engineer everything at least twice: Windows and
Linux (and some modifications for macOS once in a while). Or engineer
something and wait until someone else complains.

I can check Lev's article on a Windows Python later. Let's see, maybe they
have meanwhile fixed that HDF issue.

int32 would probably be the default if it's a 32-bit Python installation.
There are still so many 32-bit programs around on Windows - quite scary in
2021, when probably almost every smartphone is 64-bit.

> On 26. Dec 2021, at 16:22, Matti Picus  wrote:
> 
> 
>> On 26/12/21 3:44 pm, Michael Siebert wrote:
>> Hey Lev,
>> 
>> I‘ve forgotten to mention my MacBook M1,
>> it‘s also int64 there.
>> 
>> Python on Windows is and is supposed to be, as far as I get it, a dying 
>> platform.
> 
> 
> Your statement is the first time I have heard this. Of those who answered the 
> 2020 Python Developers Survey[0], 68% use linux, 48% Windows and 29% macOS 
> (yes, the total is more than 100%: users could tick more than one box), which 
> was up slightly from 2018 [1] where windows was 47%. I couldn't find a line 
> in the survey about WSL, but the people I know still want to work directly on 
> Windows.
> 
> 
> Matti
> 
> 
> [0] https://www.jetbrains.com/lp/python-developers-survey-2020/, search for 
> "Operating system"
> 
> [1] https://www.jetbrains.com/research/python-developers-survey-2018/ search 
> for "Operating system"
> 
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: An article on numpy data types

2021-12-26 Thread Michael Siebert
Okay, a little modification to my last mail: many Android smartphones are
still 32-bit, but according to

https://www.androidauthority.com/arm-32-vs-64-bit-explained-1232065/

from 2023 on, all (or at least many) new ARM processors will be 64 bit only.

Apple's iPhones have been 64-bit only for quite a while already (since
September 2017, the iOS 11 release).

> On 26. Dec 2021, at 17:31, Lev Maximov  wrote:
> 
> 
> Hi Michael,
> 
> > Python on Windows is and is supposed to be, as far as I get it, a dying 
> > platform.
> I would join Matti in thinking that it is a misconception. 
> 
> Have you heard of the enormous daily updated unofficial repository of the 
> binary windows compilations of 
> almost 600 python libraries by Christoph Gohlke? (numpy and libs depending on 
> it are built with MKL there)
> It is there for a reason.
> 
> If you look at the stats such as this one (Matti already mentioned them while 
> I was writing this text),
> 
> https://www.jetbrains.com/research/python-developers-survey-2018/
> https://www.jetbrains.com/lp/python-developers-survey-2020/
> 
> you'll see (in addition to the fact that numpy is the #1 library in data 
> science ;) ) that in 
> the recent years the percentage of windows user among the developers is quite 
> high:
> 69% linux - 47% windows - 32% macos (2018)
> 68% linux - 48% windows - 29% macos (2020)
> So it looks as if it is rather growing than dying.
> 
> This is due to the popularity of the above mentioned data science and AI, 
> which have skyrocketed in the 
> last 10 years. And the vast majority of data scientists work on windows.
> 
> Windows as a platform for developers as a whole is also quite flourishing 
> today.
> According to the stackoverflow 2021 developer survey 45% of the respondents 
> use Windows (25% linux, 25% macos).
> Among the professional developers the numbers are 41% for windows, 30% macos, 
> 26% linux.
> 
> Also the primary audience of tutorials like mine (as well as of
> stackoverflow?) is windows users.
> Linux users can easily figure things described there on their own, through 
> the docstrings, source code 
> or, as a last resort, through the docs )
> 
> >The more experienced the Python developers are, the more likely they are to 
> >use Linux and macOS as development 
> > environments, and the less likely they are to choose Windows.
> (from the same jetbrains survey of 2018)
> 
> I wouldn't like to go into holy wars, though. I'm equally literate in both 
> unix and windows (somewhat less in macos) 
> and in my opinion the interests of all the users of the three operating
> systems should be taken into account
> in both the code of the library and the docs.
> 
> The documentation is sometimes pretty ignorant of mac/windows users, btw:
> > Alias on this platform (Linux x86_64)
> https://numpy.org/doc/stable/reference/arrays.scalars.html#numpy.int_
> And what about the other platforms? 
> 
> As for the particular issue of the difference in the default integer types, 
> in my opinion the default choice of int32 on windows for 
> array [1,2,3] fits the description
> 
> >" If not given, then the type will be determined as the minimum type 
> >required to hold the objects in the sequence."
> https://numpy.org/doc/stable/reference/generated/numpy.array.html
> 
> better than int64 on linux/macos.
>  
> Best regards,
> Lev
> 
> 
>> On Sun, Dec 26, 2021 at 8:45 PM Michael Siebert  wrote:
>> Hey Lev,
>>
>> [...]

[Numpy-discussion] An extension of the .npy file format

2022-01-08 Thread Michael Siebert
Dear all,

originally, I had planned to make an extension of the .npy file format a
dedicated follow-up pull request, but I have upgraded my current request
instead, since it was not as difficult to implement as I initially thought
and is probably a more straightforward solution:

https://github.com/numpy/numpy/pull/20321/

What is this pull request about? It is about appending to Numpy .npy files.
Why? I see two main use cases:

   1. creating .npy files larger than the main memory. They can, once
   finished, be loaded as memory maps
   2. creating binary log files, which can be processed very efficiently
   without parsing

Are there not other good file formats to do this? Theoretically yes, but
practically they can be pretty complex and with very little tweaking .npy
could do efficient appending too.

Use case 1 is already covered by the Pip/Conda package npy-append-array I
have created, and getting the functionality directly into Numpy was the
original goal of the pull request. This would have been possible without
introducing a new file format version, just by adding some spare space in
the header. During the pull request discussion it turned out that rewriting
the header after each append would be desirable, to minimize data loss in
case the writing program crashes.

Use case 2, however, would highly profit from a new file format version, as
it would make rewriting the header unnecessary: since efficient appending can
only take place along one axis, setting shape[-1] = -1 in case of Fortran
order, or shape[0] = -1 otherwise (the default), in the .npy header on file
creation could indicate that the array size is determined by the file size.
When np.load (typically with memory mapping on) gets called, it constructs
the ndarray with the actual shape by replacing the -1 in the constructor
call. Otherwise, the header is not modified anymore, neither on append nor
when writing finishes.

Concurrent appends to a single file would not be advisable and should be
channeled through a single AppendArray instance. Concurrent reads while
writes take place, however, should work relatively smoothly: every time
np.load (ideally with mmap) is called, the ndarray provides access to all
data written until that time. A usage sketch follows below.
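A usage sketch with the npy-append-array package mentioned above (import path
as documented in its README; the proposed in-Numpy AppendArray would look
similar):

import numpy as np
from npy_append_array import NpyAppendArray

# Writer: append batches as they arrive; the file stays a valid .npy throughout.
with NpyAppendArray("log.npy") as naa:
    naa.append(np.zeros((100, 4)))
    naa.append(np.ones((50, 4)))

# Reader (possibly another process): sees all data written so far.
arr = np.load("log.npy", mmap_mode="r")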

Currently, my pull request provides:

   1. A definition of .npy version 4.0 that supports -1 in the shape
   2. implementations for fortran order and non-fortran order (default)
   including test cases
   3. Updated np.load
   4. The AppendArray class that does the actual appending

Although there is a certain hassle with introducing a new .npy version, the
changes themselves are very small. I could also implement a fallback mode
for older Numpy installations, if someone is interested.

What do you think about such a feature, would it make sense? Anyone
available for some more code review?

Best from Berlin, Michael

PS Thank you so far; I could improve my npy-append-array module as well, and
from what I have seen the Numpy code readability exceeded my already high
expectations.
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: [Job] NumPy Open Source Developer at NVIDIA

2022-03-04 Thread Michael Siebert
They were in my spam folder, too.

> On 2. Mar 2022, at 22:51, Ilhan Polat  wrote:
> 
> 
> Found the original in the spam folder. 
> 
> 
> 
> Hi all,
> 
> I'm excited to share that I'm hiring a remote NumPy developer at NVIDIA! The 
> majority of their time will be focused on open source contributions to the 
> NumPy project, so I wanted to share the opportunity on this list.
> 
> We are continuing to expand our support of the PyData ecosystem and looking 
> to hire strong engineers who are or can become contributors to NumPy, Pandas, 
> Scikit-learn, SciPy, and NetworkX. Please see the job posting for more 
> details.
> 
> Non-US based applicants are eligible in certain countries. I can follow up 
> with individuals to confirm eligibility.
> 
> Thanks,
> Mike
> 
> 
> 
>> On Wed, Mar 2, 2022 at 10:39 PM Stephan Hoyer  wrote:
>> Hi Inessa -- could you share the original job description? It looks like it 
>> got lost from your message :)
>> 
>>> On Wed, Mar 2, 2022 at 12:28 PM Inessa Pawson  wrote:
>>> Hi, Mike!
>>> This is wonderful news! NumPy could certainly use more help. 
>>> 
>>> Cheers,
>>> Inessa
>>> 
>>> Inessa Pawson
>>> Contributor Experience Lead | NumPy
>>> email: ine...@albuscode.org
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] StackOverflow Developer Survey

2022-06-23 Thread Michael Siebert
Hi all,

just found some more survey data on Numpy (you need to scroll down a little to
„Other frameworks and libraries“):

https://survey.stackoverflow.co/2022

Numpy seems to enjoy an exceptional position as a library over a wide spectrum 
of programming languages: rank 2 overall and among professionals and even rank 
1 with beginners.

Well deserved!

Best, Michael
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: writing a known-size 1D ndarray serially as it's calced

2022-08-23 Thread Michael Siebert
Hi all,

I've made the Pip/Conda module npy-append-array for exactly this purpose, see

https://github.com/xor2k/npy-append-array

It works with one-dimensional arrays, too, of course. The key challenge is to
properly initialize and update the header as the array grows, which my module
takes care of. I'd like to integrate this functionality directly into
Numpy, see PR

https://github.com/numpy/numpy/pull/20321/

but I have been busy and have not received any feedback recently. A more
direct integration into Numpy would allow skipping or easing the header update
part, e.g. by introducing a new file format version. This could turn .npy into
a sort of binary CSV equivalent, where the size of the array is determined by
the file size.

Best, Michael

> On 24. Aug 2022, at 03:04, Robert Kern  wrote:
> 
> On Tue, Aug 23, 2022 at 8:47 PM  wrote:
>> I want to calc multiple ndarrays at once and lack memory, so want to write 
>> in chunks (here sized to GPU batch capacity). It seems there should be an 
>> interface to write the header, then write a number of elements cyclically, 
>> then add any closing rubric and close the file. 
>> 
>> Is it as simple as lib.format.write_array_header_2_0(fp, d) 
>> then writing multiple shape(N,) arrays of float by fp.write(item.tobytes())?
>  
> `item.tofile(fp)` is more efficient, but yes, that's the basic scheme. There 
> is no footer after the data.
> 
> The alternative is to use `np.lib.format.open_memmap(filename, mode='w+', 
> dtype=dtype, shape=shape)`, then assign slices sequentially to the returned 
> memory-mapped array. A memory-mapped array is usually going to be friendlier 
> to whatever memory limits you are running into than a nominally "in-memory" 
> array.
> 
> -- 
> Robert Kern
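A sketch of the open_memmap approach Robert describes, filling the
preallocated .npy in chunks (the shape, chunk size and the random stand-in for
the GPU batches are illustrative):

import numpy as np

shape, dtype, chunk = (1_000_000,), np.float32, 65536
out = np.lib.format.open_memmap("result.npy", mode="w+", dtype=dtype, shape=shape)
for start in range(0, shape[0], chunk):
    stop = min(start + chunk, shape[0])
    # stand-in for one batch of computed results
    out[start:stop] = np.random.rand(stop - start).astype(dtype)
out.flush()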
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: An extension of the .npy file format

2022-08-24 Thread Michael Siebert
Hi Matti, hi all,

@Matti: I don't know exactly what you are referring to (the pull request or
the GitHub project; links see below). Maybe some clarification is needed,
which I hereby try to provide ;)

A .npy file created by some appending process is a regular .npy file and does
not need to be read in chunks. Processing arrays larger than the system's
memory can already be done with memory mapping (numpy.load(…, mmap_mode=...)),
so no third-party support is needed for that.

The idea is not necessarily to only write some known-but-fragmented content to
a .npy file in chunks, or to only handle files larger than the RAM.

It is more about the ability to append to a .npy file at any time and between
program runs. For example, in our case, we have a large database-like file
containing all (preprocessed) images of all videos used to train a neural
network. When new video data arrives, it can simply be appended to the
existing .npy file. When training the neural net, the data is simply memory
mapped, which happens basically instantly and does not use extra space between
multiple training processes. We have tried out various fancy, advanced data
formats for this task, but most of them don't provide the memory mapping
feature, which is very handy to keep the time required to test a code change
comfortably low; rather, they have excessive parse/decompress times. Other
libraries can also be difficult to handle, see below.

The .npy array format is designed to be limited. There is a NEP for it, which 
summarizes the .npy features and concepts very well:

https://numpy.org/neps/nep-0001-npy-format.html 


One of my favorite features (besides memory mapping perhaps) is this one:

“… Be reverse engineered. Datasets often live longer than the programs that 
created them. A competent developer should be able to create a solution in his 
preferred programming language to read most NPY files that he has been given 
without much documentation. ..."

This is a big disadvantage of all the fancy formats out there: they require
dedicated libraries. Some of these libraries don't come with nice and free
documentation (especially lacking easy-to-use/easy-to-understand code examples
for the target language, e.g. C) and/or can be extremely complex, like HDF5.
Yes, HDF5 has its users and is totally valid if one operates the world's
largest particle accelerator, but we spent weeks finding some C/C++ library
for it which does not expose bugs and is somehow documented. We actually
failed and posted a bug, which was fixed a year later or so. This can ruin
entire projects - fortunately not ours, but it ate up a lot of time we could
have spent more meaningfully. On the other hand, I don't see how e.g. zarr
provides added value over .npy if one only needs the .npy features and maybe
some append-data-along-one-axis feature. Yes, maybe there are some uses for
two or three appendable axes, but I think having one axis to append to should
cover a lot of use cases: this axis is typically time: video, audio, GPS,
signal data in general, binary log data, "binary CSV" (lines in a file). All
of those only need one axis to append to.

The .npy format is so simple that it can be read, e.g. in C, in a few lines.
Or it can be accessed easily through Numpy and ctypes via pointers for
high-speed custom logic, without requiring any libraries besides Numpy.
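A minimal sketch of such a reader in Python, handling only the common version
1.0 header (the C version is barely longer):

import ast
import numpy as np

def read_npy(fname):
    with open(fname, "rb") as f:
        assert f.read(6) == b"\x93NUMPY"            # magic
        major, minor = f.read(2)                    # header version
        hlen = int.from_bytes(f.read(2), "little")  # version 1.x: uint16
        header = ast.literal_eval(f.read(hlen).decode("latin1"))
        data = np.frombuffer(f.read(), dtype=np.dtype(header["descr"]))
        order = "F" if header["fortran_order"] else "C"
        return data.reshape(header["shape"], order=order)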

Making .npy appendable is easy to implement. Yes, appending along one axis is
as limited as the .npy format itself. But I consider that to be a feature
rather than an (actual) limitation, as it allows for fast and simple appends.

The question is whether there is support in the Numpy community for an
append-to-.npy-files-along-one-axis feature and, if so, what the details of
the actual implementation should be. I made one suggestion in

https://github.com/numpy/numpy/pull/20321/

and I offer to invest time to update/modify/finalize the PR. I've also created
a library that can already append to .npy:

https://github.com/xor2k/npy-append-array

However, due to current limitations of the .npy format, the code is more
complex than it needs to be (the library initializes and checks spare space in
the header) and it has to rewrite the header every time. Both could be made
unnecessary with a very small addition to the .npy file format. Data would
stay contiguous (no fragmentation!); there should just be a way to indicate
that the actual shape of the array should be derived from the file size.

Best, Michael

> On 24. Aug 2022, at 19:16, Matti Picus  wrote:
> 
> Sorry for the late reply. Adding a new "*.npy" format feature to allow 
> writing to the file in chunks is nice but seems a bit limited. As I 
> understand the proposal, reading the file back can only be done in the chunks 
> that were originally written. I think other libraries like zarr or h5py have
> solved this.

[Numpy-discussion] Re: New feature: binary (arbitrary base) rounding

2022-11-11 Thread Michael Siebert
Hi all,

an advantage of sub-byte datatypes is the potential for accelerated computing.
For GPUs, int4 is already happening. Or take int1, for example: if one had two
arrays of size 64, each would be eight bytes packed. Now, if one wanted to add
those two arrays (addition mod 2), one could simply XOR them as a single
uint64 (or as 8x uint8 XORs).

However, I would rather limit sub-byte types to int1, (u)int2 and (u)int4, as
they are the only ones that divide the byte evenly (at least to begin with).

Considering single-element access: a single element in such an array could be
accessed by dividing the index to find the containing byte and then shifting
and ANDing with a mask to extract the bits; see the sketch below. Probably
uint8 would make sense for this. That would create some overhead of course,
but the data is more compact (which is nice for the CPU/GPU cache) and
full-array ops are faster.
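A sketch of such single-element access for a hypothetical packed uint4 array
stored in uint8 (pure illustration, not a proposed API):

import numpy as np

packed = np.frombuffer(bytes([0x21, 0x43]), dtype=np.uint8)  # nibbles 1, 2, 3, 4

def get_uint4(arr, i):
    # Locate the containing byte, then shift/mask out the nibble.
    byte = int(arr[i // 2])
    return (byte >> 4) if i % 2 else (byte & 0x0F)

[get_uint4(packed, i) for i in range(4)]  # [1, 2, 3, 4]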

Striding could be done similarly to single-element access. This would be
inefficient as well, but one could auto-generate some type-specific C code
(for int1, (u)int2, (u)int4 and their combinations) that accelerates popular
operators, so one would not need to actually loop over every entry with
single-element access.

„Byte-size strided“: isn't it possible to pre-process the strides and
post-process the output as mentioned above? Like a wrapping class around a
uint8 array.

What do you think? Am I missing something?

Best, Michael

> On 11. Nov 2022, at 18:23, Sebastian Berg  wrote:
> 
> On Fri, 2022-11-11 at 09:13 -0700, Greg Lucas wrote:
>>> 
>>> OK, more below.  But unfortunately `int2` and `int4` *are* problematic,
>>> because the NumPy array uses a byte-sized strided layout, so you would
>>> have to store them in a full byte, which is probably not what you want.
>>>
>>> I am always thinking of adding a provision for it in the DTypes so that
>>> someone could use part of the NumPy machine to make an array that can
>>> have non-byte sized strides, but the NumPy array itself is ABI
>>> incompatible with storing these packed :(.
>>>
>>> (I.e. we could plug that "hole" to allow making an int4 DType in NumPy,
>>> but it would still have to take 1-byte storage space when put into a
>>> NumPy array, so I am not sure there is much of a point.)
>> 
>> I have also been curious about the new DTypes mechanism and whether we
>> could do non byte-size DTypes with it. One use-case I have specifically
>> is for reading and writing non byte-aligned data [1]. So, this would work
>> very well for that use-case if the dtype knew how to read/write the
>> proper bit-size. For my use-case I wouldn't care too much if internally
>> Numpy needs to expand and store the data as full bytes, but being able
>> to read a bitwise binary stream into Numpy native dtypes for further
>> processing would be useful I think (without having to resort to
>> unpackbits and do rearranging/packing to other types).
>> 
>> dtype = {'names': ('count0', 'count1'), 'formats': ('uint3', 'uint5')}
>> # x would have two unsigned ints, but reading only one byte from the stream
>> x = np.frombuffer(buffer, dtype)
>> # would be ideal to get tobytes() to know how to pack a uint3+uint5 DType
>> # into a single byte as well
>> x.tobytes()
> 
> 
> Unfortunately, I suspect the amount of expectations users would have
> from a full DType, and the fact that bit-sized will be a bit awkward in
> NumPy arrays for the foreseeable future, makes me think dedicated
> conversion functions are probably more practical.
> 
> Yes, you could do a `MyInt(bits=5, offset=3)` DType and at least you
> could view the same array also with `MyInt(bits=3, offset=0)`.  (Maybe
> also structured DType, but I am not certain that is advisable and
> custom structured DTypes would require holes to be plugged).
> 
> A custom dtype that is "structured" might work (i.e. you could store
> two numbers in one byte of course).
> Currently you cannot integrate deep enough into NumPy to build
> structured dtypes based on arbitrary other dtypes, but you could do it
> for your own bit DType.
> (I am not quite sure you can make `arr["count0"]` work; this is a hole
> that needs plugging.)
> 
> This is probably not a small task though.
> 
> 
> Could `tobytes()` be made to compactify?  Yes, but then it suddenly
> needs extra logic for bit-sized and doesn't just expose memory.  That
> is maybe fine, but also seems a bit awkward? 
> 
> I would love to have a better answer, but dancing around the byte-
> strided ABI seems tricky...
> 
> Anyway, I am always available to discuss such possibilities, there are
> some corners w.r.t. to such bit-sized thoughts which are still shrouded
> in fog.
> 
> - Sebastian
> 
> 
>> 
>> Greg
>> 
>> [1] Specifically, this is for very low bandwidth satellite data where
>> we
>> try to pack as much information in the downlink and use every bit of
>> space,
>> but once on the ground I can expand the bit-size fields to byte-size
>> fields
>> without too much issue of worrying about space [puns intended].
>> 
>> 
>>

[Numpy-discussion] Re: NumPy 2.0 meeting - Monday, April 3rd, 3 - 7pm UTC

2023-04-08 Thread Michael Siebert
Hi everybody,

thanks a lot for the great meeting! I took it as inspiration for a few 
conversations with GPT4 (easter eggs included):

https://gist.github.com/xor2k/3baf9cfaf2cde16193204c6148315ecd

If anyone likes, I'd be in for a GPT4 session, and maybe we can come up with
more specific questions that could lead to more useful answers. However,
please note that the current limit is 25 requests per 3 hours, so either we
are limited to that (which might be fine) or we need to use multiple accounts.
Also, several attempts might be necessary, as the servers are pretty busy
("at capacity") quite often.

Best, Michael

> On 30. Mar 2023, at 17:44, Ralf Gommers  wrote:
> 
> Hi all,
> 
> As part of this meeting we have reserved a 30 minute slot for lightning 
> talks. Those can be for topics that are on our tentative roadmap already (see 
> https://github.com/orgs/numpy/projects/9/views/1), or topics that you'd like 
> to add to that roadmap and would like to drive and pitch to the audience. 
> 
> Here is a signup sheet: https://hackmd.io/NAVZWpFcTcuq7_ZmzOyCWw. Please add 
> yourself and your proposed topic/title if you're interested! 
> 
> Cheers,
> Ralf
> 
> 
> On Mon, Mar 27, 2023 at 2:44 AM Inessa Pawson  wrote:
>> Hi, everyone!
>> The preparation for NumPy 2.0 release is currently underway. The NumPy 
>> project leadership would like to invite you all to join us on Monday, April 
>> 3rd, 3 - 7pm UTC for the meeting where the NumPy maintainers will present 
>> the scope of the planned work for this release and hold in-depth discussions 
>> on every proposed feature.
>> For details of the meeting agenda, visit: 
>> https://docs.google.com/document/d/1vNDBlnVCJM-Nv7xF02Re4qk_5rR9uAUi0ih6K3b2YAE/edit?usp=sharing.
>>  Please keep in mind that it hasn’t been finalized yet.
>> Join us via Zoom: 
>> https://numfocus-org.zoom.us/j/87080449579?pwd=dmt5dktCT2l1NG5ZNGFvb3dKdUNIdz09.
>> 
>> -- 
>> Cheers,
>> Inessa
>> 
>> Inessa Pawson
>> Contributor Experience Lead | NumPy
>> https://numpy.org/
>> GitHub: inessapawson
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com