[Numpy-discussion] Re: spam on the mailing lists

2021-11-07 Thread matti . picus
Update: we now have a team of moderators for the mailing list. The policy for 
moderation is in the list summary [0]. 

Since moderation began on Oct 26, we have rejected 15 spam messages, 
unsubscribed 2 email addresses that spammed more than 3 each, and released 4 
messages from moderated users to the mailing list. 

Matti

[0] https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] ENH: add functionality NpyAppendArray to numpy.format

2021-11-07 Thread Michael Siebert
Dear all,

I'd like to add the NpyAppendArray functionality, compare

https://github.com/xor2k/npy-append-array (15 Stars so far)

and

https://stackoverflow.com/a/64403144/2755796 (10 Upvotes so far)

I have prepared a pull request and want to "test the waters" as suggested
by the message I have received when creating the pull request.

So what is NpyAppendArray about?

I love the .npy file format. It is really great! I cannot appreciate the
.npy capabilities mentioned in

https://numpy.org/devdocs/reference/generated/numpy.lib.format.html

enough, especially its simplicity. No comparison with the struggles we had
with HDF5. However, there is one feature NumPy currently does not provide:
a simple, efficient, easy-to-use and safe option to append to .npy files
(here is the text I've used in the GitHub repository above):

Appending to an array created by np.save might be possible under certain
circumstances, since the .npy total header byte count is required to be
evenly divisible by 64. Thus, there might be some spare space to grow the
shape entry in the array descriptor. However, this is not guaranteed and
might randomly fail. Initialize the array with NpyAppendArray(filename)
directly so the header will be created with 64 bytes of spare header space
for growth. Will this be enough? It allows for up to 10^64 >= 2^212 array
entries or data bits. Indeed, this is less than the number of atoms in the
universe. However, fully populating such an array, due to limits imposed by
quantum mechanics, would require more energy than would be needed to boil
the oceans, compare

https://hbfs.wordpress.com/2009/02/10/to-boil-the-oceans

Therefore, a wide range of use cases should be coverable with this approach.
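
To make the header arithmetic concrete, here is a minimal sketch using only
the Python standard library. It builds a version 1.0 .npy header by hand,
pads it to a multiple of 64 bytes, reserves 64 extra bytes, and then rewrites
the shape entry in place. The helpers build_header and grow_shape are
hypothetical names introduced for illustration; this is not the code from the
pull request, only one way the idea could work under those assumptions.

```python
import ast
import struct

MAGIC = b"\x93NUMPY\x01\x00"  # magic string followed by format version 1.0


def build_header(shape, descr="<i8", spare=64):
    """Hypothetical helper: .npy v1.0 header with `spare` extra padding bytes."""
    d = "{'descr': %r, 'fortran_order': False, 'shape': %r, }" % (descr, tuple(shape))
    # Total = magic/version (8 bytes) + 2-byte length field + dict + newline.
    unpadded = len(MAGIC) + 2 + len(d) + 1
    # Pad with spaces up to a multiple of 64, then reserve room for growth.
    pad = -unpadded % 64 + spare
    body = (d + " " * pad + "\n").encode("latin1")
    return MAGIC + struct.pack("<H", len(body)) + body


def grow_shape(header, new_shape):
    """Hypothetical helper: rewrite the shape entry, keeping the header length."""
    (hlen,) = struct.unpack("<H", header[8:10])
    meta = ast.literal_eval(header[10:10 + hlen].decode("latin1"))
    d = "{'descr': %r, 'fortran_order': %r, 'shape': %r, }" % (
        meta["descr"], meta["fortran_order"], tuple(new_shape))
    pad = hlen - len(d) - 1  # spaces left after the longer shape entry
    if pad < 0:
        raise ValueError("not enough spare header space")
    return header[:10] + (d + " " * pad + "\n").encode("latin1")


header = build_header((2, 2))
grown = grow_shape(header, (10**40, 2))  # far larger than any real array
```

Because the grown header has exactly the same length as the original, the
array data that follows it never has to be moved. The bound quoted above can
be checked with plain integers as well: 2**212 <= 10**64 < 2**213.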

Who could use that?

I developed and use NpyAppendArray to efficiently create .npy arrays which
are larger than the main memory and can be loaded by memory mapping later,
e.g. for Deep Learning workflows. Another use case is binary log files,
which could be created on low-end embedded devices and later be processed
without parsing, optionally again using memory maps.

What does the code look like?

Here is some demo code showing what this would look like in practice (taken
from the test file):

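import os

import numpy as np
from numpy.lib import format  # NpyAppendArray is the class proposed in the pull request
from numpy.testing import assert_array_equal


def test_NpyAppendArray(tmpdir):
    arr1 = np.array([[1, 2], [3, 4]])
    arr2 = np.array([[1, 2], [3, 4], [5, 6]])

    fname = os.path.join(tmpdir, 'npaa.npy')

    # Append three blocks along axis 0 through the context manager.
    with format.NpyAppendArray(fname) as npaa:
        npaa.append(arr1)
        npaa.append(arr2)
        npaa.append(arr2)

    # Load the result memory-mapped and compare with an in-memory concatenation.
    arr = np.load(fname, mmap_mode="r")
    arr_ref = np.concatenate([arr1, arr2, arr2])

    assert_array_equal(arr, arr_ref)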

Some more aspects:
1. Appending efficiently only works along axis=0, at least for C order
(probably different for Fortran order).
2. One could also add the 64 bytes of spare space right on np.save.
However, I cannot really judge how much of an issue that would be for the
users of np.save, and it is not really necessary since users who want to
append to .npy files would create them with NpyAppendArray anyway.
3. I have probably forgotten something here; some time has passed since the
initial GitHub commit.
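
The first point can be illustrated with a small sketch, assuming nothing
beyond NumPy itself: in C order the rows are stored one after another, so
stacking along axis 0 amounts to appending bytes at the end of the data,
while stacking along axis 1 would interleave new and old elements. In Fortran
order the roles are reversed, and the last axis is the cheap one. The
variable names are chosen for illustration only.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# C order: stacking along axis 0 is plain byte concatenation.
stacked = np.concatenate([a, b], axis=0)
c_axis0_is_append = (
    stacked.tobytes(order="C") == a.tobytes(order="C") + b.tobytes(order="C")
)

# C order: stacking along axis 1 reorders the existing bytes.
wide = np.concatenate([a, a], axis=1)
c_axis1_is_append = (
    wide.tobytes(order="C") == a.tobytes(order="C") + a.tobytes(order="C")
)

# Fortran order: the same axis-1 stack is again a pure append.
f_axis1_is_append = (
    wide.tobytes(order="F") == a.tobytes(order="F") + a.tobytes(order="F")
)
```

Here c_axis0_is_append and f_axis1_is_append are True, while
c_axis1_is_append is False, which is why a file written in C order can only
grow cheaply along its first axis.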

So what do you think? Yes/No/Maybe? It would be really nice to get some
feedback on the mailing list here!

Although this might not be perfectly consistent with the protocol, I've
created the pull request already, just to force myself to finish this up,
and I'm prepared to fail if there is no interest in getting NpyAppendArray
directly into NumPy ;)

Best from Berlin, Michael


[Numpy-discussion] Documentation Team meeting - Monday November 8

2021-11-07 Thread Melissa Mendonça
Hi all!

Our next Documentation Team meeting will be on *Monday, November 8* at ***4PM
UTC***. For some reason, the events had not been showing up on the
community calendar, but I think I fixed it - let me know otherwise.

All are welcome - you don't need to already be a contributor to join. If
you have questions or are curious about what we're doing, we'll be happy to
meet you!

If you wish to join on Zoom, use this link:

https://zoom.us/j/96219574921?pwd=VTRNeGwwOUlrYVNYSENpVVBRRjlkZz09

Here's the permanent hackmd document with the meeting notes (still being
updated in the next few days!):

https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg


Hope to see you around!

** You can click this link to get the correct time in your time zone:
https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20211108T16&p1=1440&ah=1

*** You can add the NumPy community calendar to your Google calendar by
clicking this link:
https://calendar.google.com/calendar/r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20

- Melissa


[Numpy-discussion] Make the pickle default protocol 4.

2021-11-07 Thread Charles R Harris
Hi All,

I'd like to propose making the NumPy default pickle protocol 4, the same as
the Python 3.8 default. That would have the advantage of supporting large
pickles. The current default protocol is 2, the highest protocol available
in Python 2.7.
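
For readers who have not compared the protocols before, here is a small
standard-library sketch of what the protocol argument changes. It pickles a
plain dictionary rather than a NumPy array to stay self-contained; the
payload is made up for illustration, and the snippet says nothing about how
NumPy wires the default internally.

```python
import pickle

# Made-up payload standing in for array metadata and values.
payload = {"shape": (3, 2), "values": list(range(6))}

# The same object serialized with the current and the proposed protocol.
p2 = pickle.dumps(payload, protocol=2)
p4 = pickle.dumps(payload, protocol=4)

# Pickles written with protocol 2 or later start with the PROTO opcode
# (0x80) followed by the protocol number.
proto_of_p2 = p2[1]
proto_of_p4 = p4[1]

# Loading does not need the protocol; it is read from the stream.
restored = pickle.loads(p4)
```

Protocol 4 was added in Python 3.4, so files written with it cannot be read
by Python 2, which is the trade-off behind the proposal.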

Thoughts?

Chuck


[Numpy-discussion] branching NumPy 1.22.x

2021-11-07 Thread Charles R Harris
Hi All,

I am aiming to branch NumPy 1.22.x next weekend. If there are any PRs that
you think need to be merged before the branch, please raise the issue.

Chuck