[Numpy-discussion] Add count (and dtype) to packbits

2021-07-21 Thread Neal Becker
In my application I need to pack bits of a specified group size into
integral values.
Currently np.packbits only packs into full bytes.
For example, I might have a string of bits encoded as a np.uint8
vector with each uint8 item specifying a single bit 1/0.  I want to
encode them 4 bits at a time into a np.uint32 vector.

python code to implement this:

---
def pack_bits(inp, bits_per_word, dir=1, dtype=np.int32):
    assert bits_per_word <= np.dtype(dtype).itemsize * 8
    assert len(inp) % bits_per_word == 0
    out = np.empty(len(inp) // bits_per_word, dtype=dtype)
    i = 0
    o = 0
    while i < len(inp):
        ret = 0
        for b in range(bits_per_word):
            if dir > 0:
                ret |= inp[i] << b
            else:
                ret |= inp[i] << (bits_per_word - b - 1)
            i += 1
        out[o] = ret
        o += 1
    return out
---

It looks like unpackbits has a "count" parameter but packbits does not.
Also would be good to be able to specify an output dtype.
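For reference, `np.unpackbits` does accept a `count` argument (it stops after that many bits), while `np.packbits` always emits one uint8 per 8 input bits with no control over grouping or output dtype. A small illustration of the asymmetry:

```python
import numpy as np

# packbits always produces uint8 output, one byte per 8 input bits
packed = np.packbits(np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8))
print(packed.dtype)                    # uint8
print(packed[0])                       # 178  (0b10110010, MSB-first by default)

# unpackbits can stop after `count` bits...
print(np.unpackbits(packed, count=5))  # [1 0 1 1 0]
# ...but packbits offers no analogous count or dtype parameter
```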
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Add count (and dtype) to packbits

2021-07-21 Thread Andras Deak
On Wed, Jul 21, 2021 at 2:40 PM Neal Becker  wrote:

> In my application I need to pack bits of a specified group size into
> integral values.
> Currently np.packbits only packs into full bytes.
> For example, I might have a string of bits encoded as a np.uint8
> vector with each uint8 item specifying a single bit 1/0.  I want to
> encode them 4 bits at a time into a np.uint32 vector.
>
> python code to implement this:
>
> ---
> def pack_bits(inp, bits_per_word, dir=1, dtype=np.int32):
>     assert bits_per_word <= np.dtype(dtype).itemsize * 8
>     assert len(inp) % bits_per_word == 0
>     out = np.empty(len(inp) // bits_per_word, dtype=dtype)
>     i = 0
>     o = 0
>     while i < len(inp):
>         ret = 0
>         for b in range(bits_per_word):
>             if dir > 0:
>                 ret |= inp[i] << b
>             else:
>                 ret |= inp[i] << (bits_per_word - b - 1)
>             i += 1
>         out[o] = ret
>         o += 1
>     return out
> ---
>

Can't you just `packbits` into a uint8 array and then convert that to
uint32? If I change `dtype` in your code from `np.int32` to `np.uint32` (as
you mentioned in your email) I can do this:

rng = np.random.default_rng()
arr = (rng.uniform(size=32) < 0.5).astype(np.uint8)
group_size = 4
original = pack_bits(arr, group_size, dtype=np.uint32)
new = np.packbits(arr.reshape(-1, group_size), axis=-1,
                  bitorder='little').ravel().astype(np.uint32)
print(np.array_equal(new, original))
# True

There could be edge cases where the result dtype is too small, but I
haven't thought about that part of the problem. I assume this would work as
long as `group_size <= 8`.

András


> It looks like unpackbits has a "count" parameter but packbits does not.
> Also would be good to be able to specify an output dtype.


Re: [Numpy-discussion] Add count (and dtype) to packbits

2021-07-21 Thread Neal Becker
Well that's just the point, I wanted to consider group size > 8.
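For group sizes above 8 that are still a multiple of 8, one possible workaround (a sketch only, not a tested proposal; the helper name `pack_bits_large` is made up) is to let `packbits` produce bytes and then combine the bytes into wider words with a weighted sum:

```python
import numpy as np

def pack_bits_large(inp, bits_per_word, dtype=np.uint32):
    # LSB-first packing, matching dir=1 in the original pack_bits
    assert bits_per_word % 8 == 0
    assert bits_per_word <= np.dtype(dtype).itemsize * 8
    assert len(inp) % bits_per_word == 0
    # pack each run of 8 input bits into one byte, LSB-first
    as_bytes = np.packbits(inp.reshape(-1, 8), axis=-1, bitorder='little').ravel()
    nbytes = bits_per_word // 8
    # byte k within a word contributes a factor of 2**(8*k)
    weights = np.left_shift(np.uint64(1), 8 * np.arange(nbytes, dtype=np.uint64))
    words = (as_bytes.reshape(-1, nbytes).astype(np.uint64) * weights).sum(axis=1)
    return words.astype(dtype)
```

For example, a 16-bit input with bits 0 and 9 set packs to a single word of 1 + 512 = 513. This handles group sizes of 16, 32, or 64 but not, say, 12; arbitrary group sizes would still need the proposed `count`/`dtype` support in `packbits` itself.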

On Wed, Jul 21, 2021 at 8:53 AM Andras Deak  wrote:
>
> On Wed, Jul 21, 2021 at 2:40 PM Neal Becker  wrote:
>>
>> In my application I need to pack bits of a specified group size into
>> integral values.
>> Currently np.packbits only packs into full bytes.
>> For example, I might have a string of bits encoded as a np.uint8
>> vector with each uint8 item specifying a single bit 1/0.  I want to
>> encode them 4 bits at a time into a np.uint32 vector.
>>
>> python code to implement this:
>>
>> ---
>> def pack_bits(inp, bits_per_word, dir=1, dtype=np.int32):
>>     assert bits_per_word <= np.dtype(dtype).itemsize * 8
>>     assert len(inp) % bits_per_word == 0
>>     out = np.empty(len(inp) // bits_per_word, dtype=dtype)
>>     i = 0
>>     o = 0
>>     while i < len(inp):
>>         ret = 0
>>         for b in range(bits_per_word):
>>             if dir > 0:
>>                 ret |= inp[i] << b
>>             else:
>>                 ret |= inp[i] << (bits_per_word - b - 1)
>>             i += 1
>>         out[o] = ret
>>         o += 1
>>     return out
>> ---
>
>
> Can't you just `packbits` into a uint8 array and then convert that to uint32? 
> If I change `dtype` in your code from `np.int32` to `np.uint32` (as you 
> mentioned in your email) I can do this:
>
> rng = np.random.default_rng()
> arr = (rng.uniform(size=32) < 0.5).astype(np.uint8)
> group_size = 4
> original = pack_bits(arr, group_size, dtype=np.uint32)
> new = np.packbits(arr.reshape(-1, group_size), axis=-1, 
> bitorder='little').ravel().astype(np.uint32)
> print(np.array_equal(new, original))
> # True
>
> There could be edge cases where the result dtype is too small, but I haven't 
> thought about that part of the problem. I assume this would work as long as 
> `group_size <= 8`.
>
> András
>
>>
>> It looks like unpackbits has a "count" parameter but packbits does not.
>> Also would be good to be able to specify an output dtype.



-- 
Those who don't understand recursion are doomed to repeat it


[Numpy-discussion] Adding POWER10 (VSX4) support to the SIMD framework

2021-07-21 Thread Nicholai Tukanov
I would like to understand how to go about extending the SIMD framework in
order to add support for POWER10. Specifically, I would like to add the
following instructions: `lxvp` and `stxvp`, which load/store 256 bits
into/from a pair of vectors. I believe this can give a decent performance
boost on POWER machines, since it halves the number of load/store
instructions issued.

Additionally, matrix engines (2-D SIMD instructions) are becoming quite
popular due to their performance improvements for deep learning and
scientific computing. Would it be beneficial to add these new advanced SIMD
instructions into the framework or should these instructions be left to
libraries such as OpenBLAS and MKL?

Thank you,
Nicholai Tukanov


[Numpy-discussion] Proposed change from POSIX to PyMem_RawXXX

2021-07-21 Thread Daniel Waddington
Hi,
I'm working with NumPy in the context of supporting different memory types, such as persistent memory and CXL-attached memory. I would like to propose a minor change, but figured I would get some initial feedback from the developer community before submitting a PR.
 
In multiarray/alloc.c the allocator (beneath the cache) uses the POSIX malloc/calloc/realloc/free functions. I propose that these be changed to their PyMem_RawXXX equivalents. The reason is that one can then use the Python custom allocator functions (e.g. PyMem_GetAllocator/PyMem_SetAllocator) to intercept the memory allocations for NumPy arrays. This will be useful as support for heterogeneous memory is added.
 
There are likely other places in NumPy that could use the same treatment; perhaps someone could advise?
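As context for why interception hooks matter: NumPy's data buffers are already registered with `tracemalloc` via `PyTraceMalloc_Track` (the `Untrack` call is visible in the patch), so their allocations can be observed from Python today, even before any `PyMem_SetAllocator`-based interception. A quick sketch:

```python
import tracemalloc
import numpy as np

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
a = np.ones(10**6)          # ~8 MB data buffer, tracked via PyTraceMalloc_Track
after, _ = tracemalloc.get_traced_memory()
print(f"traced growth: {after - before} bytes")  # includes the array buffer
tracemalloc.stop()
```

Routing the underlying calls through PyMem_RawMalloc/PyMem_RawCalloc/PyMem_RawFree would additionally let a custom raw-domain allocator (installed with `PyMem_SetAllocator` from C) decide where those buffers actually live.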
 
Thanks,
Daniel
 
---
Example patch for 1.19.x (I'm building with Python 3.6)
 
diff --git a/numpy/core/src/multiarray/alloc.c b/numpy/core/src/multiarray/alloc.c
index 795fc7315..e9e888478 100644
--- a/numpy/core/src/multiarray/alloc.c
+++ b/numpy/core/src/multiarray/alloc.c
@@ -248,7 +248,7 @@ PyDataMem_NEW(size_t size)
     void *result;
 
     assert(size != 0);
-    result = malloc(size);
+    result = PyMem_RawMalloc(size);
     if (_PyDataMem_eventhook != NULL) {
         NPY_ALLOW_C_API_DEF
         NPY_ALLOW_C_API
@@ -270,7 +270,7 @@ PyDataMem_NEW_ZEROED(size_t size, size_t elsize)
 {
     void *result;
 
-    result = calloc(size, elsize);
+    result = PyMem_RawCalloc(size, elsize);
     if (_PyDataMem_eventhook != NULL) {
         NPY_ALLOW_C_API_DEF
         NPY_ALLOW_C_API
@@ -291,7 +291,7 @@ NPY_NO_EXPORT void
 PyDataMem_FREE(void *ptr)
 {
     PyTraceMalloc_Untrack(NPY_TRACE_DOMAIN, (npy_uintp)ptr);
-    free(ptr);
+    PyMem_RawFree(ptr);
 
 
Daniel G. Waddington
 
Principal Research Staff Member,
Data & Storage Systems Research, IBM Research Almaden
E-mail: daniel.wadding...@ibm.com
Phone: +1 408 927 2359
 
 
