[Numpy-discussion] How to sample unique vectors

2023-11-17 Thread Stefan van der Walt via NumPy-Discussion
Hi all,

I am trying to sample k N-dimensional vectors from a uniform distribution 
without replacement.
It seems like this should be straightforward, but I can't seem to pin it down.

Specifically, I am trying to get random indices into a d0 x d1 x ... x dN-1 
array.

I thought about sneaking a structured dtype into `rng.integers`, but of 
course that doesn't work.

If we had a string sampler, I could sample k unique words (consisting of 
digits), and convert them to indices.

I could over-sample and filter out the non-unique indices. Or iteratively draw 
blocks of samples until I've built up my k unique indices.

The most straightforward solution would be to flatten indices, and to sample 
from those. The integers get large quickly, though. The rng.integers docstring 
suggests that it can handle object arrays for very large integers:

> When using broadcasting with uint64 dtypes, the maximum value (2**64)
> cannot be represented as a standard integer type.
> The high array (or low if high is None) must have object dtype, e.g., 
> array([2**64]).

But, that doesn't work:

In [35]: rng.integers(np.array([2**64], dtype=object))
ValueError: high is out of bounds for int64

Is there an elegant way to handle this problem?
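
When the flattened size fits in int64, the over-sample-and-filter idea mentioned above could be sketched roughly as follows. This is only an illustration: `sample_unique_indices` is a hypothetical helper name, and the sketch deliberately refuses shapes whose total size exceeds int64 rather than trying to handle them.

```python
import math

import numpy as np


def sample_unique_indices(shape, k, rng):
    """Draw k distinct index vectors into an array of the given shape.

    Hypothetical helper: draws flat indices, drops duplicates, and tops up
    until k distinct values remain. Only valid while prod(shape) fits in int64.
    """
    total = math.prod(shape)
    if total > np.iinfo(np.int64).max:
        raise ValueError("flattened size does not fit in int64")
    if k > total:
        raise ValueError("cannot draw more unique indices than elements")

    flat = np.empty(0, dtype=np.int64)
    while flat.size < k:
        # Draw only as many values as are still missing, then deduplicate.
        draw = rng.integers(0, total, size=k - flat.size, dtype=np.int64)
        flat = np.unique(np.concatenate([flat, draw]))

    rng.shuffle(flat)  # np.unique sorts, so restore a random order
    # One row per sample, one column per dimension.
    return np.stack(np.unravel_index(flat, shape), axis=1)


rng = np.random.default_rng()
print(sample_unique_indices((500, 400, 30), 5, rng))
```

Because duplicates are rare when k is much smaller than the number of elements, the loop usually runs once.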

Best regards,
Stéfan
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: How to sample unique vectors

2023-11-17 Thread Stefan van der Walt via NumPy-Discussion
On Fri, Nov 17, 2023, at 11:07, Robert Kern wrote:
> If the arrays you are drawing indices for are real in-memory arrays for 
> present-day 64-bit computers, this should be adequate. If it's a notional 
> array that is larger, then you'll need actual arbitrary-sized integer 
> sampling. The builtin `random.randrange()` will do arbitrary-sized integers 
> and is quite reasonable for this task. If you want it to use our 
> BitGenerators underneath for clean PRNG state management, this is quite 
> doable with a simple subclass of `random.Random`: 
> https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258

Thanks, Robert. The use case is for randomly populating a hypothetical 
N-dimensional sparse array object.
In practice, int64 will probably suffice. 

I can see how to generate arbitrarily large integers using the pattern above, 
but I am still unsure how to sample without replacement. Aaron mentioned that 
conflicts are extremely unlikely, so it is perhaps fine to assume they won't 
happen. Checking for conflicts is expensive.
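
To put a number on "extremely unlikely", the chance of at least one repeat among k uniform draws can be computed exactly (the birthday problem). The sketch below is an illustration only; `collision_probability` is a hypothetical name, and the dimensions are the ones used in the attached script.

```python
import math
from fractions import Fraction


def collision_probability(total, k):
    """Exact probability that k uniform draws from range(total) contain a repeat."""
    p_all_distinct = Fraction(1)
    for i in range(k):
        # The (i+1)-th draw must avoid the i values already seen.
        p_all_distinct *= Fraction(total - i, total)
    return 1 - p_all_distinct


dims = (500, 400, 30, 15, 20, 800, 900, 2000, 800)
total = math.prod(dims)  # about 2.07e21 possible index vectors
p = collision_probability(total, 5)
print(float(p))  # roughly 5e-21
```

For small k relative to the number of elements, the result is close to k*(k-1)/(2*total).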

Attached is a script that implements this solution.

Stéfan
import random
import functools
import itertools
import operator

import numpy as np


def cumulative_prod(arr):
    return list(itertools.accumulate(arr, func=operator.mul))


def unravel_index(x, dims):
    # Row-major (C-order) strides: the product of all trailing dimensions.
    dim_prod = cumulative_prod([1] + list(dims)[:0:-1])[::-1]
    return [[(ix // dim_prod[i]) % dims[i] for i in range(len(dims))] for ix in x]


# From Robert Kern's comment at
# https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258
class PythonRandomInterface(random.Random):
    def __init__(self, rng):
        self._rng = rng

    def getrandbits(self, k):
        """getrandbits(k) -> x.  Generates an int with k random bits."""
        if k < 0:
            raise ValueError('number of bits must be non-negative')
        numbytes = (k + 7) // 8   # bits / 8 and rounded up
        x = int.from_bytes(self._rng.bytes(numbytes), 'big')
        return x >> (numbytes * 8 - k)  # trim excess bits

    def indices(self, shape, size=1):
        # Total number of elements; a Python int, so it may exceed 2**64.
        D = functools.reduce(operator.mul, shape)
        # randrange(D) draws from [0, D); randint(0, D) would also return D.
        indices = [self.randrange(D) for i in range(size)]
        return unravel_index(indices, shape)


rng = np.random.default_rng()
pri = PythonRandomInterface(rng)

dims = (500, 400, 30, 15, 20, 800, 900, 2000, 800)
k = 5

print(pri.indices(dims, size=k))
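
As a quick sanity check, the pure-Python `unravel_index` from the script can be compared against `np.unravel_index` on a small shape. The sketch below repeats the helper so that it runs on its own; the shape `(3, 4, 5)` is an arbitrary choice for the check.

```python
import itertools
import operator

import numpy as np


def unravel_index(x, dims):
    # Row-major strides, built the same way as in the script.
    strides = list(itertools.accumulate([1] + list(dims)[:0:-1], operator.mul))[::-1]
    return [[(ix // strides[i]) % dims[i] for i in range(len(dims))] for ix in x]


dims = (3, 4, 5)
flat = list(range(3 * 4 * 5))

# np.unravel_index returns one array per dimension; stack them into rows.
expected = np.stack(np.unravel_index(flat, dims), axis=1).tolist()
assert unravel_index(flat, dims) == expected
print("pure-Python unravel_index agrees with np.unravel_index")
```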


[Numpy-discussion] Re: How to sample unique vectors

2023-11-17 Thread Stefan van der Walt via NumPy-Discussion
On Fri, Nov 17, 2023, at 14:28, Stefan van der Walt wrote:
> Attached is a script that implements this solution.

And here is the version that checks for duplicates using a set.

Stéfan
import random
import functools
import itertools
import operator

import numpy as np


def cumulative_prod(arr):
    return list(itertools.accumulate(arr, func=operator.mul))


def unravel_index(x, dims):
    # Row-major (C-order) strides: the product of all trailing dimensions.
    dim_prod = cumulative_prod([1] + list(dims)[:0:-1])[::-1]
    return [[(ix // dim_prod[i]) % dims[i] for i in range(len(dims))] for ix in x]


# From Robert Kern's comment at
# https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258
class PythonRandomInterface(random.Random):
    def __init__(self, rng):
        self._rng = rng

    def getrandbits(self, k):
        """getrandbits(k) -> x.  Generates an int with k random bits."""
        if k < 0:
            raise ValueError('number of bits must be non-negative')
        numbytes = (k + 7) // 8   # bits / 8 and rounded up
        x = int.from_bytes(self._rng.bytes(numbytes), 'big')
        return x >> (numbytes * 8 - k)  # trim excess bits

    def indices(self, shape, size=1):
        # Total number of elements; a Python int, so it may exceed 2**64.
        D = functools.reduce(operator.mul, shape)
        # The set discards repeats, so keep drawing until it holds `size` values.
        indices = set()
        while len(indices) < size:
            # randrange(D) draws from [0, D); randint(0, D) would also return D.
            indices.add(self.randrange(D))
        return unravel_index(indices, shape)


rng = np.random.default_rng()
pri = PythonRandomInterface(rng)

dims = (500, 400, 30, 15, 20, 800, 900, 2000, 800)
k = 5

print(pri.indices(dims, size=k))


[Numpy-discussion] Re: How to sample unique vectors

2023-11-17 Thread Stefan van der Walt via NumPy-Discussion
On Fri, Nov 17, 2023, at 16:52, Robert Kern wrote:
> That optimistic optimization makes this the fastest solution.

That'd work great, thanks Robert, Aaron, and everyone who shared input. 
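
The code Robert refers to is not quoted in this message. One possible reading of the "optimistic" idea, offered here as an assumption rather than as his implementation, is to draw all k values up front and only fall back to set-based top-up if a duplicate shows up. `sample_unique_optimistic` is a hypothetical name, and the standard-library `random.Random` stands in for the BitGenerator-backed subclass from the earlier script.

```python
import random


def sample_unique_optimistic(total, k, rand=None):
    """Return k distinct integers from range(total), assuming repeats are rare."""
    if k > total:
        raise ValueError("cannot draw more unique values than the range holds")
    if rand is None:
        rand = random.Random()

    # Optimistic path: draw everything at once and check for repeats once.
    draws = [rand.randrange(total) for _ in range(k)]
    unique = set(draws)
    if len(unique) == k:
        return draws

    # Rare path: top up until k distinct values are present.
    while len(unique) < k:
        unique.add(rand.randrange(total))
    return list(unique)


total = 500 * 400 * 30 * 15 * 20 * 800 * 900 * 2000 * 800
print(sample_unique_optimistic(total, 5))
```

The single `len(set(...))` check costs little compared with testing membership on every draw, which is where the speed-up would come from when the range is large.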

Stéfan


[Numpy-discussion] Re: Meson - C extension - Finding numpy includes in virtual env

2023-11-25 Thread Stefan van der Walt via NumPy-Discussion
Hi Doug,

On Sat, Nov 25, 2023, at 07:14, Doug Turnbull wrote:
> Unfortunately the following command fails:
> 
> incdir_numpy = run_command(py,
>   ['-c', 'import numpy; print(numpy.get_include())'],
>   capture: true,
>   check: false,
> ).stdout().strip()

In your repo it says stderr, but the version above (stdout) works for me.

Perhaps you are using a different Python than the one in your virtual env, 
because meson was installed onto your path previously? Try `python -m pip 
install meson` and then invoking the meson binary directly from your 
virtualenv: venv/bin/meson.
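
To see which interpreter and which NumPy actually get picked up, a small diagnostic along these lines can be run with the same `python` that meson is expected to use. The function name `describe_environment` is made up for this sketch, and it only reports what it finds; it does not change anything.

```python
import importlib.util
import shutil
import sys


def describe_environment():
    """Collect facts that help spot a mismatch between meson's Python and the venv."""
    numpy_spec = importlib.util.find_spec("numpy")
    return {
        "python": sys.executable,
        # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # First `meson` found on PATH, or None if there is none.
        "meson_on_path": shutil.which("meson"),
        # Where `import numpy` would load from, or None if it is not installed.
        "numpy_location": numpy_spec.origin if numpy_spec else None,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If `meson_on_path` points outside the virtual environment while `python` points inside it, that would be consistent with the mismatch described above.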

Stéfan


[Numpy-discussion] Re: Meson - C extension - Finding numpy includes in virtual env

2023-11-26 Thread Stefan van der Walt via NumPy-Discussion
Hi Doug,

On Sun, Nov 26, 2023, at 06:29, Doug Turnbull wrote:
> To debug, I ran `pip install . --no-build-isolation` it worked (using venv's 
> numpy)

When developing NumPy, we typically build in the existing environment. This is 
done either via `pip install -e .` (which installs hooks to trigger a 
re-compile upon import), or via the spin tool 
(https://github.com/scientific-python/spin), which has meson commands 
pre-bundled:

pip install spin
spin  # lists commands available

Best regards,
Stéfan


[Numpy-discussion] Re: Meson - C extension - Finding numpy includes in virtual env

2023-11-26 Thread Stefan van der Walt via NumPy-Discussion
On Sun, Nov 26, 2023, at 12:03, Nathan wrote:
> For my work I tend to use a persistent build directory with build isolation 
> disabled as discussed in the meson-python docs.

Out of curiosity, how is this different from, e.g., `spin build` which builds 
into `./build-install`?

Stéfan


[Numpy-discussion] Re: Meson - C extension - Finding numpy includes in virtual env

2023-11-28 Thread Stefan van der Walt via NumPy-Discussion
Hi Nathan,

On Tue, Nov 28, 2023, at 08:42, Nathan wrote:
> It looks like `spin build` does `meson build` and `meson install` and doesn't 
> do `pip install`. I'd like numpy to be importable in a python environment of 
> my choosing, so I tend to instead manually install numpy into that 
> environment by invoking pip with something like `python -m pip install . -v 
> --no-build-isolation -Cbuilddir=build -C'compile_args=-v' 
> -C'setup_args=-Dbuildtype=debug'. I like seeing the compile command meson 
> uses, so I pass in `-v` through meson's `compile_args` and I often need a 
> debug build, so I set the build type manually as well. 

That makes sense. We recently added the `spin.pip.install` command for that 
purpose, but of course you don't *need* a command if you know the invocation :)

Stéfan


[Numpy-discussion] Re: welcome Raghuveer, Chris, Mateusz and Matt to the NumPy maintainers team

2024-01-26 Thread Stefan van der Walt via NumPy-Discussion
On Fri, Jan 26, 2024, at 12:04, Ralf Gommers wrote:
> We've got four new NumPy maintainers! Welcome to the team, and 
> congratulations to:
> 
> - Raghuveer Devulapalli (https://github.com/r-devulap)
> - Chris Sidebottom (https://github.com/mousius)
> - Mateusz Sokół (https://github.com/mtsokol/)
> - Matt Haberland (https://github.com/mdhaber)

Fantastic! Your commit bits are well deserved, and your contributions greatly 
appreciated. 

I look forward to our continued work together. 

Thank you, 
Stéfan 


[Numpy-discussion] Re: New DType and ArrayMethod C APIs are public

2024-02-14 Thread Stefan van der Walt via NumPy-Discussion
On Wed, Feb 14, 2024, at 14:34, Nathan wrote:
> The DType API publicly exposes the PyArray_DTypeMeta C struct, which 
> represents DType metaclasses. It also exposes a function for registering 
> user-defined DTypes and a set of slot IDs and function typedefs that users 
> can implement in C to write new DTypes.
> 
> The ArrayMethod API allows defining cast and ufunc loops in terms of these 
> new DTypes, in a manner that forbids value-based promotion and abstracts many 
> of the internals of NumPy. We hope the ArrayMethod API enables sharing 
> low-level loops that work out-of-the-box in NumPy in other projects.

To emphasize what Nathan wrote: this is the culmination of YEARS of work. My 
gratitude goes out to everyone who took part in meetings (both in-person and 
virtual), code development & review, NEP & documentation writing, community 
discussions, etc.: thank you.

I cannot wait to see everything your hard work will enable.

Stéfan


[Numpy-discussion] Re: Improved 2DFFT Approach

2024-03-14 Thread Stefan van der Walt via NumPy-Discussion

Hi Alexander,

On 2024-03-14 22:43:38, Alexander Levin via NumPy-Discussion wrote:
> Memory Usage - 
> https://github.com/2D-FFT-Project/2d-fft/blob/testnotebook/notebooks/memory_usage.ipynb 
> Timing comparisons (updated) - 
> https://github.com/2D-FFT-Project/2d-fft/blob/testnotebook/notebooks/comparisons.ipynb


I see these timings are still done only for power-of-two shaped 
arrays. This is the easiest case to optimize, and I wonder if 
you've given further thought to supporting other sizes? PocketFFT, 
e.g., implements the Bluestein / Chirp-Z algorithm to deal with 
cases where the sizes have large prime factors.


Your test matrix also only contains real values. In that case, you 
can use rfft, which might resolve the memory usage difference? I'd 
be surprised if PocketFFT uses that much more memory for the same 
calculation.
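
For reference, a minimal sketch of the real-input transform in two dimensions, using `np.fft.rfft2` and `np.fft.irfft2`. The array size `(8, 6)` is an arbitrary choice, and the sketch only shows the smaller output shape; it does not measure memory use or say anything about the numbers in the notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((8, 6))          # real-valued input, like the test matrix

full = np.fft.fft2(a)           # complex output with the full shape
half = np.fft.rfft2(a)          # keeps only the non-redundant half of the last axis

print(full.shape, half.shape)   # (8, 6) (8, 4)

# The retained columns match the full transform ...
assert np.allclose(half, full[:, : a.shape[1] // 2 + 1])
# ... and the original array is recoverable from them.
assert np.allclose(np.fft.irfft2(half, s=a.shape), a)
```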


I saw that in the notebook code you have:

matr = np.zeros((n, m), dtype=np.complex64)
matr = np.random.rand(n, m)

Was the intent here to generate a complex random matrix?
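
To illustrate the question: the second assignment rebinds `matr`, so the complex64 array from the first line is discarded and the result is float64. If a complex random matrix was intended, one way to build it is sketched below; `random_complex` is a hypothetical helper, not code from the notebook.

```python
import numpy as np

n, m = 4, 3

matr = np.zeros((n, m), dtype=np.complex64)
matr = np.random.rand(n, m)   # rebinds the name; the complex64 array is gone
print(matr.dtype)             # float64


def random_complex(n, m, rng):
    """Return an (n, m) complex64 array with random real and imaginary parts."""
    out = np.empty((n, m), dtype=np.complex64)
    out.real = rng.random((n, m), dtype=np.float32)
    out.imag = rng.random((n, m), dtype=np.float32)
    return out


z = random_complex(n, m, np.random.default_rng(0))
print(z.dtype)                # complex64
```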

Stéfan


[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Stefan van der Walt via NumPy-Discussion
On Thu, Jul 4, 2024, at 08:18, Daniele Nicolodi wrote:
> I wish it were common sense for contributors to an open source 
> codebase that they need to own the copyright on their contributions, but 
> I don't think it can be assumed. Adding something along these lines to the 
> project policy also has the potential to educate contributors about 
> the pitfalls of using AI to autocomplete their contributions.

The ultimate concern is whether GPL code lands in your open source project. 
Will instructions to the author, that they need to make sure they own copyright 
to their code, indemnify the project? I don't think so. You also cannot enforce 
such an instruction. At best, you can, during review, try and establish whether 
the author understands the code they provided; and I hope, where code of any 
complexity is involved, that that should be reasonably obvious.

You'll see we've grappled with this in scikit-image as well: 
https://github.com/scikit-image/scikit-image/pull/7429

Stéfan


[Numpy-discussion] Re: numpy-user-dtypes updated for NumPy 2.0

2024-07-17 Thread Stefan van der Walt via NumPy-Discussion
Hi Nathan,

On Tue, Jul 16, 2024, at 12:24, Nathan wrote:
> I just pushed some commits to the numpy-user-dtypes repo 
> (https://github.com/numpy/numpy-user-dtypes) that fix compatibility with 
> the public version of the DType API that shipped in NumPy 2.0. If you’ve been 
> waiting for some runnable examples to look at before trying to write your own 
> DType, wait no more!

Thanks for working on these!

I see some dtypes have build instructions, and some don't. Does it make sense 
to add generic build instructions to the repo README, or otherwise at least 
ensure that each has a well described install & activation mechanism?

Stéfan


[Numpy-discussion] Re: Welcome Joren Hammudoglu to the NumPy Maintainers Team

2024-08-19 Thread Stefan van der Walt via NumPy-Discussion
On Mon, Aug 19, 2024, at 03:00, Sebastian Berg wrote:
> please join me in welcoming Joren (https://github.com/jorenham) to the
> NumPy maintainers team.

Thanks for your contributions, Joren, great to have you on board!

Stéfan


[Numpy-discussion] Re: What should remain on PyPi

2024-09-03 Thread Stefan van der Walt via NumPy-Discussion
Hi Chuck,

On Tue, Sep 3, 2024, at 08:18, Charles R Harris wrote:
> I just got through deleting a bunch of pre-releases on PyPi and it occurred 
> to me that we should have a policy as to what releases should be kept. I 
> think that reproducibility requires that we keep all the major and micro 
> versions, but if so, we should make that an official guarantee. Perhaps a 
> short NEP? This might even qualify for a SPEC. Thoughts?

That sounds right to me: keep any versions that aren't expressly targeted for 
testing (rc's, beta's, etc.). We still have the GitHub tags for those, should 
developers want to reproduce them.

Stéfan


[Numpy-discussion] Re: Endorsing SPECs 1, 6, 7, and 8

2024-10-08 Thread Stefan van der Walt via NumPy-Discussion
On Mon, Oct 7, 2024, at 06:04, Rohit Goswami wrote:
> I second Matti's comments about the validity of endorsing things we don't 
> implement. 

I don't think it is possible to make ecosystem-wide recommendations that will 
fit each project like a glove. At best, we can try to come together as a 
community, make sound recommendations, and accept that there will be exceptions 
depending on circumstances. And those exceptions may well apply to NumPy. E.g., 
being at the bottom of the stack, the NumPy project may recommend the drop 
schedules from SPEC0 for other projects, but may implement a different strategy 
to ensure wider compatibility.

> Also, personally I really dislike the keys to castle spec, because I'm 
> generally against having yearly check in reviews and such.

The SPECs are living documents, and are constructed based on input from the 
community. It would therefore be good to better understand your concern. Is it 
with the sentence "Review permissions regularly (say, every year) to maintain 
minimal permissions."? Having written that SPEC, to me that obviously feels 
like a fairly pragmatic, low-cost recommendation; but perhaps there are better 
ways to accomplish the same goal. An issue on 
https://github.com/scientific-python/specs or the thread at 
https://discuss.scientific-python.org/t/spec-6-keys-to-the-castle/777/2 could 
be good venues for further discussion.

Best regards,
Stéfan


[Numpy-discussion] Re: Add a bit_width function

2025-03-06 Thread Stefan van der Walt via NumPy-Discussion
Hi Carlos,

On Thu, Mar 6, 2025, at 10:08, Carlos Martin wrote:
> Feature request: Add a `bit_width` function to NumPy's [bit-wise 
> operations](https://numpy.org/doc/stable/reference/routines.bitwise.html) 
> that computes the [bit-width](https://en.wikipedia.org/wiki/Bit-width) 
> (also called bit-length) of an input.

I'm curious, would this be roughly:

def bit_width(x):
    y = np.zeros(len(x), dtype=float)
    mask = (x != 0)
    y[mask] = 1 + np.floor(np.log2(x[mask]))
    return y

plus some error checking to ensure you don't run it on unsigned or floating 
point types?
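
For comparison, an integer-only variant avoids the float round-trip, which matters for values near 2**64, where the conversion to float64 loses precision. This is a sketch under the assumption that the requested function should agree with Python's `int.bit_length` for non-negative integers; it is not a proposed NumPy implementation.

```python
import numpy as np


def bit_width(x):
    """Number of bits needed to represent each non-negative integer in x."""
    x = np.asarray(x)
    if not np.issubdtype(x.dtype, np.integer):
        raise TypeError("bit_width expects an integer array")
    if np.any(x < 0):
        raise ValueError("bit_width expects non-negative values")

    remaining = x.copy()
    width = np.zeros(x.shape, dtype=np.int64)
    # Shift right one bit at a time, counting entries that are still non-zero.
    while np.any(remaining):
        width += remaining != 0
        remaining >>= 1
    return width


print(bit_width(np.array([0, 1, 2, 3, 4, 255, 256], dtype=np.uint16)))
# [0 1 2 2 3 8 9]
```

The loop runs at most once per bit of the dtype, so 64 passes at worst for uint64.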

Stéfan