[Numpy-discussion] How to sample unique vectors
Hi all,

I am trying to sample k N-dimensional vectors from a uniform distribution without replacement. It seems like this should be straightforward, but I can't seem to pin it down.

Specifically, I am trying to get random indices in a d0 x d1 x d2 x ... x dN-1 array.

I thought about sneaking a structured dtype into `rng.integers`, but of course that doesn't work.

If we had a string sampler, I could sample k unique words (consisting of digits), and convert them to indices.

I could over-sample and filter out the non-unique indices. Or iteratively draw blocks of samples until I've built up my k unique indices.

The most straightforward solution would be to flatten indices, and to sample from those. The integers get large quickly, though. The rng.integers docstring suggests that it can handle object arrays for very large integers:

> When using broadcasting with uint64 dtypes, the maximum value (2**64)
> cannot be represented as a standard integer type.
> The high array (or low if high is None) must have object dtype, e.g.,
> array([2**64]).

But, that doesn't work:

    In [35]: rng.integers(np.array([2**64], dtype=object))
    ValueError: high is out of bounds for int64

Is there an elegant way to handle this problem?

Best regards,
Stéfan
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com
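One possible reading of the "iteratively draw until k unique indices" idea, sketched with only the standard library: draw one coordinate per axis, collect the resulting tuples in a set, and stop once k distinct tuples exist. Because each axis is drawn separately, no integer larger than the biggest dimension is needed. The function name `sample_unique_indices` and its `seed` parameter are illustrative and not part of NumPy.

```python
import random


def sample_unique_indices(shape, k, seed=None):
    """Draw k distinct index tuples uniformly from an array of the given shape."""
    total = 1
    for d in shape:
        total *= d
    if k > total:
        raise ValueError("cannot draw more unique indices than the array has elements")

    rng = random.Random(seed)
    seen = set()
    # Rejection loop: adding a tuple that is already present leaves the set unchanged.
    while len(seen) < k:
        seen.add(tuple(rng.randrange(d) for d in shape))
    # Sorted only to give a reproducible order for a given seed.
    return sorted(seen)


shape = (500, 400, 30, 15, 20, 800, 900, 2000, 800)
samples = sample_unique_indices(shape, k=5, seed=0)
```

This stays cheap while k is small relative to the number of elements; as k approaches the total size, the loop rejects more and more draws and a different strategy would be preferable.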
[Numpy-discussion] Re: How to sample unique vectors
On Fri, Nov 17, 2023, at 11:07, Robert Kern wrote:
> If the arrays you are drawing indices for are real in-memory arrays for
> present-day 64-bit computers, this should be adequate. If it's a notional
> array that is larger, then you'll need actual arbitrary-sized integer
> sampling. The builtin `random.randrange()` will do arbitrary-sized integers
> and is quite reasonable for this task. If you want it to use our
> BitGenerators underneath for clean PRNG state management, this is quite
> doable with a simple subclass of `random.Random`:
> https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258

Thanks, Robert. The use case is for randomly populating a hypothetical N-dimensional sparse array object. In practice, int64 will probably suffice.

I can see how to generate arbitrarily large integers using the pattern above, but I am still unsure how to sample without replacement. Aaron mentioned that conflicts are extremely unlikely, so it is perhaps fine to assume they won't happen. Checking for conflicts is expensive.

Attached is a script that implements this solution.

Stéfan

    import random
    import functools
    import itertools
    import operator
    import numpy as np


    def cumulative_prod(arr):
        return list(itertools.accumulate(arr, func=operator.mul))


    def unravel_index(x, dims):
        # Strides (in elements) of a C-ordered array with shape `dims`.
        dim_prod = cumulative_prod([1] + list(dims)[:0:-1])[::-1]
        return [list((x[j] // dim_prod[i]) % dims[i] for i in range(len(dims)))
                for j in range(len(x))]


    # From Robert Kern's comment at
    # https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258
    class PythonRandomInterface(random.Random):
        def __init__(self, rng):
            self._rng = rng

        def getrandbits(self, k):
            """getrandbits(k) -> x.  Generates an int with k random bits."""
            if k < 0:
                raise ValueError('number of bits must be non-negative')
            numbytes = (k + 7) // 8  # bits / 8 and rounded up
            x = int.from_bytes(self._rng.bytes(numbytes), 'big')
            return x >> (numbytes * 8 - k)  # trim excess bits

        def indices(self, shape, size=1):
            D = functools.reduce(lambda x, y: x * y, shape)
            # randrange(D) draws from [0, D); randint(0, D) would also include D.
            indices = [self.randrange(D) for i in range(size)]
            return unravel_index(indices, shape)


    rng = np.random.default_rng()
    pri = PythonRandomInterface(rng)

    dims = (500, 400, 30, 15, 20, 800, 900, 2000, 800)
    k = 5

    print(pri.indices(dims, size=k))
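To put a rough number on "conflicts are extremely unlikely": under the usual birthday-problem approximation, the chance that k uniform draws from D equally likely flat indices contain at least one repeat is about 1 - exp(-k(k-1)/(2D)). The sketch below evaluates that for the shape used in the script; `collision_probability` is an illustrative helper name, not something from the thread.

```python
import math


def collision_probability(k, total):
    """Approximate P(at least one duplicate) for k uniform draws from `total` values."""
    # -expm1(-x) evaluates 1 - exp(-x) without losing precision when x is tiny.
    return -math.expm1(-k * (k - 1) / (2 * total))


dims = (500, 400, 30, 15, 20, 800, 900, 2000, 800)
total = math.prod(dims)  # number of elements in the notional array
p = collision_probability(5, total)
```

For this shape the element count already exceeds the int64 range, and the estimated collision probability for five draws is on the order of 1e-21, which is why skipping the duplicate check is a defensible shortcut here. The approximation says nothing about cases where k is a sizeable fraction of D.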
[Numpy-discussion] Re: How to sample unique vectors
On Fri, Nov 17, 2023, at 14:28, Stefan van der Walt wrote:
> Attached is a script that implements this solution.

And the version with set-based duplicate checking.

Stéfan

    import random
    import functools
    import itertools
    import operator
    import numpy as np


    def cumulative_prod(arr):
        return list(itertools.accumulate(arr, func=operator.mul))


    def unravel_index(x, dims):
        # Strides (in elements) of a C-ordered array with shape `dims`.
        dim_prod = cumulative_prod([1] + list(dims)[:0:-1])[::-1]
        return [list((ix // dim_prod[i]) % dims[i] for i in range(len(dims)))
                for ix in x]


    # From Robert Kern's comment at
    # https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258
    class PythonRandomInterface(random.Random):
        def __init__(self, rng):
            self._rng = rng

        def getrandbits(self, k):
            """getrandbits(k) -> x.  Generates an int with k random bits."""
            if k < 0:
                raise ValueError('number of bits must be non-negative')
            numbytes = (k + 7) // 8  # bits / 8 and rounded up
            x = int.from_bytes(self._rng.bytes(numbytes), 'big')
            return x >> (numbytes * 8 - k)  # trim excess bits

        def indices(self, shape, size=1):
            D = functools.reduce(lambda x, y: x * y, shape)
            # A set silently drops repeated flat indices, so loop until it
            # holds `size` distinct values.
            indices = set()
            while len(indices) < size:
                indices.add(self.randrange(D))
            return unravel_index(indices, shape)


    rng = np.random.default_rng()
    pri = PythonRandomInterface(rng)

    dims = (500, 400, 30, 15, 20, 800, 900, 2000, 800)
    k = 5

    print(pri.indices(dims, size=k))
[Numpy-discussion] Re: How to sample unique vectors
On Fri, Nov 17, 2023, at 16:52, Robert Kern wrote:
> That optimistic optimization makes this the fastest solution.

That'd work great, thanks Robert, Aaron, and everyone who shared input.

Stéfan
[Numpy-discussion] Re: Meson - C extension - Finding numpy includes in virtual env
Hi Doug,

On Sat, Nov 25, 2023, at 07:14, Doug Turnbull wrote:
> Unfortunately the following command fails:
>
> incdir_numpy = run_command(py,
>   ['-c', 'import numpy; print(numpy.get_include())'],
>   capture: true,
>   check: false,
> ).stdout().strip()

In your repo it says stderr, but the version above (stdout) works for me.

Perhaps you are using a different Python than the one in your virtual env, because meson was installed onto your path previously? Try `python -m pip install meson` and then invoke the meson binary directly from your virtualenv: venv/bin/meson.

Stéfan
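A small diagnostic along those lines, using only the standard library, can show which interpreter is running, whether it belongs to a venv, and which `meson` executable is first on PATH. This is a sketch rather than part of any Meson or NumPy tooling; the helper name `describe_environment` is made up for illustration.

```python
import shutil
import sys


def describe_environment():
    """Collect facts that show which Python and which meson would be used."""
    return {
        "python": sys.executable,
        # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        "prefix": sys.prefix,
        # First `meson` found on PATH, or None if there is none.
        "meson_on_path": shutil.which("meson"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `meson_on_path` points outside the directory reported as `prefix`, the build is likely being driven by a meson installed for a different Python than the one that has NumPy.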
[Numpy-discussion] Re: Meson - C extension - Finding numpy includes in virtual env
Hi Doug,

On Sun, Nov 26, 2023, at 06:29, Doug Turnbull wrote:
> To debug, I ran `pip install . --no-build-isolation` it worked (using venv's
> numpy)

When developing NumPy, we typically build in the existing environment. This is done either via `pip install -e .` (which installs hooks to trigger a re-compile upon import), or via the spin tool (https://github.com/scientific-python/spin), which has meson commands pre-bundled:

    pip install spin
    spin  # lists commands available

Best regards,
Stéfan
[Numpy-discussion] Re: Meson - C extension - Finding numpy includes in virtual env
On Sun, Nov 26, 2023, at 12:03, Nathan wrote:
> For my work I tend to use a persistent build directory with build isolation
> disabled as discussed in the meson-python docs.

Out of curiosity, how is this different from, e.g., `spin build`, which builds into `./build-install`?

Stéfan
[Numpy-discussion] Re: Meson - C extension - Finding numpy includes in virtual env
Hi Nathan,

On Tue, Nov 28, 2023, at 08:42, Nathan wrote:
> It looks like `spin build` does `meson build` and `meson install` and doesn't
> do `pip install`. I'd like numpy to be importable in a python environment of
> my choosing, so I tend to instead manually install numpy into that
> environment by invoking pip with something like `python -m pip install . -v
> --no-build-isolation -Cbuilddir=build -C'compile_args=-v'
> -C'setup_args=-Dbuildtype=debug'`. I like seeing the compile command meson
> uses, so I pass in `-v` through meson's `compile_args` and I often need a
> debug build, so I set the build type manually as well.

That makes sense. We recently added the `spin.pip.install` command for that purpose, but of course you don't *need* a command if you know the invocation :)

Stéfan
[Numpy-discussion] Re: welcome Raghuveer, Chris, Mateusz and Matt to the NumPy maintainers team
On Fri, Jan 26, 2024, at 12:04, Ralf Gommers wrote:
> We've got four new NumPy maintainers! Welcome to the team, and
> congratulations to:
>
> - Raghuveer Devulapalli (https://github.com/r-devulap)
> - Chris Sidebottom (https://github.com/mousius)
> - Mateusz Sokół (https://github.com/mtsokol/)
> - Matt Haberland (https://github.com/mdhaber)

Fantastic! Your commit bits are well deserved, and your contributions greatly appreciated. I look forward to our continued work together.

Thank you,
Stéfan
[Numpy-discussion] Re: New DType and ArrayMethod C APIs are public
On Wed, Feb 14, 2024, at 14:34, Nathan wrote:
> The DType API publicly exposes the PyArray_DTypeMeta C struct, which
> represents DType metaclasses. It also exposes a function for registering
> user-defined DTypes and a set of slot IDs and function typedefs that users
> can implement in C to write new DTypes.
>
> The ArrayMethod API allows defining cast and ufunc loops in terms of these
> new DTypes, in a manner that forbids value-based promotion and abstracts many
> of the internals of NumPy. We hope the ArrayMethod API enables sharing
> low-level loops that work out-of-the-box in NumPy in other projects.

To emphasize what Nathan wrote: this is the culmination of YEARS of work. My gratitude goes out to everyone who took part in meetings (both in-person and virtual), code development & review, NEP & documentation writing, community discussions, etc.: thank you.

I cannot wait to see everything your hard work will enable.

Stéfan
[Numpy-discussion] Re: Improved 2DFFT Approach
Hi Alexander,

On 2024-03-14 22:43:38, Alexander Levin via NumPy-Discussion wrote:
> Memory Usage - https://github.com/2D-FFT-Project/2d-fft/blob/testnotebook/notebooks/memory_usage.ipynb
> Timing comparisons(updated) - https://github.com/2D-FFT-Project/2d-fft/blob/testnotebook/notebooks/comparisons.ipynb

I see these timings are still done only for power-of-two shaped arrays. This is the easiest case to optimize, and I wonder if you've given further thought to supporting other sizes? PocketFFT, e.g., implements the Bluestein / Chirp-Z algorithm to deal with cases where the sizes have large prime factors.

Your test matrix also only contains real values. In that case, you can use rfft, which might resolve the memory usage difference? I'd be surprised if PocketFFT uses that much more memory for the same calculation.

I saw that in the notebook code you have:

    matr = np.zeros((n, m), dtype=np.complex64)
    matr = np.random.rand(n, m)

The second assignment replaces the complex64 array with a real float64 one. Was the intent here to generate a complex random matrix?

Stéfan
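If a complex test matrix was the goal, one way to build it is to draw the real and imaginary parts separately and combine them. The sketch below also shows `np.fft.rfft2` on purely real input, which returns only the non-redundant half of the spectrum. The sizes `n`, `m` and the fixed seed are arbitrary choices for illustration, not values from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 16

# Complex input: independent uniform real and imaginary parts.
real = rng.random((n, m), dtype=np.float32)
imag = rng.random((n, m), dtype=np.float32)
matr = (real + 1j * imag).astype(np.complex64)

# Real input: rfft2 keeps m // 2 + 1 columns instead of m.
half_spectrum = np.fft.rfft2(real)
full_spectrum = np.fft.fft2(real)
```

For real input, the columns kept by `rfft2` match the leading columns of the full `fft2` result; the remaining columns are determined by conjugate symmetry, which is where the memory saving comes from.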
[Numpy-discussion] Re: Policy on AI-generated code
On Thu, Jul 4, 2024, at 08:18, Daniele Nicolodi wrote:
> I wish it to be common sense for contributors to an open source
> codebase that they need to own the copyright on their contributions, but
> I don't think it can be assumed. Adding something along these lines to the
> project policy also has the potential to educate the contributors about
> the pitfalls of using AI to autocomplete their contributions.

The ultimate concern is whether GPL code lands in your open source project. Will instructions to the author, that they need to make sure they own copyright to their code, indemnify the project? I don't think so.

You also cannot enforce such an instruction. At best, you can, during review, try to establish whether the author understands the code they provided; and I hope, where code of any complexity is involved, that that should be reasonably obvious.

You'll see we've grappled with this in scikit-image as well: https://github.com/scikit-image/scikit-image/pull/7429

Stéfan
[Numpy-discussion] Re: numpy-user-dtypes updated for NumPy 2.0
Hi Nathan,

On Tue, Jul 16, 2024, at 12:24, Nathan wrote:
> I just pushed some commits to the numpy-user-dtypes repo
> (https://github.com/numpy/numpy-user-dtypes) that fixes compatibility with
> the public version of the DType API that shipped in NumPy 2.0. If you’ve been
> waiting for some runnable examples to look at before trying to write your own
> DType, wait no more!

Thanks for working on these!

I see some dtypes have build instructions, and some don't. Does it make sense to add generic build instructions to the repo README, or otherwise at least ensure that each has a well-described install & activation mechanism?

Stéfan
[Numpy-discussion] Re: Welcome Joren Hammudoglu to the NumPy Maintainers Team
On Mon, Aug 19, 2024, at 03:00, Sebastian Berg wrote:
> please join me in welcoming Joren (https://github.com/jorenham) to the
> NumPy maintainers team.

Thanks for your contributions, Joren, great to have you on board!

Stéfan
[Numpy-discussion] Re: What should remain on PyPi
Hi Chuck,

On Tue, Sep 3, 2024, at 08:18, Charles R Harris wrote:
> I just got through deleting a bunch of pre-releases on PyPi and it occurred
> to me that we should have a policy as to what releases should be kept. I
> think that reproducibility requires that we keep all the major and micro
> versions, but if so, we should make that an official guarantee. Perhaps a
> short NEP? This might even qualify for an SPEC. Thoughts?

That sounds right to me: keep any versions that aren't expressly targeted for testing (rc's, beta's, etc.). We still have the GitHub tags for those, should developers want to reproduce them.

Stéfan
[Numpy-discussion] Re: Endorsing SPECs 1, 6, 7, and 8
On Mon, Oct 7, 2024, at 06:04, Rohit Goswami wrote:
> I second Matti's comments about the validity of endorsing things we don't
> implement.

I don't think it is possible to make ecosystem-wide recommendations that will fit each project like a glove. At best, we can try to come together as a community, make sound recommendations, and accept that there will be exceptions depending on circumstances. And those exceptions may well apply to NumPy. E.g., being at the bottom of the stack, the NumPy project may recommend the drop schedules from SPEC0 for other projects, but may implement a different strategy to ensure wider compatibility.

> Also, personally I really dislike the keys to castle spec, because I'm
> generally against having yearly check in reviews and such.

The SPECs are living documents, and are constructed based on input from the community. It would therefore be good to better understand your concern. Is it with the sentence "Review permissions regularly (say, every year) to maintain minimal permissions."? Having written that SPEC, to me that obviously feels like a fairly pragmatic, low-cost recommendation; but perhaps there are better ways to accomplish the same goal.

An issue on https://github.com/scientific-python/specs or the thread at https://discuss.scientific-python.org/t/spec-6-keys-to-the-castle/777/2 could be good venues for further discussion.

Best regards,
Stéfan
[Numpy-discussion] Re: Add a bit_width function
Hi Carlos,

On Thu, Mar 6, 2025, at 10:08, Carlos Martin wrote:
> Feature request: Add a `bit_width` function to NumPy's [bit-wise
> operations](https://numpy.org/doc/stable/reference/routines.bitwise.html)
> that computes the [bit-width](https://en.wikipedia.org/wiki/Bit-width)
> (also called bit-length) of an input.

I'm curious, would this be roughly:

    def bit_width(x):
        y = np.zeros(len(x), dtype=float)
        mask = (x != 0)
        y[mask] = 1 + np.floor(np.log2(x[mask]))
        return y

plus some error checking to ensure you don't run it on unsigned or floating point types?

Stéfan
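For comparison, Python's built-in `int.bit_length()` computes the same quantity exactly for arbitrary integers, which makes it a handy reference when checking a log2-based formula. The sketch below is plain Python, not a proposed NumPy API, and both function names are illustrative. It also shows one place where a float-based `1 + floor(log2(x))` can go wrong: a float64 holds 53 significant bits, so an integer such as 2**54 - 1 rounds up to 2**54 before the logarithm is taken.

```python
import math


def bit_width(values):
    """Exact bit-width of each non-negative integer; 0 has width 0."""
    widths = []
    for v in values:
        if v < 0:
            raise ValueError("bit_width is only defined here for non-negative integers")
        widths.append(int(v).bit_length())
    return widths


def bit_width_via_log2(v):
    """Float-based variant of 1 + floor(log2(x)), with 0 mapped to 0."""
    return 0 if v == 0 else 1 + math.floor(math.log2(float(v)))


big = 2**54 - 1                 # needs exactly 54 bits
exact = bit_width([0, 1, 2, 255, 256, big])
approx = bit_width_via_log2(big)
```

Here `exact` ends with 54 while `approx` comes out one larger, so an integer-only implementation (bit tricks or a count-leading-zeros instruction) would be needed for correct results across the full 64-bit range.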