[Numpy-discussion] Re: Canonical way of serialising Generators

2024-10-27 Thread Andrew Nelson via NumPy-Discussion
On Mon, 28 Oct 2024 at 15:06, Andrew Nelson  wrote:

> Hi all,
> is there a canonical way of serialising Generators
>

Specifically I'm interested in a safe way (i.e. no pickle) of
saving/restoring Generator state via HDF5 file storage.
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Canonical way of serialising Generators

2024-10-27 Thread Robert Kern via NumPy-Discussion
On Mon, Oct 28, 2024 at 12:11 AM Andrew Nelson via NumPy-Discussion <
numpy-discussion@python.org> wrote:

> Hi all,
> is there a canonical way of serialising Generators (not via pickle).
>

Pickle is the canonical way. In particular, the pickle _machinery_ is going
to be the single source of truth for serialization. But I'll recapitulate
the important details here.

`Generator` itself is a stateless wrapper around the `BitGenerator`, so you
can ignore it and focus on the `BitGenerator` and its `SeedSequence` (and the
latter, really, only if you need to continue spawning).
`BitGenerator.__getstate__()` will give you `(bg_state_dict, seed_seq)` (a
`dict` and the `SeedSequence`). The `BitGenerator` state is going to be a
simple dict, but there will be arbitrary-sized ints and/or numpy arrays in
there. Third-party `BitGenerator`s can do whatever they like, so pickle's
about your only fallback there. For numpy `BitGenerator`s, the name of the
class will be in the key `'bit_generator'`, but for third-party ones,
there's no place for you to look by name.

Notionally, `BitGenerator`s can accept any `ISeedSequence` implementation,
to allow for other kinds of seed massaging algorithms, but we haven't seen
much (any) interest in that, so you can probably safely consider
`SeedSequence` proper. Because Cython does default pickle stuff under the
covers which was sufficient, we don't have an explicit `__getstate__()` for
you to peruse, but you can look at our constructor, which indicates what
can be passed; each one ends up as an attribute with the same name. You'll
want all of them for serialization purposes. Note that `entropy` can be an
arbitrary-sized int or a sequence of arbitrary-sized ints. If a sequence,
it could be a list or a tuple, but this does not need to be preserved; any
sequence type will do on deserialization. `spawn_key` should always be a
tuple of bounded-size ints (each should fit into a uint32, but will be a
plain Python int). `pool_size` is important configuration, though rarely
modified (and should be either 4 or 8, but notionally folks could choose
otherwise). `n_children_spawned` is important state if there's been
spawning and you want to continue spawning later correctly/reproducibly.
But that's it.
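
As a rough illustration of the four attributes above, here is a minimal sketch that copies them into a plain dict and rebuilds a `SeedSequence` from it. The helper names `seed_seq_to_dict` and `seed_seq_from_dict` are made up for this example, and it assumes a true `SeedSequence` rather than some other `ISeedSequence` implementation.

```python
import numpy as np


def seed_seq_to_dict(ss):
    # Hypothetical helper: collect the constructor arguments, which are
    # also the attribute names on the SeedSequence instance.
    return {
        "entropy": ss.entropy,
        "spawn_key": list(ss.spawn_key),  # any sequence works on the way back
        "pool_size": ss.pool_size,
        "n_children_spawned": ss.n_children_spawned,
    }


def seed_seq_from_dict(d):
    # Hypothetical helper: the constructor accepts all four as keywords.
    return np.random.SeedSequence(**d)


parent = np.random.SeedSequence(20241027)
parent.spawn(3)  # advances n_children_spawned to 3

saved = seed_seq_to_dict(parent)
clone = seed_seq_from_dict(saved)

# Both should now hand out the same "next" child.
next_original = parent.spawn(1)[0]
next_restored = clone.spawn(1)[0]
same_child = np.array_equal(
    next_original.generate_state(4), next_restored.generate_state(4)
)
```

If `n_children_spawned` were dropped, the restored object would start spawning from child 0 again and repeat streams that were already handed out.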

So your ultimate serialization needs to handle arbitrary-sized integers,
lists/tuples of integers, and numpy arrays (of various integer dtypes).
JSON with a careful encoder/decoder that can handle those cases would be
fine.
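
A minimal sketch of such an encoder/decoder, assuming Python's `json` module on both ends (it reads and writes arbitrary-sized ints natively; parsers in other languages may not). The `"__ndarray__"` tag and the names `dumps_state`/`loads_state` are invented for this example, not any numpy convention.

```python
import json

import numpy as np


def _encode(obj):
    # Called by json only for objects it cannot serialize on its own.
    if isinstance(obj, np.ndarray):
        # Keep the dtype so that e.g. uint32 vs uint64 survives the trip.
        return {"__ndarray__": obj.tolist(), "dtype": str(obj.dtype)}
    if isinstance(obj, np.integer):
        return int(obj)
    raise TypeError(f"cannot serialize {type(obj).__name__}")


def _decode(d):
    # Called for every decoded JSON object; rebuild tagged arrays only.
    if "__ndarray__" in d:
        return np.array(d["__ndarray__"], dtype=d["dtype"])
    return d


def dumps_state(state):
    return json.dumps(state, default=_encode)


def loads_state(text):
    return json.loads(text, object_hook=_decode)


# MT19937 keeps its key as a uint32 array inside the state dict.
bg = np.random.MT19937(2024)
text = dumps_state(bg.state)
restored_state = loads_state(text)
```

Tuples come back as lists, which is fine for `entropy` and `spawn_key` as noted above. The resulting string can be stored wherever text can go, for instance as a string attribute or dataset in an HDF5 file.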

> Would the following be reasonable for saving and restoring state:
>

No, you're missing all of the actual state. The `entropy` of the
`SeedSequence` is the original user-input seed, not the current state of
the `BitGenerator`.

> Specifically I'm interested in a safe way (i.e. no pickle) of
> saving/restoring Generator state via HDF5 file storage.

By restricting your domain to only numpy-provided `BitGenerator`s and true
`SeedSequence`s, you can do this. In full generality, one cannot.

Untested:

```
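import numpy as np

def rng_dict(rng):
    # Current BitGenerator state: a dict holding ints and/or numpy arrays.
    bg_state = rng.bit_generator.state
    # The SeedSequence only matters if you want to keep spawning later.
    ss = rng.bit_generator.seed_seq
    ss_dict = dict(entropy=ss.entropy, spawn_key=ss.spawn_key,
                   pool_size=ss.pool_size,
                   n_children_spawned=ss.n_children_spawned)
    return dict(bg_state=bg_state, seed_seq=ss_dict)

def rng_fromdict(d):
    bg_state = d['bg_state']
    ss = np.random.SeedSequence(**d['seed_seq'])
    # Look up the numpy BitGenerator class by its recorded name.
    bg = getattr(np.random, bg_state['bit_generator'])(ss)
    # Overwrite the freshly seeded state with the saved one.
    bg.state = bg_state
    rng = np.random.Generator(bg)
    return rng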
```
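
Since the goal is a safe load path, one possible tightening is to avoid a bare `getattr(np.random, ...)` on a name read from a file and look the class up in an explicit whitelist instead. This is only a sketch: `restore_bit_generator` and `_KNOWN_BIT_GENERATORS` are names invented here, and the whitelist covers only the `BitGenerator`s that ship with numpy.

```python
import numpy as np

# Only numpy-provided BitGenerators; PCG64DXSM is skipped on older numpy
# versions that do not have it.
_KNOWN_BIT_GENERATORS = {
    name: getattr(np.random, name)
    for name in ("MT19937", "PCG64", "PCG64DXSM", "Philox", "SFC64")
    if hasattr(np.random, name)
}


def restore_bit_generator(bg_state, seed_seq=None):
    # Hypothetical helper: rebuild a BitGenerator from a saved state dict.
    name = bg_state["bit_generator"]
    try:
        cls = _KNOWN_BIT_GENERATORS[name]
    except KeyError:
        raise ValueError(f"unsupported BitGenerator: {name!r}") from None
    bg = cls(seed_seq)   # initial seeding is overwritten on the next line
    bg.state = bg_state
    return bg


# Round trip: snapshot after some draws, then compare the next draws.
rng = np.random.Generator(np.random.Philox(7))
rng.random(5)
snapshot = rng.bit_generator.state

restored = np.random.Generator(restore_bit_generator(snapshot))
streams_match = np.array_equal(restored.random(5), rng.random(5))
```

Anything in `np.random` that is not a `BitGenerator` (for example `RandomState`) is rejected with a `ValueError` rather than being called with whatever the file contained.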

-- 
Robert Kern


[Numpy-discussion] Canonical way of serialising Generators

2024-10-27 Thread Andrew Nelson via NumPy-Discussion
Hi all,
is there a canonical way of serialising Generators (not via pickle). Would
the following be reasonable for saving and restoring state:

```
def serialize_rng(rng):
    klass = rng.bit_generator.state['bit_generator']
    entropy = rng.bit_generator.seed_seq.entropy
    return klass, entropy

def deserialize_rng(klass, entropy, rng=None):
    if rng is not None and klass == rng.bit_generator.state['bit_generator']:
        rng.bit_generator.seed_seq.entropy = entropy
        return rng
    BG = getattr(np.random, klass)
    bg = BG(np.random.SeedSequence(entropy))
    return np.random.default_rng(bg)
```