Re: [Rd] An iteration protocol

Tomasz Kalinowski Tue, 12 Aug 2025 06:59:35 -0700

Thank you Lionel, Peter, and Duncan!
Some responses inline below:

> Couldn't this all be done in a while or repeat loop? ...
> Not as simple as yours, but I think a little clearer because it's more 
> concrete, less abstract.


Indeed, that’s the trade-off! Explicit and verbose vs. simple,
concise, and abstracted away. There are certainly times when I prefer
the former, but the latter is not even an option today. Particularly
in a teaching context, I think the concept of iteration is more
intuitive and faster to teach than the precise mechanics of iteration.
The opportunity to make `for` usable with a broader set of object
types is icing on the cake. (Some of these arguments are fleshed out
further in the README linked in the first email.)

> It's not clear to me how the for() loop chooses a value to pass to the 
> iterator function.

In the draft patch, `for` creates a unique sentinel object, a bare
`OBJSXP`. The iterator closure is called with this sentinel as its
argument, and it must return that same object to signal
exhaustion.
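
As a rough sketch of that mechanism in plain R (not the C code in the
patch): the helper `for_each()` and the `Counter()` iterator below are
hypothetical names used only for illustration, and a fresh environment
stands in for the bare `OBJSXP`, since an environment is likewise only
`identical()` to itself.

```r
# Hypothetical helper emulating what the patched `for` does with a closure.
# A fresh environment stands in for the bare OBJSXP sentinel (an assumption
# made for this sketch; the patch creates the sentinel in C).
for_each <- function(iterator, body) {
  sentinel <- new.env(parent = emptyenv())  # unique per loop, never stored
  repeat {
    value <- iterator(sentinel)
    if (identical(value, sentinel)) break   # iterator handed the sentinel back
    body(value)
  }
  invisible(NULL)
}

# Hypothetical counting iterator that follows the protocol.
Counter <- function(n) {
  i <- 0
  function(done = NULL) {
    if (i >= n) return(done)
    i <<- i + 1
    i
  }
}

seen <- c()
for_each(Counter(3), function(x) seen <<- c(seen, x))
seen
```

Because the sentinel is created inside the loop and dropped afterwards,
no value the iterator could legitimately produce can be mistaken for it.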

This approach neatly achieves a few design goals. It introduces no
persistent symbols, keeping the API surface small, and it avoids the
ugly edge case of a false-positive exhaustion detection. It has less
overhead than a signal. Compared to a signal, it should also encourage
a more local coding style, making code easier to reason about. Treating
errors as values is an idea whose worth Rust has proven to me, and this
value-sentinel approach is a close cousin of that.

The example `SampleSequence` iterator in the initial email had a
default sentinel value of `NULL`. This was to allow convenient manual
iteration with something like:

```r
it <- SampleSequence(9)
it(); it(); it()  # ...
```

Or, if you prefer a more explicit approach:

```r
it <- SampleSequence(9)
repeat { val <- it() %||% break; ... }
```

Or:

```r
repeat { val <- it(break); ... }
```

Or:

```r
while (!is.null(val <- it())) { ... }
```

Or, for maximum robustness:

```r
done_sentinel <- new.env(parent = emptyenv())
while (!identical(done_sentinel, val <- it(done_sentinel))) { ... }
```

This enables a variety of usage patterns with different trade-offs
between convenience and robustness, with `for` able to take the most
robust approach, while allowing the iterator’s default sentinel to
prioritize convenience.
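
To make the trade-off concrete, here is a small sketch with a
hypothetical `ListIterator()` (not part of the patch) whose elements
may themselves be `NULL`. The convenient form stops early at the `NULL`
element, while the form with a private sentinel visits every element.

```r
# Hypothetical iterator over list elements, with the convenient NULL default.
ListIterator <- function(x) {
  i <- 0
  function(done = NULL) {
    if (i >= length(x)) return(done)
    i <<- i + 1
    x[[i]]
  }
}

items <- list("a", NULL, "c")

# Convenient form: a NULL element is mistaken for exhaustion.
it <- ListIterator(items)
n_convenient <- 0
while (!is.null(val <- it())) n_convenient <- n_convenient + 1

# Robust form: only the private sentinel ends the loop.
it <- ListIterator(items)
done_sentinel <- new.env(parent = emptyenv())
n_robust <- 0
while (!identical(done_sentinel, val <- it(done_sentinel))) {
  n_robust <- n_robust + 1
}

c(convenient = n_convenient, robust = n_robust)
```

The first loop counts one element and the second counts all three.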

> It's very useful to *close* iterators for resource cleanup.

This is interesting and, to be honest, not a use case we had considered.

Would using `reg.finalizer()` be sufficient for your use case? It
gives less control over timing than `on.exit()`, but can close
resources with something like:

```r
Stream <- function() {
  r <- open_resource()
  reg.finalizer(environment(), \(e) r$close())
  \(done) r$get_next() %||% done
}
```
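
A self-contained variant of that sketch, for experimenting: the
`open_resource()` below is a hypothetical in-memory stand-in (it just
yields `1..n` and then `NULL`), and `done` is given a `NULL` default
here so the iterator can also be called by hand. It assumes R >= 4.4
for the base `%||%`.

```r
# Hypothetical stand-in for a real resource: yields 1..n, then NULL.
open_resource <- function(n = 3, on_close = function() NULL) {
  i <- 0
  list(
    get_next = function() if (i < n) (i <<- i + 1) else NULL,
    close = on_close
  )
}

Stream <- function(...) {
  r <- open_resource(...)
  # Runs when the iterator's environment is garbage collected.
  reg.finalizer(environment(), \(e) r$close())
  \(done = NULL) r$get_next() %||% done
}

closed <- FALSE
it <- Stream(n = 2, on_close = function() closed <<- TRUE)
values <- c(it(), it())     # the two available values
exhausted <- is.null(it())  # third call returns the default `done`
rm(it)
invisible(gc())  # the finalizer may run here; timing is up to the collector
```

Whether `closed` is already `TRUE` right after `gc()` depends on the
collector, which is exactly the loss of timing control compared to
`on.exit()`.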

On Tue, Aug 12, 2025 at 5:20 AM Lionel Henry <lio...@posit.co> wrote:
>
> Clever! If going for non-local returns, probably best for ergonomics to pass
> in a closure (see e.g. `callCC()`). If only to avoid accidental jumps while
> debugging.
>
> But... do we need more lazy evaluation tricks in the language or fewer? It's
> probably more idiomatic to express non-local returns with condition signals
> like `stopIteration()`.
>
> There's something to be said for explicit and simple control flow though, via
> handling of returned values.
>
>
> > Note that it is trivial to create a unique sentinel value -- any newly
> > created closure (i.e. function() NULL) will do, as it will only
> > compare identical() with itself.
>
> Until you try that in the global env right? Then the risk of collision
> slightly increases. Unless you make your closure more unique via `body()`,
> but then might as well use a conventional sentinel.
>
> Best,
> Lionel
>
> On Tue, Aug 12, 2025 at 1:45 AM Peter Meilstrup
> <peter.meilst...@gmail.com> wrote:
> >
> > Passing the sentinel value as an argument to the iteration method is
> > the approach taken in my package `iterors` on CRAN. If the sentinel
> > value argument is evaluated lazily, this lets you pass calls to things
> > like 'stop', 'break' or 'return,' which will be called to signal end
> > of iteration. This makes for some nice compact and performant
> > iteration idioms:
> >
> > iter <- as.iteror(obj)
> > total <- 0
> > repeat {total <- total + nextOr(iter, break)}
> >
> > Note that iteror is just a closure with one optional argument and a
> > class attribute, so you can skip using s3 nextOr method and call it
> > directly:
> >
> > nextElem <- as.iteror(obj)
> > repeat {total <- total + nextElem(break)}
> >
> > For backward compatibility with the iterators package, the default
> > sentinel value for iterors is `stop("StopIteration")`.
> >
> > Note that it is trivial to create a unique sentinel value -- any newly
> > created closure (i.e. function() NULL) will do, as it will only
> > compare identical() with itself.
> >
> > sigil <- \() NULL
> > nextElem <- as.iteror(obj)
> > while (!identical(item <- nextElem(sigil), sigil)) {
> >   doStuff(item)
> > }
> >
> > Peter Meilstrup
> >
> > On Mon, Aug 11, 2025 at 5:56 PM Lionel Henry via R-devel
> > <r-devel@r-project.org> wrote:
> > >
> > > Hello,
> > >
> > > A couple of comments:
> > >
> > > - Regarding the closure + sentinel approach, also implemented in coro
> > >   (https://github.com/r-lib/coro/blob/main/R/iterator.R), it's more
> > >   robust for the sentinel to always be a temporary value. If you store
> > >   the sentinel in a list or a namespace, it might inadvertently close
> > >   iterators when iterating over that collection. That's why the coro
> > >   sentinel is created with `coro::exhausted()` rather than exported from
> > >   the namespace as a constant object. The sentinel can be equivalently
> > >   created with `as.symbol(".__exhausted__.")`, the main thing to ensure
> > >   robustness is to avoid storing it and always create it from scratch.
> > >
> > >   The approach of passing the sentinel by argument (which I see in the
> > >   example in your mail but not in the linked documentation of approach 3)
> > >   also works if the iterator loop passes a unique sentinel. Having a
> > >   default of `NULL` makes it likely to get unexpected exhaustion of
> > >   iterators when a sentinel is not passed in though.
> > >
> > > - It's very useful to _close_ iterators for resource cleanup. It's the
> > >   responsibility of an iterator loop (e.g. `for` but could be other
> > >   custom tools invoking the iterator) to close them. See
> > >   https://github.com/r-lib/coro/pull/58 for an interesting application
> > >   of iterator closing, allowing robust support of `on.exit()`
> > >   expressions in coro generators.
> > >
> > >   To implement iterator closing with the closure approach, an iterator may
> > >   optionally take a `close` argument. A `TRUE` value is passed on exit,
> > >   instructing the iterator to clean up resources.
> > >
> > > Best,
> > > Lionel
> > >
> > > On Mon, Aug 11, 2025 at 3:24 PM Tomasz Kalinowski
> > > <kalinows...@gmail.com> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > A while back, Hadley and I explored what an iteration protocol for R
> > > > might look like. We worked through motivations, design choices, and edge
> > > > cases, which we documented here:
> > > > https://github.com/t-kalinowski/r-iterator-ideas
> > > >
> > > > At the end of this process, I put together a patch to R (with tests) and
> > > > would like to invite feedback from R Core and the broader community:
> > > > https://github.com/r-devel/r-svn/pull/130/files?diff=unified&w=1
> > > >
> > > > In summary, the overall design is a minimal patch. It introduces no
> > > > breaking changes and essentially no new overhead. There are two parts.
> > > >
> > > > 1.  Add a new `as.iterable()` S3 generic, with a default identity
> > > >     method. This provides a user-extensible mechanism for selectively
> > > >     changing the iteration behavior for some object types passed to
> > > >     `for`. `as.iterable()` methods are expected to return anything that
> > > >     `for` can handle directly, namely, vectors or pairlists, or (new) a
> > > >     closure.
> > > >
> > > > 2.  `for` gains the ability to accept a closure for the iterable
> > > >     argument. A closure is called repeatedly for each loop iteration
> > > >     until the closure returns an `exhausted` sentinel value, which it
> > > >     received as an input argument.
> > > >
> > > > Here is a small example of using the iteration protocol to implement a
> > > > sequence of random samples:
> > > >
> > > > ``` r
> > > > SampleSequence <- function(n) {
> > > >   i <- 0
> > > >   function(done = NULL) {
> > > >     if (i >= n) {
> > > >       return(done)
> > > >     }
> > > >     i <<- i + 1
> > > >     runif(1)
> > > >   }
> > > > }
> > > >
> > > > for(sample in SampleSequence(2)) {
> > > >   print(sample)
> > > > }
> > > >
> > > > # [1] 0.7677586
> > > > # [1] 0.355592
> > > > ```
> > > >
> > > > Best,
> > > > Tomasz
> > > >
> > > > ______________________________________________
> > > > R-devel@r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
