Date: Wed, 17 Dec 2025 11:50:21 -0800
From: Josiah Parry<[email protected]>

I am writing to ask what limitations there may be in making the set
operations in base S3 generic functions. Are there technical reasons why
this wouldn't be possible?


The set ops {intersect, union, setdiff, setequal} and %in% and %notin% are all
generic-like by virtue of composing generic functions for vector-like classes.
If you have a vector-like class and you define (as needed) methods for '[',
'c', 'mtfrm', 'names<-', and 'unique', then the set ops work automatically and
correctly.  The built-in classes 'Date', 'POSIXct', 'POSIXlt', 'difftime', and
'factor' provide a good model here.
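
As a rough illustration of the point above, here is a minimal sketch of a
vector-like class that relies on the base set ops instead of masking them.
The class name 'temp' and the constructor new_temp() are hypothetical and
exist only for this example. It assumes that the default behaviour of
'mtfrm' and 'names<-' is adequate for such a simple class, so only '[', 'c',
and 'unique' get methods.

```r
# Hypothetical vector-like S3 class wrapping a double vector.
new_temp <- function(x = double()) structure(as.double(x), class = "temp")

# Subsetting keeps the class.
`[.temp` <- function(x, i) new_temp(unclass(x)[i])

# Concatenation keeps the class.
c.temp <- function(...) new_temp(unlist(lapply(list(...), unclass)))

# Deduplication keeps the class.
unique.temp <- function(x, incomparables = FALSE, ...)
    new_temp(unique(unclass(x), incomparables = incomparables, ...))

a <- new_temp(c(1, 2, 2))
b <- new_temp(c(2, 3))

# No set-op methods are defined; the base functions are called directly.
u <- union(a, b)
i <- intersect(a, b)
d <- setdiff(a, b)
m <- a %in% b

# The element values do not depend on whether the class is preserved.
stopifnot(identical(as.numeric(u), c(1, 2, 3)),
          identical(as.numeric(i), 2),
          identical(as.numeric(d), 1),
          identical(m, c(FALSE, TRUE, TRUE)))
```

Whether the results keep class 'temp' depends on the R version. My
recollection is that R 4.3.0 and later preserve the class for same-class
operands, while earlier versions pass both operands through as.vector() and
return a bare numeric vector; treat that version boundary as an assumption
and check inherits(u, "temp") on your own installation.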

S3 generic set ops would only really support those non-vector-like classes for
which set ops happen to have a meaningful definition: 'nb' is a good example,
but are there many others?

A benefit of having a minimal set of generic functions in base (and composing
them to form a larger set of generic-like functions) is that it limits growth
of the base namespace.  Every new generic function base::generic requires a
corresponding default method base::generic.default.

In writing a reply on R-Sig-Geo (1) today, I was reminded that `spdep`'s
set operations are not exported S3 methods (e.g., one must call
spdep::union.nb() directly) because there is no generic declared in `base`.

I think the R ecosystem would benefit greatly from generics declared in
base for these operations. For example, the `generics` (2) package was
published in 2018 with S3 generics for the set operations that mask base.
`generics` has 189 reverse imports; I suspect quite a few of them are for
set operations.

GitHub usage of `generics` (counts include duplicates from forks):

- 353 results for importFrom(generics, union) (3)
- 361 results for importFrom(generics, intersect) (4)
- 355 results for importFrom(generics, setdiff) (5)

There are also a number of manual implementations of an S3 generic for set
ops that mask base. See the following GitHub search results:

- 249 results for UseMethod("union") (6)
- 208 results for UseMethod("intersect") (7)
- 199 results for UseMethod("setdiff") (8)
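
For readers unfamiliar with the pattern those searches pick up, a manual
masking generic usually looks something like the sketch below. The class
'nbr' and its constructor new_nbr() are hypothetical: they are only loosely
inspired by the idea of a neighbour list and do not reproduce spdep's 'nb'
structure or API.

```r
# Package-level generic that masks base::union().
union <- function(x, y, ...) UseMethod("union")

# The default method defers to base, so ordinary vectors behave as before.
union.default <- function(x, y, ...) base::union(x, y)

# Hypothetical non-vector-like class: a list of integer neighbour sets.
new_nbr <- function(x) structure(lapply(x, as.integer), class = "nbr")

# Element-wise union of the neighbour sets of two 'nbr' objects.
union.nbr <- function(x, y, ...) {
    stopifnot(inherits(y, "nbr"), length(x) == length(y))
    new_nbr(Map(function(a, b) sort(base::union(a, b)),
                unclass(x), unclass(y)))
}

g1 <- new_nbr(list(2L, c(1L, 3L), 2L))
g2 <- new_nbr(list(3L, 3L, c(1L, 2L)))

u <- union(g1, g2)       # dispatches to union.nbr
v <- union(1:3, 2:5)     # falls through to base::union via the default
```

The cost of this pattern is that every package defining its own generic
masks base::union() independently, which is the duplication the counts
above point at.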


My guess is that in most of these examples masking the base set ops would not
be necessary if some vector-like class were implemented more rigorously, i.e.,
with methods for '[', 'c', etc.

Mikael


References:
1. https://stat.ethz.ch/pipermail/r-sig-geo/2025-December/029582.html
2. https://cran.r-project.org/src/contrib/Archive/generics
3. https://github.com/search?q=importFrom%28generics%2Cunion%29+&type=code
4. https://github.com/search?q=importFrom%28generics%2Cintersect%29+&type=code
5. https://github.com/search?q=importFrom%28generics%2Csetdiff%29+&type=code
6. https://github.com/search?q=UseMethod%28%22union%22%29+language%3AR&type=code
7. https://github.com/search?q=UseMethod%28%22intersect%22%29+language%3AR&type=code
8. https://github.com/search?q=UseMethod%28%22setdiff%22%29+language%3AR&type=code


______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
