[Rd] Small typo in Writing R Extensions
Hi all,

Writing R Extensions describes `R_NewEnv()` as:

```
At times it may also be useful to create a new environment frame in C code.
R_NewEnv is a C version of the R function new.env:

SEXP R_NewEnv(SEXP enclos, int hash, ins size)
```

There is a typo here where `ins size` should be `int size`.

Thanks!
Davis
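
For readers coming from the R side, the reference point in the quoted passage is `new.env()`. Here is a minimal, purely illustrative R sketch of the corresponding call; the explicit `size = 29L` is just the documented default, shown for illustration:

```
# R-level counterpart of R_NewEnv(): a hashed environment frame with a
# given initial hash table size (29L is simply the documented default)
e <- new.env(hash = TRUE, size = 29L)
assign("x", 1, envir = e)
ls(e)
#> [1] "x"
```
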
[Rd] On optimizing `R_NewEnv()`
Hi all,

I really like the addition of `R_NewEnv()` back in 4.1.0:
https://github.com/wch/r-source/blob/625ab8d45f86f65561e53627e1f0c220bdc5f752/src/main/envir.c#L3619-L3630

I have a use case where I'm likely to call this function a large number of
times to generate many small hashed environments, so I'd like to optimize
it as far as possible.

I noticed that it takes `int size`, converts that to a SEXP for
`R_NewHashedEnv()`, which then simply converts it back to an `int` here:
https://github.com/wch/r-source/blob/625ab8d45f86f65561e53627e1f0c220bdc5f752/src/main/envir.c#L378

I wonder if we could cut out that intermediate SEXP (along with its
protection) by adjusting `R_NewHashedEnv()` to take `int size` instead. I'd
be happy to submit a patch if that sounds good. I'd update all uses of
`R_NewHashedEnv()` to supply `int`s instead, which actually seems like it
would make every call to that function simpler:
https://github.com/search?q=repo%3Awch%2Fr-source%20R_NewHashedEnv&type=code

So hopefully a win everywhere?

Thanks,
Davis
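
For context, a purely illustrative R-level sketch of the kind of workload described above; in the actual use case the environments would be created from C via `R_NewEnv()`, and the count and `size` here are arbitrary values, not taken from the original report:

```
# Many small hashed environments, e.g. one per record; this is the shape of
# workload where shaving an allocation off each creation starts to matter
make_records <- function(n) {
  lapply(seq_len(n), function(i) {
    e <- new.env(hash = TRUE, size = 29L)
    e$id <- i
    e
  })
}

records <- make_records(10000)
length(records)
#> [1] 10000
```
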
[Rd] An alternative algorithm for `which()`
Hi all,

I've submitted a Bugzilla patch with an alternative C algorithm for
`which()` that uses less memory and is faster in many real-life scenarios.
I've documented it in full on the Bugzilla page, with many examples:
https://bugs.r-project.org/show_bug.cgi?id=18495

The short version is that the performance comes from making the loops
branchless, which seems to be particularly helpful for `which()`. With
`which(x)`, I'd argue that the branches are effectively unpredictable,
since in most real data there is typically no indication that if the i-th
element of `x` is `TRUE`, then the (i+1)-th element is also likely to be
`TRUE`.

I've received a few comments on the Bugzilla page, but I'd love it if
anyone else could chime in and provide their own thoughts!

Thanks,
Davis Vaughan
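
As a rough, hedged illustration of the kind of input described above (this is not the benchmark from the Bugzilla report): each element is `TRUE` or `FALSE` independently of its neighbours, so a per-element branch is essentially a coin flip for the CPU's branch predictor.

```
# Logical vector with no correlation between neighbouring elements:
# knowing x[i] tells you nothing about x[i + 1], which is exactly the
# situation where a branchy loop tends to mispredict heavily
set.seed(2023)
x <- runif(1e7) < 0.5

# Timings will of course vary by machine and R version
system.time(idx <- which(x))
head(idx)
```
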
[Rd] Possible inconsistency between `as.complex(NA_real_)` and the docs
Hi all,

Surprisingly (at least to me), `as.complex(NA_real_)` results in
`complex(real = NA_real_, imaginary = 0)` rather than `NA_complex_`.

It seems to me that this goes against the docs of `as.complex()`, which say
this in the Details section:

"Up to R versions 3.2.x, all forms of NA and NaN were coerced to a complex
NA, i.e., the NA_complex_ constant, for which both the real and imaginary
parts are NA. Since R 3.3.0, typically only objects which are NA in parts
are coerced to complex NA, but others with NaN parts, are not. As a
consequence, complex arithmetic where only NaN's (but no NA's) are involved
typically will not give complex NA but complex numbers with real or
imaginary parts of NaN."

To me this suggests that `NA_real_`, which is "NA in parts", should have
been coerced to "complex NA". `NA_integer_` is actually coerced to
`NA_complex_`, which to me is further evidence that `NA_real_` should have
been as well.

Here is the original commit where this behavior was changed in R 3.3.0:
https://github.com/wch/r-source/commit/4a4c2052e5a541981a249d4fcf92b54ca7f0a2df

```
# This is expected, based on the docs
x <- as.complex(NaN)
Re(x)
#> [1] NaN
Im(x)
#> [1] 0

# This is not expected. The docs say:
# "Since R 3.3.0, typically only objects which are NA in parts are coerced
# to complex NA"
# but that doesn't seem true for `NA_real_`.
x <- as.complex(NA_real_)
Re(x)
#> [1] NA
Im(x)
#> [1] 0

# It does seem to be the case for `NA_integer_`
x <- as.complex(NA_integer_)
Re(x)
#> [1] NA
Im(x)
#> [1] NA
```

Thanks,
Davis Vaughan
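
For comparison, here is a hedged sketch (not base R, just an illustration of the rule the quoted Details paragraph appears to describe) of a coercion that maps a real `NA`, but not `NaN`, to the full complex `NA`:

```
# Hypothetical helper, for illustration only: coerce to complex, then map
# real NA (but not NaN, which is.na() also reports as NA) to NA_complex_
as_complex_na <- function(x) {
  out <- as.complex(x)
  out[is.na(x) & !is.nan(x)] <- NA_complex_
  out
}

as_complex_na(NA_real_) # NA     (both parts NA, matching as.complex(NA_integer_))
as_complex_na(NaN)      # NaN+0i (matches the documented behavior for NaN)
```
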
[Rd] range() for Date and POSIXct could respect `finite = TRUE`
Hi all,

I noticed that `range.default()` has a nice `finite = TRUE` argument, but
it doesn't actually apply to Date or POSIXct due to how `is.numeric()`
works.

```
x <- .Date(c(0, Inf, 1, 2, Inf))
x
#> [1] "1970-01-01" "Inf"        "1970-01-02" "1970-01-03" "Inf"

# Darn!
range(x, finite = TRUE)
#> [1] "1970-01-01" "Inf"

# What I want
.Date(range(unclass(x), finite = TRUE))
#> [1] "1970-01-01" "1970-01-03"
```

I think `finite = TRUE` would be pretty nice for Dates in particular.

As a motivating example, sometimes you have ranges of dates represented by
start/end pairs. It is fairly natural to represent an event that hasn't
ended yet with an infinite date. If you need to then compute a sequence of
dates spanning the full range of the start/end pairs, it would be nice to
be able to use `range(finite = TRUE)` to do so:

```
start <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-11", "2019-01-14"))
end <- as.Date(c("2019-01-07", NA, "2019-01-14", NA))
end[is.na(end)] <- Inf

# `end = Inf` means that the event hasn't "ended" yet
data.frame(start, end)
#>        start        end
#> 1 2019-01-05 2019-01-07
#> 2 2019-01-10        Inf
#> 3 2019-01-11 2019-01-14
#> 4 2019-01-14        Inf

# Create a full sequence along all days in start/end
range <- .Date(range(unclass(c(start, end)), finite = TRUE))
seq(range[1], range[2], by = 1)
#> [1] "2019-01-05" "2019-01-06" "2019-01-07" "2019-01-08" "2019-01-09"
#> [6] "2019-01-10" "2019-01-11" "2019-01-12" "2019-01-13" "2019-01-14"
```

It seems like one option is to create a `range.Date()` method that
unclasses, forwards the arguments on to a second call to `range()`, and
then reclasses?

```
range.Date <- function(x, ..., na.rm = FALSE, finite = FALSE) {
  .Date(range(unclass(x), na.rm = na.rm, finite = finite), oldClass(x))
}
```

This is similar to how `rep.Date()` works.

Thanks,
Davis Vaughan
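
A hedged usage sketch of the method proposed above (it is not in base R; this only shows what the motivating example would look like if the method were defined):

```
# Define the sketched method, then the first example respects `finite = TRUE`
range.Date <- function(x, ..., na.rm = FALSE, finite = FALSE) {
  .Date(range(unclass(x), na.rm = na.rm, finite = finite), oldClass(x))
}

x <- .Date(c(0, Inf, 1, 2, Inf))
range(x, finite = TRUE)
#> [1] "1970-01-01" "1970-01-03"
```

A specific `range.Date()` method takes precedence over the `Summary.Date()` group method during dispatch, which is why this works without touching the group generic.
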
Re: [Rd] range() for Date and POSIXct could respect `finite = TRUE`
Martin,

Yes, I missed that those have `Summary.*` methods, thanks!

Tweaking those to respect `finite = TRUE` sounds great. It seems like it
might be a little tricky, since the Summary methods call `NextMethod()` and
`range.default()` uses `is.numeric()` to determine whether or not to apply
`finite`. Because `is.numeric.Date()` is defined, `is.numeric()` always
returns `FALSE` for Dates (and POSIXt); see the small demo after the quoted
message below. Because of that, it may still be easier to just write a
specific `range.Date()` method, but I'm not sure.

-Davis

On Sat, Apr 29, 2023 at 4:47 PM Martin Maechler wrote:
>
> >>>>> Davis Vaughan via R-devel
> >>>>>     on Fri, 28 Apr 2023 11:12:27 -0400 writes:
>
> > Hi all,
>
> > I noticed that `range.default()` has a nice `finite = TRUE` argument,
> > but it doesn't actually apply to Date or POSIXct due to how
> > `is.numeric()` works.
>
> Well, I think it would / should never apply:
>
> range() belongs to the "Summary" group generics (as min, max, ...)
>
> and there *are* Summary.Date() and Summary.POSIX{c,l}t() methods.
>
> Without checking further for now, I think you are indirectly
> suggesting to enhance these three Summary.*() methods so they do
> obey 'finite = TRUE'.
>
> I think I agree they should.
>
> Martin
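
The small demo referenced above, for concreteness:

```
# is.numeric() dispatches to is.numeric.Date() / is.numeric.POSIXt(), which
# return FALSE, so range.default() never takes its `finite`-aware numeric
# branch for Date or POSIXct input
is.numeric(Sys.Date())
#> [1] FALSE
is.numeric(Sys.time())
#> [1] FALSE
```
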
Re: [Rd] range() for Date and POSIXct could respect `finite = TRUE`
It seems like the main problem is that `is.numeric(x)` isn't fully
indicative of whether or not `is.finite(x)` makes sense for `x` (i.e. Date
isn't numeric, but does allow infinite dates).

So I could also imagine a new `allows.infinite()` S3 generic that returns a
single TRUE/FALSE for whether or not the type allows infinite values. That
would also indicate whether or not `is.finite()` and `is.infinite()` make
sense on that type. I imagine it being used like:

```
allows.infinite <- function(x) {
  UseMethod("allows.infinite")
}

allows.infinite.default <- function(x) {
  is.numeric(x) # For backwards compatibility, maybe? Not sure.
}

allows.infinite.Date <- function(x) {
  TRUE
}

allows.infinite.POSIXct <- function(x) {
  TRUE
}

range.default <- function (..., na.rm = FALSE, finite = FALSE) {
  x <- c(..., recursive = TRUE)

  if (allows.infinite(x)) { # changed from `is.numeric()`
    if (finite) x <- x[is.finite(x)]
    else if (na.rm) x <- x[!is.na(x)]
    c(min(x), max(x))
  } else {
    if (finite) na.rm <- TRUE
    c(min(x, na.rm = na.rm), max(x, na.rm = na.rm))
  }
}
```

It could allow other R developers to also use the pattern of:

```
if (allows.infinite(x)) {
  # conditionally do stuff with is.infinite(x)
}
```

and that seems like it could be rather nice. It would avoid the need for
`range.Date()` and `range.POSIXct()` methods too (a small usage sketch
follows after the quoted reply below).

-Davis

On Thu, May 4, 2023 at 5:29 AM Martin Maechler wrote:
>
> >>>>> Davis Vaughan
> >>>>>     on Mon, 1 May 2023 08:46:33 -0400 writes:
>
> > Martin,
> > Yes, I missed that those have `Summary.*` methods, thanks!
>
> > Tweaking those to respect `finite = TRUE` sounds great. It seems like
> > it might be a little tricky since the Summary methods call
> > `NextMethod()`, and `range.default()` uses `is.numeric()` to determine
> > whether or not to apply `finite`. Because `is.numeric.Date()` is
> > defined, that always returns `FALSE` for Dates (and POSIXt). Because
> > of that, it may still be easier to just write a specific
> > `range.Date()` method, but I'm not sure.
>
> > -Davis
>
> I've looked more closely now, and indeed,
> range() is the only function in the Summary group
> where (only) the default method has a 'finite' argument,
> which strikes me as somewhat asymmetric / inconsequential, as
> after all, range(.) := c(min(.), max(.)),
> but min() and max() do not obey a finite=TRUE setting, note
>
>   > min(c(-Inf,3:5), finite=TRUE)
>   Error: attempt to use zero-length variable name
>
> where the error message also is not particularly friendly
> and of course has nothing to do with 'finite':
>
>   > max(1:4, foo="bar")
>   Error: attempt to use zero-length variable name
>
> ... but that is diverting; coming back to the topic: Given
> that 'finite' only applies to range() {and there it is just a convenience},
> I do agree that from my own work & support to make `Date` and
> `POSIX(c)t` behave more number-like, it would be "nice" to have
> range() obey a `finite=TRUE` also for these.
>
> OTOH, there are quite a few other 'number-like' thingies for
> which I would then like to have range(*, finite=TRUE) work,
> e.g., "mpfr" (package {Rmpfr}) or "bigz" {gmp} numbers, numeric
> sparse matrices, ...
>
> To keep such methods all internally consistent with
> range.default(), I could envision something like this
>
> .rangeNum <- function(..., na.rm = FALSE, finite = FALSE, isNumeric)
> {
>     x <- c(..., recursive = TRUE)
>     if(isNumeric(x)) {
>         if(finite) x <- x[is.finite(x)]
>         else if(na.rm) x <- x[!is.na(x)]
>         c(min(x), max(x))
>     } else {
>         if(finite) na.rm <- TRUE
>         c(min(x, na.rm=na.rm), max(x, na.rm=na.rm))
>     }
> }
>
> range.default <- function(..., na.rm = FALSE, finite = FALSE)
>     .rangeNum(..., na.rm=na.rm, finite=finite, isNumeric = is.numeric)
>
> range.POSIXct <- range.Date <- function(..., na.rm = FALSE, finite = FALSE)
>     .rangeNum(..., na.rm=na.rm, finite=finite, isNumeric = function(.) TRUE)
>
> which would also provide .rangeNum() to be used by implementors
> of other numeric-like classes to provide their own range()
> method as a 1-liner *and* be future-consistent with the default method..
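
The usage sketch referred to above: purely hypothetical, and it assumes the `allows.infinite()` generic together with the modified `range.default()` from this message were actually in place (none of this is in base R).

```
# With allows.infinite() and the tweaked range.default() sketched above,
# the original Date example should need no range.Date() method at all:
x <- .Date(c(0, Inf, 1, 2, Inf))
range(x, finite = TRUE)
# would give: "1970-01-01" "1970-01-03"
```
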
[Rd] Should `expand.grid()` consistently drop `NULL` inputs?
Hi all,

I noticed that `expand.grid()` has somewhat inconsistent behavior with
dropping `NULL` inputs. In particular, if there is a leading `NULL`, then
it ends up as a column in the resulting data frame, which seems pretty
undesirable. Also, notice in the last example that `Var3` is used as the
column name on the `NULL`, which is wrong.

I think the most consistent behavior would be to unconditionally drop
`NULL`s anywhere they appear (i.e. treat an `expand.grid()` call with
`NULL` inputs as semantically equivalent to the same call without `NULL`s).

```
dropattrs <- function(x) {
  attributes(x) <- list(names = names(x))
  x
}

# `NULL` dropped
dropattrs(expand.grid(NULL))
#> named list()

# `NULL` dropped
dropattrs(expand.grid(1, NULL))
#> $Var1
#> numeric(0)

# Oh no! Leading `NULL` ends up in the data frame!
dropattrs(expand.grid(NULL, 1))
#> $Var2
#> NULL
#>
#> [[2]]
#> numeric(0)

# Oh no! This one does too!
dropattrs(expand.grid(1, NULL, 2))
#> $Var1
#> numeric(0)
#>
#> $Var3
#> NULL
#>
#> [[3]]
#> numeric(0)
```

Thanks,
Davis
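
A hedged sketch of the proposed semantics as a user-level wrapper (the name `expand_grid2()` is purely illustrative; the actual change would presumably live inside `expand.grid()` itself):

```
# Drop NULL inputs up front, so that a leading NULL behaves like a trailing
# one, i.e. expand.grid(NULL, 1) becomes equivalent to expand.grid(1)
expand_grid2 <- function(...) {
  args <- list(...)
  args <- args[!vapply(args, is.null, logical(1))]
  do.call(expand.grid, args)
}

expand_grid2(NULL, 1)
#>   Var1
#> 1    1
```
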
[Rd] `as.data.frame.matrix()` can produce a data frame without a `names` attribute
Hi all,

I recently learned that it is possible for `as.data.frame.matrix()` to
produce a data frame with 0 columns that is also entirely missing a `names`
attribute, and I think this is a bug:

```
# No `names`, weird!
attributes(as.data.frame(matrix(nrow = 0, ncol = 0)))
#> $class
#> [1] "data.frame"
#>
#> $row.names
#> integer(0)

# This is what I expected
attributes(data.frame())
#> $names
#> character(0)
#>
#> $row.names
#> integer(0)
#>
#> $class
#> [1] "data.frame"
```

In my experience, 0 column data frames should probably still have a `names`
attribute, and it should be set to `character()`.

Some evidence to support my theory is that OOB subsetting doesn't give the
intended error with this weird data frame:

```
# Good OOB error
df <- data.frame()
df[1]
#> Error in `[.data.frame`(df, 1): undefined columns selected

# This is weird!
df <- as.data.frame(matrix(nrow = 0, ncol = 0))
df[1]
#> NULL
#> <0 rows> (or 0-length row.names)
```

The one exception to requiring a `names` attribute that I can think of is
`as.data.frame(optional = TRUE)`, mostly for internal use by `data.frame()`
on each of the columns, but that doesn't seem to apply here.

Thanks,
Davis Vaughan
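
Until that is resolved, a hedged user-level workaround sketch (not a proposed change to base R) is to restore the missing `names` attribute after the coercion:

```
df <- as.data.frame(matrix(nrow = 0, ncol = 0))

# Give the 0-column data frame the `names` attribute that data.frame()
# would have produced
if (is.null(names(df))) {
  names(df) <- character()
}

names(df)
#> character(0)

# `df[1]` should now hit the usual column-name check and give the
# "undefined columns selected" error again
```
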