[Rd] Small typo in Writing R Extensions

2023-02-08 Thread Davis Vaughan via R-devel
Hi all,

Writing R Extensions describes `R_NewEnv()` as:

```
At times it may also be useful to create a new environment frame in C
code. R_NewEnv is a C version of the R function new.env:

SEXP R_NewEnv(SEXP enclos, int hash, ins size)
```

There is a typo here where `ins size` should be `int size`.

Thanks!
Davis

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] On optimizing `R_NewEnv()`

2023-02-08 Thread Davis Vaughan via R-devel
Hi all,

I really like the addition of `R_NewEnv()` back in 4.1.0
https://github.com/wch/r-source/blob/625ab8d45f86f65561e53627e1f0c220bdc5f752/src/main/envir.c#L3619-L3630

I have a use case where I'm likely to call this function a large
number of times to generate many small hashed environments, so I'd
like to optimize it as far as possible.

I noticed that it takes `int size`, converts that to a SEXP for
`R_NewHashedEnv()`, which then simply converts that back to an `int`
here:
https://github.com/wch/r-source/blob/625ab8d45f86f65561e53627e1f0c220bdc5f752/src/main/envir.c#L378

I wonder if we could cut out that intermediate SEXP (along with its
protection) by adjusting `R_NewHashedEnv()` to instead take `int
size`.

I'd be happy to do a patch if that sounds good. I'd update all uses of
`R_NewHashedEnv()` to supply `int`s instead, which actually seems like
it would make every instance of calling that function simpler:
https://github.com/search?q=repo%3Awch%2Fr-source%20R_NewHashedEnv&type=code

So hopefully a win everywhere?

Thanks,
Davis



[Rd] An alternative algorithm for `which()`

2023-04-05 Thread Davis Vaughan via R-devel
Hi all,

I've sent in a bugzilla patch for an alternative C algorithm for
`which()` that uses less memory and is faster in many real-life
scenarios. I've documented it in full on the bugzilla page, with many
examples:
https://bugs.r-project.org/show_bug.cgi?id=18495

The short version is that the performance comes from making the loops
branchless, which seems to be particularly helpful for `which()`. With
`which(x)`, I'd argue that branches are often hard for the CPU's branch
predictor to guess, since in most real data there is no indication that
if the i-th element of `x` is `TRUE`, then the (i+1)-th element will
also be `TRUE`.
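
To make "branchless" concrete, here is a minimal R sketch of the idea. It is an illustration only, not the C code from the patch: `which_branchless()` is a hypothetical name, it uses a full-length scratch buffer (so it says nothing about the memory behaviour of the patch), and in interpreted R it is of course slow. The point is just that the candidate index is always written and the output position advances by 0 or 1, so no `if` on the element value is needed.

```r
# Illustration only: a hypothetical helper, not the C code from the patch.
which_branchless <- function(x) {
  stopifnot(is.logical(x))
  n <- length(x)
  out <- integer(n)  # scratch buffer, one slot per input element
  j <- 1L            # next output position
  for (i in seq_len(n)) {
    out[j] <- i                  # always write the candidate index
    j <- j + isTRUE(x[[i]])      # advance only when x[[i]] is TRUE (NA counts as FALSE)
  }
  out[seq_len(j - 1L)]           # keep only the positions that were "committed"
}

x <- c(TRUE, NA, FALSE, TRUE)
identical(which_branchless(x), which(x))
```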

I've received a few comments on the bugzilla page, but I'd love it if
anyone else could chime in and provide their own thoughts!

Thanks,
Davis Vaughan



[Rd] Possible inconsistency between `as.complex(NA_real_)` and the docs

2023-04-14 Thread Davis Vaughan via R-devel
Hi all,

Surprisingly (at least to me), `as.complex(NA_real_)` results in
`complex(real = NA_real_, imaginary = 0)` rather than `NA_complex_`.

It seems to me that this goes against the docs of `as.complex()`,
which say this in the Details section:

"Up to R versions 3.2.x, all forms of NA and NaN were coerced to a
complex NA, i.e., the NA_complex_ constant, for which both the real
and imaginary parts are NA. Since R 3.3.0, typically only objects
which are NA in parts are coerced to complex NA, but others with NaN
parts, are not. As a consequence, complex arithmetic where only NaN's
(but no NA's) are involved typically will not give complex NA but
complex numbers with real or imaginary parts of NaN."

To me this suggests that `NA_real_`, which is "NA in parts", should
have been coerced to "complex NA".

`NA_integer_` is actually coerced to `NA_complex_`, which to me is
further evidence that `NA_real_` should have been as well.

Here is the original commit where this behavior was changed in R 3.3.0:
https://github.com/wch/r-source/commit/4a4c2052e5a541981a249d4fcf92b54ca7f0a2df

```
# This is expected, based on the docs
x <- as.complex(NaN)
Re(x)
#> [1] NaN
Im(x)
#> [1] 0

# This is not expected. The docs say:
# "Since R 3.3.0, typically only objects which are NA in parts are
# coerced to complex NA"
# but that doesn't seem true for `NA_real_`.
x <- as.complex(NA_real_)
Re(x)
#> [1] NA
Im(x)
#> [1] 0

# It does seem to be the case for `NA_integer_`
x <- as.complex(NA_integer_)
Re(x)
#> [1] NA
Im(x)
#> [1] NA
```

Thanks,
Davis Vaughan



[Rd] range() for Date and POSIXct could respect `finite = TRUE`

2023-04-28 Thread Davis Vaughan via R-devel
Hi all,

I noticed that `range.default()` has a nice `finite = TRUE` argument,
but it doesn't actually apply to Date or POSIXct due to how
`is.numeric()` works.

```
x <- .Date(c(0, Inf, 1, 2, Inf))
x
#> [1] "1970-01-01" "Inf"        "1970-01-02" "1970-01-03" "Inf"

# Darn!
range(x, finite = TRUE)
#> [1] "1970-01-01" "Inf"

# What I want
.Date(range(unclass(x), finite = TRUE))
#> [1] "1970-01-01" "1970-01-03"
```

I think `finite = TRUE` would be pretty nice for Dates in particular.

As a motivating example, sometimes you have ranges of dates
represented by start/end pairs. It is fairly natural to represent an
event that hasn't ended yet with an infinite date. If you need to then
compute a sequence of dates spanning the full range of the start/end
pairs, it would be nice to be able to use `range(finite = TRUE)` to do
so:

```
start <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-11", "2019-01-14"))
end <- as.Date(c("2019-01-07", NA, "2019-01-14", NA))
end[is.na(end)] <- Inf

# `end = Inf` means that the event hasn't "ended" yet
data.frame(start, end)
#>        start        end
#> 1 2019-01-05 2019-01-07
#> 2 2019-01-10        Inf
#> 3 2019-01-11 2019-01-14
#> 4 2019-01-14        Inf

# Create a full sequence along all days in start/end
range <- .Date(range(unclass(c(start, end)), finite = TRUE))
seq(range[1], range[2], by = 1)
#>  [1] "2019-01-05" "2019-01-06" "2019-01-07" "2019-01-08" "2019-01-09"
#>  [6] "2019-01-10" "2019-01-11" "2019-01-12" "2019-01-13" "2019-01-14"
```

It seems like one option is to create a `range.Date()` method that
unclasses, forwards the arguments on to a second call to `range()`,
and then reclasses?

```
range.Date <- function(x, ..., na.rm = FALSE, finite = FALSE) {
  .Date(range(unclass(x), na.rm = na.rm, finite = finite), oldClass(x))
}
```

This is similar to how `rep.Date()` works.

Thanks,
Davis Vaughan



Re: [Rd] range() for Date and POSIXct could respect `finite = TRUE`

2023-05-01 Thread Davis Vaughan via R-devel
Martin,

Yes, I missed that those have `Summary.*` methods, thanks!

Tweaking those to respect `finite = TRUE` sounds great. It seems like
it might be a little tricky since the Summary methods call
`NextMethod()`, and `range.default()` uses `is.numeric()` to determine
whether or not to apply `finite`. Because `is.numeric.Date()` is
defined, that always returns `FALSE` for Dates (and POSIXt). Because
of that, it may still be easier to just write a specific
`range.Date()` method, but I'm not sure.
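
A small sketch of the mismatch described above, using only base R: `is.numeric()` reports `FALSE` for a Date, yet `is.finite()` still works element-wise on the underlying doubles, which is the information `range.default()` would need.

```r
x <- .Date(c(0, Inf))

# FALSE, because of the is.numeric.Date() method
is.numeric(x)

# Still tells the finite element apart from the infinite one
is.finite(x)
```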

-Davis

On Sat, Apr 29, 2023 at 4:47 PM Martin Maechler wrote:
>
> >>>>> Davis Vaughan via R-devel
> >>>>> on Fri, 28 Apr 2023 11:12:27 -0400 writes:
>
> > Hi all,
>
> > I noticed that `range.default()` has a nice `finite =
> > TRUE` argument, but it doesn't actually apply to Date or
> > POSIXct due to how `is.numeric()` works.
>
> Well, I think it would / should never apply:
>
> range() belongs to the "Summary" group generics (as min, max, ...)
>
> and there  *are*  Summary.Date()  and Summary.POSIX{c,l}t() methods.
>
> Without checking further for now, I think you are indirectly
> suggesting to enhance these three Summary.*() methods so they do
> obey  'finite = TRUE' .
>
> I think I agree they should.
>
> Martin


Re: [Rd] range() for Date and POSIXct could respect `finite = TRUE`

2023-05-09 Thread Davis Vaughan via R-devel
It seems like the main problem is that `is.numeric(x)` isn't fully
indicative of whether or not `is.finite(x)` makes sense for `x` (i.e.
Date isn't numeric but does allow infinite dates).

So I could also imagine a new `allows.infinite()` S3 generic that
would return a single TRUE/FALSE for whether or not the type allows
infinite values. This would also indicate whether or not
`is.finite()` and `is.infinite()` make sense on that type. I imagine
it being used like:

```
allows.infinite <- function(x) {
  UseMethod("allows.infinite")
}
allows.infinite.default <- function(x) {
  is.numeric(x) # For backwards compatibility, maybe? Not sure.
}
allows.infinite.Date <- function(x) {
  TRUE
}
allows.infinite.POSIXct <- function(x) {
  TRUE
}

range.default <- function (..., na.rm = FALSE, finite = FALSE) {
  x <- c(..., recursive = TRUE)
  if (allows.infinite(x)) { # changed from `is.numeric()`
    if (finite)
      x <- x[is.finite(x)]
    else if (na.rm)
      x <- x[!is.na(x)]
    c(min(x), max(x))
  }
  else {
    if (finite)
      na.rm <- TRUE
    c(min(x, na.rm = na.rm), max(x, na.rm = na.rm))
  }
}
```

It could allow other R developers to also use the pattern of:

```
if (allows.infinite(x)) {
  # conditionally do stuff with is.infinite(x)
}
```

and that seems like it could be rather nice.

It would avoid the need for `range.Date()` and `range.POSIXct()` methods too.
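
As a self-contained sketch of how this could behave for Dates without touching base R, the snippet below defines the hypothetical `allows.infinite()` generic in the global environment and uses it from a stand-in helper. `range_finite()` is an assumed name for illustration only; it is not part of R and is not the proposed change to `range.default()`.

```r
# Hypothetical generic, as proposed above; not part of base R.
allows.infinite <- function(x) UseMethod("allows.infinite")
allows.infinite.default <- function(x) is.numeric(x)
allows.infinite.Date <- function(x) TRUE

# Stand-in for `range(x, finite = TRUE)` that consults the generic.
range_finite <- function(x) {
  if (allows.infinite(x)) {
    x <- x[is.finite(x)]   # drops Inf, -Inf, NaN and NA
  } else {
    x <- x[!is.na(x)]      # types without infinities: only drop NA
  }
  c(min(x), max(x))
}

x <- .Date(c(0, Inf, 1, 2, Inf))
range_finite(x)
```

With this in place, the result spans only the finite dates in `x`, which is what `.Date(range(unclass(x), finite = TRUE))` gives today.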

-Davis

On Thu, May 4, 2023 at 5:29 AM Martin Maechler wrote:
>
> >>>>> Davis Vaughan
> >>>>> on Mon, 1 May 2023 08:46:33 -0400 writes:
>
> > Martin,
> > Yes, I missed that those have `Summary.*` methods, thanks!
>
> > Tweaking those to respect `finite = TRUE` sounds great. It seems like
> > it might be a little tricky since the Summary methods call
> > `NextMethod()`, and `range.default()` uses `is.numeric()` to determine
> > whether or not to apply `finite`. Because `is.numeric.Date()` is
> > defined, that always returns `FALSE` for Dates (and POSIXt). Because
> > of that, it may still be easier to just write a specific
> > `range.Date()` method, but I'm not sure.
>
> > -Davis
>
> I've looked more closely now, and indeed,
> range() is the only function in the  Summary  group
> where (only) the default method has a 'finite' argument,
> which strikes me as somewhat asymmetric / inconsistent, as
> after all,  range(.) := c(min(.), max(.)) ,
> but  min() and max() do not obey a finite=TRUE setting, note
>
> > min(c(-Inf,3:5), finite=TRUE)
> Error: attempt to use zero-length variable name
>
> where the error message also is not particularly friendly
> and of course has nothing to do with 'finite' :
>
> > max(1:4, foo="bar")
> Error: attempt to use zero-length variable name
> >
>
> ... but that is diverting;  coming back to the topic:  Given
> that 'finite' only applies to range() {and there is just a convenience},
> I do agree that from my own work & support to make `Date` and
> `POSIX(c)t` behave more number-like, it would be "nice" to have
> range() obey a `finite=TRUE` also for these.
>
> OTOH, there are quite a few other 'number-like' thingies for
> which I would then like to have  range(*, finite=TRUE) work,
> e.g.,  "mpfr" (package {Rmpfr}) or "bigz" {gmp} numbers, numeric
> sparse matrices, ...
>
> To keep such methods all internally consistent with
> range.default(), I could envision something like this
>
>
> .rangeNum <- function(..., na.rm = FALSE, finite = FALSE, isNumeric)
> {
>     x <- c(..., recursive = TRUE)
>     if(isNumeric(x)) {
>         if(finite) x <- x[is.finite(x)]
>         else if(na.rm) x <- x[!is.na(x)]
>         c(min(x), max(x))
>     } else {
>         if(finite) na.rm <- TRUE
>         c(min(x, na.rm=na.rm), max(x, na.rm=na.rm))
>     }
> }
>
> range.default <- function(..., na.rm = FALSE, finite = FALSE)
>     .rangeNum(..., na.rm=na.rm, finite=finite, isNumeric = is.numeric)
>
> range.POSIXct <- range.Date <- function(..., na.rm = FALSE, finite = FALSE)
>     .rangeNum(..., na.rm=na.rm, finite=finite, isNumeric = function(.)TRUE)
>
>
>
> which would also provide .rangeNum() to be used by implementors
> of other numeric-like classes to provide their own range()
> method as a 1-liner *and* be future-consistent with the default method.

[Rd] Should `expand.grid()` consistently drop `NULL` inputs?

2023-10-02 Thread Davis Vaughan via R-devel
Hi all,

I noticed that `expand.grid()` has somewhat inconsistent behavior with
dropping `NULL` inputs. In particular, if there is a leading `NULL`,
then it ends up as a column in the resulting data frame, which seems
pretty undesirable. Also, notice in the last example that `Var3` is
used as the column name on the `NULL`, which is wrong.

I think the most consistent behavior would be to unconditionally drop
`NULL`s anywhere they appear (i.e. treat an `expand.grid()` call with
`NULL` inputs as semantically equivalent to the same call without
`NULL`s).

```
dropattrs <- function(x) {
  attributes(x) <- list(names = names(x))
  x
}

# `NULL` dropped
dropattrs(expand.grid(NULL))
#> named list()

# `NULL` dropped
dropattrs(expand.grid(1, NULL))
#> $Var1
#> numeric(0)

# Oh no! Leading `NULL` ends up in the data frame!
dropattrs(expand.grid(NULL, 1))
#> $Var2
#> NULL
#>
#> [[2]]
#> numeric(0)

# Oh no! This one does too!
dropattrs(expand.grid(1, NULL, 2))
#> $Var1
#> numeric(0)
#>
#> $Var3
#> NULL
#>
#> [[3]]
#> numeric(0)
```
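
The semantics I have in mind can be sketched as a thin wrapper that removes `NULL` arguments before calling `expand.grid()`. `expand_grid_drop_null()` is a hypothetical name used only for illustration; it is not a proposed API, and the real fix would presumably live inside `expand.grid()` itself.

```r
# Hypothetical wrapper: treat a call with `NULL` inputs as equivalent to the
# same call without them.
expand_grid_drop_null <- function(...) {
  args <- list(...)
  keep <- !vapply(args, is.null, logical(1))
  do.call(expand.grid, args[keep])
}

# Leading and interior `NULL`s no longer show up as columns
names(expand_grid_drop_null(NULL, 1))
names(expand_grid_drop_null(1, NULL, 2))
```

Because the `NULL`s are removed before the call, the default `Var*` names are numbered over the remaining inputs only.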

Thanks,
Davis



[Rd] `as.data.frame.matrix()` can produce a data frame without a `names` attribute

2024-03-21 Thread Davis Vaughan via R-devel
Hi all,

I recently learned that it is possible for `as.data.frame.matrix()` to
produce a data frame with 0 columns that is also entirely missing a
`names` attribute, and I think this is a bug:

```
# No `names`, weird!
attributes(as.data.frame(matrix(nrow = 0, ncol = 0)))
#> $class
#> [1] "data.frame"
#>
#> $row.names
#> integer(0)

# This is what I expected
attributes(data.frame())
#> $names
#> character(0)
#>
#> $row.names
#> integer(0)
#>
#> $class
#> [1] "data.frame"
```

In my experience, 0 column data frames should probably still have a
`names` attribute, and it should be set to `character()`. Some
evidence to support my theory is that OOB subsetting doesn't give the
intended error with this weird data frame:

```
# Good OOB error
df <- data.frame()
df[1]
#> Error in `[.data.frame`(df, 1): undefined columns selected

# This is weird!
df <- as.data.frame(matrix(nrow = 0, ncol = 0))
df[1]
#> NULL
#> <0 rows> (or 0-length row.names)
```

The one exception to requiring a `names` attribute that I can think of
is `as.data.frame(optional = TRUE)`, mostly for internal use by
`data.frame()` on each of the columns, but that doesn't seem to apply
here.
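
Until this is settled, a defensive workaround on the caller's side could look like the sketch below. `ensure_names()` is a hypothetical helper, not something in base R; it only adds an empty `names` attribute when one is missing and otherwise leaves the data frame alone, so it should be harmless on R versions where the attribute is already present.

```r
# Hypothetical helper: guarantee that a data frame carries a `names` attribute.
ensure_names <- function(df) {
  if (is.null(attr(df, "names", exact = TRUE))) {
    attr(df, "names") <- character()
  }
  df
}

df <- ensure_names(as.data.frame(matrix(nrow = 0, ncol = 0)))
names(df)
```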

Thanks,
Davis Vaughan
