from:"hadley wickham"

[Rd] R mailing list archive: alternative interface

2005-06-26 Thread hadley wickham

For a programming competition (http://railsday.com) I recently
entered, I created a web-application to nicely display mail archives. 
I've loaded up a couple of months' worth of r-devel mail and made it
available here: http://listomagic.had.co.nz/

The big advantages over the current mailman archive are:

 * messages are threaded, and messages in a thread are displayed
together, and some effort is made to reduce redundant quoting

 * built in search

 * rss feeds for author, thread and search

I'd be very interested to know if you think this is useful and if it'd be
worthwhile to keep in sync with the list.  Any comments regarding
appearance, functionality etc. would also be gratefully received.

Thanks,

Hadley

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Package tests: best practices

2005-08-22 Thread hadley wickham

I'm writing some tests for a package and I have a few questions
regarding best practices.  I've read the "tests subdirectory" paragraph in
Writing R Extensions, but I'm left wanting more.

Firstly, can I assume that the working directory will always be set to the
test directory? (That's what a couple of quick tests seemed to show.)

Obviously, I want to test the code in my package - how do I load it? 
I assume I can't use library(XXX) because that will load the currently
installed version - should I source in all ../R/*.r instead?

Thanks for your advice,

Hadley

[Rd] Functions with the same name: best practices

2005-08-22 Thread hadley wickham

Ok, here's another best practices question - let's say I'm writing a
package and I want to use a function name that is already claimed by a
function in the base R packages.  For the sake of argument, let's
pretend this function is for profiling the performance of a function
(like Rprof for example), and so an obvious name that comes to mind is
profile.  This, of course, clashes with the built in profile for
"investigating behavior of objective function near the solution
represented by fitted."

A little thinking and a quick survey of other packages reveal some
possible solutions:

 * capitalise the function differently (eg. Profile)
 * use a prefix/suffix (eg. Rprof)
 * use a thesaurus
 * use namespaces (and rely on others to use namespaces correctly in
their code/packages)

What would you suggest?

Thanks again,

Hadley

Re: [Rd] Functions with the same name: best practices

2005-08-26 Thread hadley wickham

Thanks to all of you for your advice. I will read up on namespaces and
start using them to "protect" my internal functions from name clashes
with other packages, and endeavour to give my public functions unique
names.

I know other languages (eg. python) separate loading a package and
including it in the default namespace, do you think R will ever move
to such a system?
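
For comparison, current R can already load a package's namespace without attaching it to the search path, and `::` then reaches individual functions. A minimal sketch, using the `tools` package purely as an illustration (it ships with R):

```r
# Load the namespace only; requireNamespace() does not attach the package.
requireNamespace("tools", quietly = TRUE)

# Qualified access works without attaching.
tools::file_ext("report.txt")

# A local function named profile() masks the stats version by name,
# but the original stays reachable through its namespace.
profile <- function(...) "my profiler"
profile()
is.function(stats::profile)
```

This is only part of what Python's import system offers, since `library()` still attaches everything a package exports.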

Thanks,

Hadley

Re: [Rd] by() processing on a dataframe

2005-09-30 Thread hadley wickham

I'm not entirely sure what you want, but maybe this does the trick?

data.frame.by <- function(data, variables, fun, ...) {
  if (length(variables) == 0 ) {
    df <- data.frame(results = 0)
    df$results <- list(fun(data$value, ...))
    return(df)
  }

  sorted <- sort.df(data, variables)[,c(variables), drop=FALSE]
  duplicates <- duplicated(sorted[,variables, drop=FALSE])
  index <- cumsum(!duplicates)

  results <- by(data, index, fun, ...)

  cols <- sorted[!duplicates,variables, drop=FALSE]
  cols$results <- array(results)
  cols
}


sort.df <- function(data, vars) {
  data[do.call("order", data[,vars, drop=FALSE]), ,drop=FALSE]
}


dataset <- data.frame(gp1 = rep(1:2, c(4,4)), gp2 = rep(1:4,
c(2,2,2,2)), value = rnorm(8))

data.frame.by(dataset, c("gp1", "gp2"), function(data) mean(data$value))
data.frame.by(dataset, "gp1", function(data) tapply(data$value, data$gp2, mean))
# doesn't print, but everything is there ok
data.frame.by(dataset, "gp1", function(data) lm(gp2 ~ value, data))

(note that the results column will be a list if necessary - this may
be a serious abuse of data frames, but I'm not sure and no one replied
when I queried the list)

Hadley

Re: [Rd] Problems with example(Grid) in grid package

2005-10-21 Thread hadley wickham

I've also noticed the behaviour of grid.rect() has changed in 2.2.0. 
Before the fill defaulted to transparent, but now it defaults to
white.

Hadley

On 10/21/05, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> The following:
>
> library(grid)
> grid.newpage()
> example(Grid)
>
> has the yaxis label partly cut off and the x axis label does not appear at 
> all.
> Also ?grid.multipanel in that example brings up documentation for
> grid-internal but this would not seem to be internal if its part of an 
> example.
>
> I am using:
>
> > R.version.string  # Windows XP
> [1] "R version 2.2.0, 2005-09-20"
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Re: [Rd] Strange behaviour of type conversion (PR#8256)

2005-10-27 Thread hadley wickham

> Where is my error??

You are assuming that numbers are represented exactly:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

> 0.3/0.1 == 3
[1] FALSE
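
A tolerance-based comparison avoids the problem. A short sketch using base R's `all.equal()`:

```r
x <- 0.3 / 0.1

# Exact comparison fails: 0.3 and 0.1 have no exact binary representation.
x == 3

# all.equal() compares within a small numerical tolerance; wrap it in
# isTRUE() because it returns a character description on mismatch.
isTRUE(all.equal(x, 3))

# The actual difference is tiny.
abs(x - 3) < 1e-12
```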

Hadley

Re: [Rd] Brainstorm: Alpha and Beta testing of R versions

2005-11-07 Thread hadley wickham

> actually, it might not be a bad idea to require a unique
> bug-entry interface -- actually we have been thinking of moving
> to bugzilla -- if only Peter Dalgaard could find a smart enough
> person (even to be paid) who'd port all the old bug reports into the
> new format..

If you haven't already, it might be worthwhile looking into FogBugz and Trac.

Fogbugz (http://www.fogcreek.com/FogBugz/) is a commercial bug
tracking software package.  It is very professional and has been
designed to make tracking and submitting bugs as easy as possible
(sometimes you do get what you pay for).

Trac (http://www.edgewall.com/trac/) is more of an integrated software
development environment (open source) offering svn repository
browsing, a wiki and bug tracking.  It makes it very easy to link
between bug reports and the commits that fix them (although I know
fogbugz does this too, and I'm sure bugzilla does as well).

Hadley

Re: [Rd] R-bugs e-mail {was ... (Debian Bug 344248): ...}

2005-12-21 Thread hadley wickham

> PLEASE, PLEASE:
> do use
> [EMAIL PROTECTED]
> and nothing else

Perhaps
http://bugs.r-project.org/cgi-bin/R
should be updated then?

Hadley

Re: [Rd] cairo anyone?

2005-12-23 Thread hadley wickham

> | Michael Lawrence has as part of RGtk2.
>
> Speaking of which -- I tried to find his code anywhere on the "Internets"
> following his very nice DSC presentation, but no beans.  Why is this in
> hiding?  Is it expected to surface at some point?  Any insights, Duncan?

Michael is currently working on autogenerated documentation, and I
think will be ready to release early next year.

Hadley

[Rd] Retrieving an unevaluated argument

2006-02-01 Thread hadley wickham

I'm trying to retrieve an unevaluated argument (a list in particular).
I can do this easily when I call the function directly:

a1 <- function(x) match.call()$x

> a1(list(y=x^2))
list(y = x^2)

But when the function is called by another function, it gets trickier

b <- function(x, f) f(x)

> b(list(x^2), a1)
x

The best I've been able to do is:

a2 <- function(x) parse(text=deparse(substitute(x, parent.frame())))[[1]]

> b(list(x^2), a2)
list(x^2)

But I'm sure there must be a better way!

Hadley

[Rd] Converting an unevaluated list to a list of unevaluated elements

2006-02-01 Thread hadley wickham

Thanks to Andy Liaw, I have realised my problem isn't getting an
unevaluated argument, my problem really is converting an unevaluated
list to a list of unevaluated elements.  That is, how can I go from

substitute(list(a=x, b=c))

to

list(a=substitute(x), b=substitute(c))
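
One possible approach, offered as a sketch rather than a definitive answer: a call is list-like, so `as.list()` splits it into the function name followed by its arguments, each still unevaluated. Dropping the first element leaves the named pieces. The variable names `expr` and `parts` are only illustrative.

```r
# quote() stands in for substitute() so the example runs at top level.
expr <- quote(list(a = x, b = c))

# The first element is the symbol `list`; the rest are the
# unevaluated arguments, keeping their names.
parts <- as.list(expr)[-1]

names(parts)
is.name(parts$a)   # the symbol x, not its value
```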


(I am also interested in a general means of getting the "correct"
unevaluated argument. ie, what should a be to always return list(x=1)
for these functions:
b <- function(x) a(x)
c <- function(x) b(x)
d <- function(x) c(x)

a(list(x=1))
b(list(x=1))
c(list(x=1))
d(list(x=1))
)

Thanks, as always, for your help

Hadley

Re: [Rd] Best practices in developing package: From a single file

2018-01-30 Thread Hadley Wickham

On Tue, Jan 30, 2018 at 4:55 PM, Duncan Murdoch
 wrote:
> On 30/01/2018 4:30 PM, Kenny Bell wrote:
>>
>> In response to Duncan regarding the use of roxygen2 from the point of view
>> of a current user, I believe the issue he brings up is one of correlation
>> rather than causation.
>
>
> Could be.  However, I think editing comments in a .R file is a bit harder
> than editing text in a .Rd file, so I think the format discourages editing.
> I think it does make it easier to pass R CMD check the first time, but I
> don't think you should be satisfied with that.

One counter-point: I find it much easier to remember to update the
documentation when you update the code, if the code and the
documentation are very close together. I think mingling code and
documentation in the same file does add a subtle pressure to write
shorter docs, but I'm not entirely sure that's a bad thing - for long
form writing, vignettes are a much better solution anyway (since you
often want to mingle code and explanation).

Personally, I don't find writing in comments any harder than writing
in .Rd files, especially now that you can write in markdown and have
it automatically translated to Rd formatting commands.  And on the
negative side of Rd, I find it frustrating to have to copy and paste
the function definition to the usage section every time I modify an
argument. It just feels like unnecessary busywork that the computer
should be able to do for me (although I do understand why it is not
possible).
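
As an illustration of keeping documentation next to the code, here is a sketch of a roxygen2 comment block written with markdown. The function `add_one()` is hypothetical, and markdown support is assumed to be switched on with `Roxygen: list(markdown = TRUE)` in the package's DESCRIPTION file.

```r
#' Add one to a number
#'
#' Adds one to each element of `x`. With markdown enabled, **bold** text
#' and `code` spans are translated to the matching Rd commands.
#'
#' @param x A numeric vector.
#' @return A numeric vector the same length as `x`.
#' @examples
#' add_one(1:3)
#' @export
add_one <- function(x) {
  x + 1
}
```

The usage section of the generated Rd file is derived from the function definition, so it does not need to be copied by hand.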

>> Writing my first piece of R documentation was made much easier by using
>> roxygen2, and it shallowed the learning curve substantially.
>
> I'm not completely up to date on Roxygen2 these days:  can you do some pages
> in Rd, others in Roxygen?  That's not quite as good as being able to switch
> back and forth, but it would allow you to start in Roxygen, then gradually
> move to Rd when editing there was easier.

Yes, that's possible, and to protect you in a mixed environment,
roxygen2 will never overwrite a file that it did not itself create.

Hadley

-- 
http://hadley.nz

Re: [Rd] Best practices in developing package: From a single file

2018-01-30 Thread Hadley Wickham

>> There is package.skeleton() in base R as you already mentioned. It drove
>> me
>> bonkers that it creates packages which then fail R CMD check, so I wrote a
>> wrapper package (pkgKitten) with another helper function (kitten()) which
>> calls the base R helper and then cleans up it---but otherwise remains
>> faithful to it.
>
>
> Failing R CMD check isn't a big deal:  you want to be reminded to edit those
> incomplete help files.  But I think I recall that you couldn't even build
> the package that package.skeleton() created, and that indeed would be
> irritating, especially if you had a lot of functions so you had a lot of
> cleanup to do.  I don't know if that's still true because I generally use
> RStudio to create the initial package structure rather than calling
> package.skeleton myself.

Personally, I think the biggest problem with package.skeleton() is
that it assumes that the source of truth is objects in an environment.
This seems the wrong way around to me.

Hadley

-- 
http://hadley.nz

Re: [Rd] Best practices in developing package: From a single file

2018-02-01 Thread Hadley Wickham

On Thu, Feb 1, 2018 at 4:29 AM, Duncan Murdoch  wrote:
> On 31/01/2018 6:59 AM, Duncan Murdoch wrote:
>>
>> On 30/01/2018 11:39 PM, Hadley Wickham wrote:
>
>  [ lots deleted ]
>>>
>>> Personally, I don't find writing in comments any harder than writing
>>> in .Rd files, especially now that you can write in markdown and have
>>> it automatically translated to Rd formatting commands.
>>
>>
>> I didn't know about the possibility of Markdown.  That's a good thing.
>> You didn't say what editor you use, but RStudio is a good guess, and it
>> also makes it easier to write in comments.
>
>
> I've taken a look at the Markdown support, and I think that is fantastic.
> I'd rather it wasn't inline in the .R file (does it have to be?), but I'd
> say it tips the balance, and I'll certainly experiment with using that for
> new projects.

Please do let me know how it goes - often a fresh set of eyes reveals
problems that an experienced user is blind to.

> The only negative I see besides forcing inline docs is pretty minor:  I can
> see that supporting Rd markup within the Markdown text will on rare
> occasions cause lots of confusion (because users won't know why their
> backslashes are doing funny things).  I'd suggest that (at least optionally)
> you should escape anything that looks like Rd markup, so a user can put text
> like \item into the middle of a paragraph and not have the Rd parser see it.

Yes, that would certainly be nice. It's a little challenging because
we're using the commonmark parser, but it should be possible somehow.

Hadley

-- 
http://hadley.nz

[Rd] Base R examples that write to current working directory

2018-03-29 Thread Hadley Wickham

Hi all,

Given the recent CRAN push to prevent examples writing to the working
directory, is there any interest in fixing base R examples that write
to the working directory? A few candidates are the graphics devices,
file.create(), writeBin(), writeChar(), write(), and saveRDS(). I'm
sure there are many more.

One way to catch these naughty examples would be to search for
unlink() in examples: e.g.,
https://github.com/wch/r-source/search?utf8=✓&q=unlink+extension%3ARd&type=.
Of course, simply cleaning up after yourself is not sufficient because
if those files existed before the examples were run, the examples will
destroy them.
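
A sketch of the pattern such examples could follow instead, using `saveRDS()` as the illustration: write to a path from `tempfile()`, so nothing in the working directory is created or overwritten.

```r
# Write to a session-specific temporary file instead of the working directory.
path <- tempfile(fileext = ".rds")
saveRDS(mtcars, path)

# Read it back to show the round trip works.
restored <- readRDS(path)
identical(restored, mtcars)

# Clean up; only a file this example created is removed.
unlink(path)
```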

Hadley

-- 
http://hadley.nz

[Rd] predict.glm returns different results for the same model

2018-04-27 Thread Hadley Wickham

Hi all,

Very surprising (to me!) and mystifying result from predict.glm(): the
predictions vary depending on whether or not I use ns() or
splines::ns(). Reprex follows:

library(splines)

set.seed(12345)
dat <- data.frame(claim = rbinom(1000, 1, 0.5))
mns <- c(3.4, 3.6)
sds <- c(0.24, 0.35)
dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd =
sds[dat$claim + 1]))
dat <- dat[order(dat$wind), ]

m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial)
m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family = binomial)

# The model coefficients are the same
unname(coef(m1))
#> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
unname(coef(m2))
#> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128

# But the predictions are not!
newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length = 5))
unname(predict(m1, newdata = newdf))
#> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
unname(predict(m2, newdata = newdf))
#> [1]  0.5194712 -0.5666554 -0.1731268  2.8134844  3.9295814

Is this a bug?

(Motivating issue from this ggplot2 bug report:
https://github.com/tidyverse/ggplot2/issues/2426)

Thanks!

Hadley



-- 
http://hadley.nz

Re: [Rd] predict.glm returns different results for the same model

2018-04-27 Thread Hadley Wickham

On Fri, Apr 27, 2018 at 7:28 AM, Duncan Murdoch
 wrote:
> On 27/04/2018 9:25 AM, Hadley Wickham wrote:
>>
>> Hi all,
>>
>> Very surprising (to me!) and mystifying result from predict.glm(): the
>> predictions vary depending on whether or not I use ns() or
>> splines::ns(). Reprex follows:
>>
>> library(splines)
>>
>> set.seed(12345)
>> dat <- data.frame(claim = rbinom(1000, 1, 0.5))
>> mns <- c(3.4, 3.6)
>> sds <- c(0.24, 0.35)
>> dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd =
>> sds[dat$claim + 1]))
>> dat <- dat[order(dat$wind), ]
>>
>> m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial)
>> m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family =
>> binomial)
>>
>> # The model coefficients are the same
>> unname(coef(m1))
>> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
>> unname(coef(m2))
>> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
>>
>> # But the predictions are not!
>> newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length = 5))
>> unname(predict(m1, newdata = newdf))
>> #> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
>> unname(predict(m2, newdata = newdf))
>> #> [1]  0.5194712 -0.5666554 -0.1731268  2.8134844  3.9295814
>>
>> Is this a bug?
>
>
> The two objects m1 and m2 differ more than they should, so this looks like a
> problem in glm, not just in predict.glm.
>
>> attr(m1$terms, "predvars")
> list(claim, ns(wind, knots = c(25.4756277492997, 30.2270250736796,
> 35.4093171222502, 43.038645381669), Boundary.knots = c(12.9423820390783,
> 108.071583734075), intercept = FALSE))
>
>> attr(m2$terms, "predvars")
> list(claim, splines::ns(wind, df = 5))
>
> This appears to be due to a bug in the splines package.  There, the function
> splines:::makepredictcall.ns looks like this:
>
> makepredictcall.ns <- function(var, call)
> {
> if(as.character(call)[1L] != "ns") return(call)
> at <- attributes(var)[c("knots", "Boundary.knots", "intercept")]
> xxx <- call[1L:2L]
> xxx[names(at)] <- at
> xxx
> }
>
> The test fails for m2, because as.character(call)[1L] is "splines::ns"
> instead of "ns". I'll see if I can work out a better test and submit a
> patch.

Great, thanks!
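
The failing test can be reproduced without fitting a model. The sketch below also shows one way a name check could tolerate a namespace prefix; `call_fn_name()` is a hypothetical helper for illustration, not the patch that was submitted.

```r
# as.character() on a call deparses each element, prefix included.
as.character(quote(ns(wind, df = 5)))[1L]
as.character(quote(splines::ns(wind, df = 5)))[1L]

# Hypothetical helper: return the bare function name of a call,
# ignoring a leading pkg:: or pkg::: qualifier.
call_fn_name <- function(call) {
  fn <- call[[1L]]
  if (is.call(fn) && is.name(fn[[1L]]) &&
      as.character(fn[[1L]]) %in% c("::", ":::")) {
    fn <- fn[[3L]]
  }
  as.character(fn)
}

call_fn_name(quote(ns(wind, df = 5)))
call_fn_name(quote(splines::ns(wind, df = 5)))
```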


-- 
http://hadley.nz

Re: [Rd] length of `...`

2018-05-03 Thread Hadley Wickham

On Thu, May 3, 2018 at 8:00 AM, Gabe Becker  wrote:
> As of 3.5.0 the ...length() function does exactly what you are asking for.
> Before that, I don't know of an easy way to get the length without
> evaluation via R code. There may be one I'm not thinking of though, I
> haven't needed to do this myself.

dotlength <- function(...) length(nargs())

?

Hadley

-- 
http://hadley.nz

Re: [Rd] length of `...`

2018-05-03 Thread Hadley Wickham

On Thu, May 3, 2018 at 8:18 AM, Duncan Murdoch  wrote:
> On 03/05/2018 11:01 AM, William Dunlap via R-devel wrote:
>>
>> In R-3.5.0 you can use ...length():
>>> f <- function(..., n) ...length()
>>> f(stop("one"), stop("two"), stop("three"), n=7)
>>[1] 3
>>
>> Prior to that substitute() is the way to go
>>> g <- function(..., n) length(substitute(...()))
>>> g(stop("one"), stop("two"), stop("three"), n=7)
>>[1] 3
>>
>> R-3.5.0 also has the ...elt(n) function, which returns
>> the evaluated n'th entry in ... , without evaluating the
>> other ... entries.
>>> fn <- function(..., n) ...elt(n)
>>> fn(stop("one"), 3*5, stop("three"), n=2)
>>[1] 15
>>
>> Prior to 3.5.0, eval the appropriate component of the output
>> of substitute() in the appropriate environment:
>>> gn <- function(..., n) {
>>+   nthExpr <- substitute(...())[[n]]
>>+   eval(nthExpr, envir=parent.frame())
>>+ }
>>> gn(stop("one"), environment(), stop("two"), n=2)
>>
>>
>
> Bill, the last of these doesn't quite work, because ... can be passed down
> through a string of callers.  You don't necessarily want to evaluate it in
> the parent.frame().  For example:
>
> x <- "global"
> f <- function(...) {
>   x <- "f"
>   g(...)
> }
> g <- function(...) {
>   firstExpr <- substitute(...())[[1]]
>   c(list(...)[[1]], eval(firstExpr, envir = parent.frame()))
> }
>
> Calling g(x) correctly prints "global" twice, but calling f(x) incorrectly
> prints
>
> [1] "global" "f"
>
> You can get the first element of ... without evaluating the rest using ..1,
> but I don't know a way to do this for general n in pre-3.5.0 base R.

If you don't mind using a package:

# works with R 3.1 and up
library(rlang)

x <- "global"
f <- function(...) {
  x <- "f"
  g(...)
}
g <- function(...) {
  dots <- enquos(...)
  eval_tidy(dots[[1]])
}

f(x, stop("!"))
#> [1] "global"
g(x, stop("!"))
#> [1] "global"

Hadley

-- 
http://hadley.nz

Re: [Rd] length of `...`

2018-05-03 Thread Hadley Wickham

On Thu, May 3, 2018 at 8:28 AM, Hadley Wickham  wrote:
> On Thu, May 3, 2018 at 8:00 AM, Gabe Becker  wrote:
>> As of 3.5.0 the ...length() function does exactly what you are asking for.
>> Before that, I don't know of an easy way to get the length without
>> evaluation via R code. There may be one I'm not thinking of though, I
>> haven't needed to do this myself.
>
> dotlength <- function(...) length(nargs())
>
> ?

Oops, I got a bit overzealous there: I mean

dotlength <- function(...) nargs()

(This is subtly different from calling nargs() directly as it will
only count the elements in ...)
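
A short sketch of both points: the arguments are counted without being evaluated, and wrapping `nargs()` this way counts only what arrives through the dots. The function `f()` is just an illustration.

```r
dotlength <- function(...) nargs()

# The arguments are never evaluated, so stop() is not triggered.
dotlength(stop("one"), stop("two"), stop("three"))

# Inside another function, nargs() counts named arguments as well,
# whereas dotlength(...) sees only what was passed through the dots.
f <- function(x, ...) c(all = nargs(), dots = dotlength(...))
f(1, 2, 3)
```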

Hadley

-- 
http://hadley.nz

Re: [Rd] length of `...`

2018-05-03 Thread Hadley Wickham

On Thu, May 3, 2018 at 9:50 AM, Duncan Murdoch  wrote:
> On 03/05/2018 11:18 AM, Duncan Murdoch wrote:
>>
>> On 03/05/2018 11:01 AM, William Dunlap via R-devel wrote:
>>>
>>> In R-3.5.0 you can use ...length():
>>> > f <- function(..., n) ...length()
>>> > f(stop("one"), stop("two"), stop("three"), n=7)
>>> [1] 3
>>>
>>> Prior to that substitute() is the way to go
>>> > g <- function(..., n) length(substitute(...()))
>>> > g(stop("one"), stop("two"), stop("three"), n=7)
>>> [1] 3
>>>
>>> R-3.5.0 also has the ...elt(n) function, which returns
>>> the evaluated n'th entry in ... , without evaluating the
>>> other ... entries.
>>> > fn <- function(..., n) ...elt(n)
>>> > fn(stop("one"), 3*5, stop("three"), n=2)
>>> [1] 15
>>>
>>> Prior to 3.5.0, eval the appropriate component of the output
>>> of substitute() in the appropriate environment:
>>> > gn <- function(..., n) {
>>> +   nthExpr <- substitute(...())[[n]]
>>> +   eval(nthExpr, envir=parent.frame())
>>> + }
>>> > gn(stop("one"), environment(), stop("two"), n=2)
>>> 
>>>
>>
>> Bill, the last of these doesn't quite work, because ... can be passed
>> down through a string of callers.  You don't necessarily want to
>> evaluate it in the parent.frame().  For example:
>>
>> x <- "global"
>> f <- function(...) {
>> x <- "f"
>> g(...)
>> }
>> g <- function(...) {
>> firstExpr <- substitute(...())[[1]]
>> c(list(...)[[1]], eval(firstExpr, envir = parent.frame()))
>> }
>>
>> Calling g(x) correctly prints "global" twice, but calling f(x)
>> incorrectly prints
>>
>> [1] "global" "f"
>>
>> You can get the first element of ... without evaluating the rest using
>> ..1, but I don't know a way to do this for general n in pre-3.5.0 base R.
>
>
> Here's a way to do that:
>
> eval(as.name(paste0("..", n)))
>
> I was surprised this worked for n > 9, but it does.  Looking at the source,
> I think the largest legal value for n is huge; you'd hit other limits long
> before n was too big.

Maybe just get(paste0("..", n)) ?

Hadley

-- 
http://hadley.nz

Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-08 Thread Hadley Wickham

On Thu, May 3, 2018 at 11:34 PM, Tomas Kalibera
 wrote:
> On 05/03/2018 11:14 PM, Henrik Bengtsson wrote:
>>
>> Also, as mentioned in my
>> https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html, when
>> not specifying the mode argument, the default on Windows is mode = "w"
>> *except* for certain, case-sensitive, filename extensions:
>>
>>  if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$",
>> url)))
>>  mode <- "wb"
>>
>> Just like the need for mode = "wb" on Windows, the above
>> special-file-extension-hack is only happening on Windows, and is only
>> documented in ?download.file if you're on Windows; so someone who's on
>> Linux/macOS trying to help someone on Windows may not be aware of
>> this. This adds to even more confusions, e.g. "works for me".
>
> If we were designing the API today, it would probably make more sense not to
> convert any line endings by default. Today's editors _usually_ can cope with
> different line endings and it is probably easier to detect that a text file
> has incorrect line endings rather than detecting that a binary file has been
> corrupted by an attempt to convert line endings. But whether to change
> existing, documented behavior is a different question. In order to help
> users and programmers who do not read the documentation carefully we would
> create problems for users and programmers who do. The current heuristic/hack
> is in line with the compatibility approach: it detects files that are
> obviously binary, so it changes the default behavior only for cases when it
> would obviously cause damage.

From a purely utilitarian standpoint, there are far more users who do
not carefully read the documentation than users who do ;)

(I'd also argue that basing the decision on the file extension is
suboptimal, and it would be better to use the mime type if provided by
the server)
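
Whatever the default is, passing the mode explicitly sidesteps the extension heuristic. A small sketch; `download_binary()` is a hypothetical wrapper name, not part of base R, and the URL in the usage comment is a placeholder.

```r
# Always transfer in binary mode so no line-ending conversion is attempted,
# whatever the platform or file extension.
download_binary <- function(url, destfile, ...) {
  utils::download.file(url, destfile, mode = "wb", ...)
}

# Usage:
# download_binary("https://example.org/data.gz", "data.gz")
```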

Hadley

-- 
http://hadley.nz

Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-08 Thread Hadley Wickham

On Tue, May 8, 2018 at 8:15 AM, Hadley Wickham  wrote:
> On Thu, May 3, 2018 at 11:34 PM, Tomas Kalibera
>  wrote:
>> On 05/03/2018 11:14 PM, Henrik Bengtsson wrote:
>>>
>>> Also, as mentioned in my
>>> https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html, when
>>> not specifying the mode argument, the default on Windows is mode = "w"
>>> *except* for certain, case-sensitive, filename extensions:
>>>
>>>  if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$",
>>> url)))
>>>  mode <- "wb"
>>>
>>> Just like the need for mode = "wb" on Windows, the above
>>> special-file-extension-hack is only happening on Windows, and is only
>>> documented in ?download.file if you're on Windows; so someone who's on
>>> Linux/macOS trying to help someone on Windows may not be aware of
>>> this. This adds to even more confusions, e.g. "works for me".
>>
>> If we were designing the API today, it would probably make more sense not to
>> convert any line endings by default. Today's editors _usually_ can cope with
>> different line endings and it is probably easier to detect that a text file
>> has incorrect line endings rather than detecting that a binary file has been
>> corrupted by an attempt to convert line endings. But whether to change
>> existing, documented behavior is a different question. In order to help
>> users and programmers who do not read the documentation carefully we would
>> create problems for users and programmers who do. The current heuristic/hack
>> is in line with the compatibility approach: it detects files that are
>> obviously binary, so it changes the default behavior only for cases when it
>> would obviously cause damage.
>
> From a purely utilitarian standpoint, there are far more users who do
> not carefully read the documentation than users who do ;)
>
> (I'd also argue that basing the decision on the file extension is
> suboptimal, and it would be better to use the mime type if provided by
> the server)

Also note that MS just announced support for unix line endings in notepad

https://blogs.msdn.microsoft.com/commandline/2018/05/08/extended-eol-in-notepad/

Hadley

-- 
http://hadley.nz

[Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham

Hi all,

Is there a better way to subset the ROWs (in the sense of NROW) of
a vector, matrix, data frame or array than this?

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    x[i]
  } else {
    dims <- rep(list(quote(expr = )), nd - 1L)
    do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
  }
}

subset_ROW(1:10, 4:6)
#> [1] 4 5 6

str(subset_ROW(array(1:10, c(10)), 2:4))
#>  int [1:3(1d)] 2 3 4
str(subset_ROW(array(1:10, c(10, 1)), 2:4))
#>  int [1:3, 1] 2 3 4
str(subset_ROW(array(1:10, c(5, 2)), 2:4))
#>  int [1:3, 1:2] 2 3 4 7 8 9
str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
#>  int [1:3, 1, 1] 2 3 4

subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
#>   x y
#> 2 2 9
#> 3 3 8
#> 4 4 7

It seems like there should be a way to do this that doesn't require
generating a call with missing arguments, but I can't think of it.
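
To make the mechanics explicit: `quote(expr = )` yields the empty symbol that R uses for a missing argument, so the constructed call is equivalent to writing `x[i, , , drop = FALSE]` by hand with the right number of commas. A brief check, repeating the definition so the snippet runs on its own:

```r
subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    x[i]
  } else {
    # One empty (missing) argument per trailing dimension.
    dims <- rep(list(quote(expr = )), nd - 1L)
    do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
  }
}

# Matrix: same as leaving the column index empty.
m <- matrix(1:10, nrow = 5)
identical(subset_ROW(m, 2:4), m[2:4, , drop = FALSE])

# 3-d array: same as leaving both trailing indices empty.
a <- array(1:24, c(4, 3, 2))
identical(subset_ROW(a, c(1, 3)), a[c(1, 3), , , drop = FALSE])
```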

Thanks!

Hadley

-- 
http://hadley.nz

Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham

I suspect this will have suboptimal performance since the TRUEs will
get recycled. (Maybe there is, or could be, ALTREP support for
recycling.)

Hadley

On Fri, Jun 8, 2018 at 10:16 AM, Berry, Charles  wrote:
>
>
>> On Jun 8, 2018, at 8:45 AM, Hadley Wickham  wrote:
>>
>> Hi all,
>>
>> Is there a better to way to subset the ROWs (in the sense of NROW) of
>> an vector, matrix, data frame or array than this?
>
>
> You can use TRUE to fill the subscripts for dimensions 2:nd
>
>>
>> subset_ROW <- function(x, i) {
>>  nd <- length(dim(x))
>>  if (nd <= 1L) {
>>x[i]
>>  } else {
>>dims <- rep(list(quote(expr = )), nd - 1L)
>>do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
>>  }
>> }
>
>
> subset_ROW <-
> function(x,i)
> {
> mc <- quote(x[i])
> nd <- max(1L, length(dim(x)))
> mc[seq(4, length=nd-1L)] <- rep(list(TRUE), nd - 1L)
> mc[["drop"]] <- FALSE
> eval(mc)
>
> }
>
>>
>> subset_ROW(1:10, 4:6)
>> #> [1] 4 5 6
>>
>> str(subset_ROW(array(1:10, c(10)), 2:4))
>> #>  int [1:3(1d)] 2 3 4
>> str(subset_ROW(array(1:10, c(10, 1)), 2:4))
>> #>  int [1:3, 1] 2 3 4
>> str(subset_ROW(array(1:10, c(5, 2)), 2:4))
>> #>  int [1:3, 1:2] 2 3 4 7 8 9
>> str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
>> #>  int [1:3, 1, 1] 2 3 4
>>
>> subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
>> #>   x y
>> #> 2 2 9
>> #> 3 3 8
>> #> 4 4 7
>>
>
> HTH,
>
> Chuck
>



-- 
http://hadley.nz

Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham

On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles  wrote:
>
>
>> On Jun 8, 2018, at 10:37 AM, Hervé Pagès  wrote:
>>
>> Also the TRUEs cause problems if some dimensions are 0:
>>
>>  > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>>  Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>>(subscript) logical subscript too long
>
> OK. But this is easy enough to handle.
>
>>
>> H.
>>
>> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>>> I suspect this will have suboptimal performance since the TRUEs will
>>> get recycled. (Maybe there is, or could be, ALTREP, support for
>>> recycling)
>>> Hadley
>
>
> AFAICS, it is not an issue. Taking
>
> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>
> as a test case
>
> and using a function that will either use the literal code 
> `x[i,,,,drop=FALSE]' or `eval(mc)':
>
> subset_ROW4 <-
>  function(x, i, useLiteral=FALSE)
> {
> literal <- quote(x[i,,,,drop=FALSE])
> mc <- quote(x[i])
> nd <- max(1L, length(dim(x)))
> mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
> mc[["drop"]] <- FALSE
> if (useLiteral)
> eval(literal)
> else
> eval(mc)
>  }
>
> I get identical times with
>
> system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>
> and with
>
> system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))

I think that's because you used a relatively low precision timing
mechanism, and included the index generation in the timing. I see:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length = 10, by = 100)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)
#> # A tibble: 2 x 1
#>   expression         min    mean   median      max  n_gc
#>
#> 1 arr[i, TRUE,…    7.4µs  10.9µs  10.66µs   1.22ms     2
#> 2 arr[i, , , ]    7.06µs   8.8µs   7.85µs 538.09µs     2

So not a huge difference, but it's there.
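
For anyone without bench installed, here is a rough base-R sketch of the same comparison. It assumes a smaller array than the one above so that it runs quickly, builds the index once outside the timed region, and repeats each expression many times because system.time() has coarse resolution. The timings will vary by machine, so none are shown.

```r
# Smaller array than in the thread, so the sketch runs quickly.
arr <- array(rnorm(2^14), c(2^8, 4, 4, 4))
i <- seq(1, length.out = 10, by = 25)

# Both forms select the same elements.
with_true    <- arr[i, TRUE, TRUE, TRUE]
with_missing <- arr[i, , , ]
identical(with_true, with_missing)

# Index built once, outside the timed region; many repetitions to
# compensate for the coarse resolution of system.time().
n <- 1e4
system.time(for (k in seq_len(n)) arr[i, TRUE, TRUE, TRUE])
system.time(for (k in seq_len(n)) arr[i, , , ])
```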

Hadley


-- 
http://hadley.nz


Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham

Hmmm, yes, there must be some special case in the C code to avoid
recycling a length-1 logical vector:

dims <- c(4, 4, 4, 1e5)

arr <- array(rnorm(prod(dims)), dims)
dim(arr)
#> [1]      4      4      4 100000
i <- c(1, 3)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)[c("expression", "min", "mean", "max")]
#> # A tibble: 2 x 4
#>   expression                    min     mean      max
#> 
#> 1 arr[i, TRUE, TRUE, TRUE]   41.8ms   43.6ms   46.5ms
#> 2 arr[i, , , ]               41.7ms   43.1ms   46.3ms


On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles  wrote:
>
>
>> On Jun 8, 2018, at 11:52 AM, Hadley Wickham  wrote:
>>
>> On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles  wrote:
>>>
>>>
>>>> On Jun 8, 2018, at 10:37 AM, Hervé Pagès  wrote:
>>>>
>>>> Also the TRUEs cause problems if some dimensions are 0:
>>>>
>>>>> matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>>>> Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>>>>   (subscript) logical subscript too long
>>>
>>> OK. But this is easy enough to handle.
>>>
>>>>
>>>> H.
>>>>
>>>> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>>>>> I suspect this will have suboptimal performance since the TRUEs will
>>>>> get recycled. (Maybe there is, or could be, ALTREP, support for
>>>>> recycling)
>>>>> Hadley
>>>
>>>
>>> AFAICS, it is not an issue. Taking
>>>
>>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>>>
>>> as a test case
>>>
>>> and using a function that will either use the literal code 
>>> `x[i,,,,drop=FALSE]' or `eval(mc)':
>>>
>>> subset_ROW4 <-
>>> function(x, i, useLiteral=FALSE)
>>> {
>>>literal <- quote(x[i,,,,drop=FALSE])
>>>mc <- quote(x[i])
>>>nd <- max(1L, length(dim(x)))
>>>mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
>>>mc[["drop"]] <- FALSE
>>>if (useLiteral)
>>>eval(literal)
>>>else
>>>eval(mc)
>>> }
>>>
>>> I get identical times with
>>>
>>> system.time(for (i in 1:1) 
>>> subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>>>
>>> and with
>>>
>>> system.time(for (i in 1:1) 
>>> subset_ROW4(arr,seq(1,length=10,by=100),FALSE))
>>
>> I think that's because you used a relatively low precision timing
>> mechanism, and included the index generation in the timing. I see:
>>
>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>> i <- seq(1,length = 10, by = 100)
>>
>> bench::mark(
>>  arr[i, TRUE, TRUE, TRUE],
>>  arr[i, , , ]
>> )
>> #> # A tibble: 2 x 1
>> #>   expression         min    mean   median      max  n_gc
>> #>
>> #> 1 arr[i, TRUE,…    7.4µs  10.9µs  10.66µs   1.22ms     2
>> #> 2 arr[i, , , ]    7.06µs   8.8µs   7.85µs 538.09µs     2
>>
>> So not a huge difference, but it's there.
>
>
> Funny. I get similar results to yours above albeit with smaller differences. 
> Usually < 5 percent.
>
> But with subset_ROW4 I see no consistent difference.
>
> In this example, it runs faster on average using `eval(mc)' to return the 
> result:
>
>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>> i <- seq(1,length=10,by=100)
>> bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]
> # A tibble: 2 x 8
>   expression                      min     mean   median      max `itr/sec` mem_alloc  n_gc
>
> 1 subset_ROW4(arr, i, FALSE)   28.9µs   34.9µs   32.1µs   1.36ms    28686.    5.05KB     5
> 2 subset_ROW4(arr, i, TRUE)    28.9µs     35µs   32.4µs 875.11µs    28572.    5.05KB     5
>>
>
> And on subsequent reps the lead switches back and forth.
>
>
> Chuck
>



-- 
http://hadley.nz


Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham

On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles  wrote:
>
>
>> On Jun 8, 2018, at 1:49 PM, Hadley Wickham  wrote:
>>
>> Hmmm, yes, there must be some special case in the C code to avoid
>> recycling a length-1 logical vector:
>
>
> Here is a version that (I think) handles Herve's issue of arrays having one 
> or more 0 dimensions.
>
> subset_ROW <-
> function(x,i)
> {
> dims <- dim(x)
> index_list <- which(dims[-1] != 0L) + 3
> mc <- quote(x[i])
> nd <- max(1L, length(dims))
> mc[ index_list ] <- list(TRUE)
> mc[[ nd + 3L ]] <- FALSE
> names( mc )[ nd+3L ] <- "drop"
> eval(mc)
> }
>
> Curiously enough the timing is *much* better for this implementation than for 
> the first version I sent.
>
> Constructing a version of `mc' that looks like `x[i,,,,drop=FALSE]' can be 
> done with `alist(a=)' in place of `list(TRUE)' in the earlier version but 
> seems to slow things down noticeably. It requires almost twice (!!) as much 
> time as the version above.

I think that's probably because alist() is a slow way to generate a
missing symbol:

bench::mark(
  alist(x = ),
  list(x = quote(expr = )),
  check = FALSE
)[1:5]
#> # A tibble: 2 x 5
#>   expression                    min     mean   median      max
#>  
#> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
#> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs

(note the units)
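
For reference, a sketch of how the missing-argument version from the start of this thread behaves on zero-extent dimensions. It is the same function as in my first message, lightly commented. Because an empty subscript selects everything along a dimension (including nothing, when the extent is 0), it should not hit the "logical subscript too long" error that TRUE triggers:

```r
subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    return(x[i])
  }
  # One empty (missing) argument per trailing dimension
  rest <- rep(list(quote(expr = )), nd - 1L)
  # Builds and evaluates the call x[i, , ..., drop = FALSE]
  do.call(`[`, c(list(quote(x), quote(i)), rest, list(drop = FALSE)))
}

m <- matrix(raw(0), nrow = 5, ncol = 0)
dim(subset_ROW(m, 1:3))                     # 3 rows, 0 columns
dim(subset_ROW(array(1:10, c(5, 2)), 2:4))  # 3 rows, 2 columns
```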

Hadley


-- 
http://hadley.nz


[Rd] Testing for vectors

2018-07-07 Thread Hadley Wickham

Hi all,

Is there a base function that I've missed that tests if an object is
a vector in the dimensionality sense, rather than the data structure
sense? i.e. something that checks is.null(dim(x)) ?

is.vector() is trivially disqualified since it also checks for the
presence of non-names attributes:

x <- factor(c("a", "a", "b"))
is.vector(x)
#> [1] FALSE

is.null(dim(x))
#> [1] TRUE

Hadley

-- 
http://hadley.nz


Re: [Rd] Testing for vectors

2018-07-07 Thread Hadley Wickham

On Sat, Jul 7, 2018 at 12:54 PM, Duncan Murdoch
 wrote:
> On 07/07/2018 1:20 PM, Hadley Wickham wrote:
>>
>> Hi all,
>>
>> Is there are base function that I've missed that tests if an object is
>> a vector in the dimensionality sense, rather than the data structure
>> sense? i.e. something that checks is.null(dim(x)) ?
>>
>> is.vector() is trivially disqualified since it also checks for the
>> presence of non-names attributes:
>>
>> x <- factor(c("a", "a", "b"))
>> is.vector(x)
>> #> [1] FALSE
>>
>> is.null(dim(x))
>> #> [1] TRUE
>>
>
> I don't know of one.  I can't think of nontrivial cases where that
> distinction matters; do you know of any where base functions act differently
> on vectors and 1D arrays?  (A trivial example is that dimnames(x) gives
> different results for a named vector and an array with dimnames.)

I was thinking primarily of completing the set of is.matrix() and
is.array(), or more generally, how do you say: is `x` a one-dimensional
thing?

(I don't have any feel for whether the check should be is.null(dim(x))
vs. length(dim(x)) <= 1)
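
To make the difference between the two candidate checks concrete, here is a small base-R sketch. The helper names `is_dimless()` and `is_1d()` are made up for illustration; they are not base functions. The checks only disagree on 1d arrays:

```r
# Hypothetical helpers, not part of base R.
is_dimless <- function(x) is.null(dim(x))       # no dim attribute at all
is_1d      <- function(x) length(dim(x)) <= 1L  # no dims, or exactly one

f  <- factor(c("a", "a", "b"))  # no dim attribute
a1 <- array(1:10, 10)           # 1d array
m  <- matrix(1:10, ncol = 1)    # n x 1 matrix

c(is_dimless(f), is_dimless(a1), is_dimless(m))
#> [1]  TRUE FALSE FALSE
c(is_1d(f), is_1d(a1), is_1d(m))
#> [1]  TRUE  TRUE FALSE
```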

Hadley
-- 
http://hadley.nz


Re: [Rd] Testing for vectors

2018-07-07 Thread Hadley Wickham

On Sat, Jul 7, 2018 at 1:50 PM, Gabe Becker  wrote:
> Hadley,
>
>>
>> I was thinking primarily of completing the set of is.matrix() and
>> is.array(), or generally, how do you say: is `x` a 1d dimensional
>> thing?
>
>
> Can you clarify what you mean by dimensionality sense and specifically 1d
> here?

What do we call a vector that is not an array? (or matrix)

What do we call an object that acts 1-dimensional? (i.e. has
length(dim()) %in% c(0, 1)) ?

> You can also have an n x 1 matrix, which technically has 2 dimensions but
> conceptually is equivalent to a 1d array and/or a vector.

Yes. You can also have an array that's n x 1 x 1.

> Also, are you including lists in your conceptions of 1d vector here? I'm
> with Duncan here, in that i'm having trouble understanding exactly what you
> want to do without a bit more context.

Isn't it standard terminology that a vector is the set of atomic vectors + list?

Hadley

-- 
http://hadley.nz


Re: [Rd] Testing for vectors

2018-07-08 Thread Hadley Wickham

On Sat, Jul 7, 2018 at 3:48 PM, Ott Toomet  wrote:
> Thanks, Hadley for bringing this up:-)
>
> I am teaching R and I can suggest 5 different definitions of 'vector':
>
> a) vector as a collection of homogeneous objects, indexed by [ ] (more
> precisely atomic vector).  Sometimes you hear that in R, "everything is a
> vector", but this is only true for atomic objects.
> b) vector as a collection of objects, indexed by either [ ] and [[ ]].  This
> includes atomic vectors and lists.
> c) vector versus scalar.  It pops up when teaching math and stats, and is
> somewhat confusing, in particular if my previous claim was that "R does not
> have scalars".
> d) vector versus matrix (or other arrays).  Again, it only matters when
> doing matrix operations where 'vectors', i.e. objects with NULL dimension,
> behave their own way.
> e) finally, 'is.vector' has it's own understanding what constitutes a
> vector.

Yes!

And to add to the confusion there are three meanings to numeric vector:

* As an alias for double (i.e. numeric() and as.numeric())
* To refer to integer and double types jointly (as in the S3 and S4 class)
* A vector that behaves as if it is a number (e.g. is.numeric(), which
excludes factors)
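
A few base-R calls that illustrate these meanings (only the first and third senses are shown; the expected values are in the comments):

```r
typeof(numeric(1))       # "double": numeric() as an alias for double
typeof(as.numeric(1L))   # "double": as.numeric() converts to double
is.numeric(1L)           # TRUE: integers behave like numbers
is.numeric(1.5)          # TRUE
typeof(factor("a"))      # "integer": factors are stored as integers...
is.numeric(factor("a"))  # FALSE: ...but do not behave like numbers
```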

Hadley

-- 
http://hadley.nz


Re: [Rd] Testing for vectors

2018-07-08 Thread Hadley Wickham

On Sat, Jul 7, 2018 at 11:19 PM, Gabe Becker  wrote:
> Hadley,
>
>
> On Sat, Jul 7, 2018 at 1:32 PM, Hadley Wickham  wrote:
>>
>> On Sat, Jul 7, 2018 at 1:50 PM, Gabe Becker  wrote:
>> > Hadley,
>> >
>> >>
>> >> I was thinking primarily of completing the set of is.matrix() and
>> >> is.array(), or generally, how do you say: is `x` a 1d dimensional
>> >> thing?
>> >
>> >
>> > Can you clarify what you mean by dimensionality sense and specifically
>> > 1d
>> > here?
>>
>> What do we call a vector that is not an array? (or matrix)
>>
>> What do we call an object that acts 1-dimensional? (i.e. has
>> length(dim()) %in% c(0, 1)) ?
>
>
>
> Right, or even (length(dim()) == 0 || sum(dim() > 1) <= 1)
>
>  but that is exactly my point, those two(/three) sets of things are not the
> same. 1d arrays meet the second definition but not the first. Matrices and
> arrays that don't meet either of yours would still meet mine. Which
> definition are you proposing strictly define what a vector is?

I am not proposing any definition. I am enquiring if there is a
definition in base R. The answer appears to be no.

> Another completely unrelated way to define vector, btw, is via the vector
> interface (from what I recall this is roughly [, [[, length, and format
> methods, though I'm probably forgetting some). This is (more or less)
> equivalent to defining a vector as "a thing that can be the column of a
> data.frame and have all the base-provided machinery work".

I don't know if that definition is adequate because a call would be a
vector by that definition. I'm pretty sure a call does not make sense
as a data frame column.

Also technically data frames don't require their columns to have equal
length(), but equal NROW(). So the spirit of that definition would
imply that matrices and arrays are also vectors, which seems like it
might be undesirable.
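
A minimal sketch of the NROW() point, using a matrix column (the column names are arbitrary):

```r
df <- data.frame(id = 1:3)
df$m <- matrix(1:6, nrow = 3)  # a matrix column with 3 rows and 2 columns

length(df$id)  # 3
length(df$m)   # 6, not 3
NROW(df$m)     # 3, matching the other column
nrow(df)       # 3
```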

Hadley

-- 
http://hadley.nz


[Rd] Is NULL a vector?

2018-07-23 Thread Hadley Wickham

Hi all,

Would you generally consider NULL to be a vector? Base R functions are
a little inconsistent:

## In favour

``` r
identical(as.vector(NULL), NULL)
#> [1] TRUE

identical(as(NULL, "vector"), NULL)
#> [1] TRUE

# supports key vector generics
length(NULL)
#> [1] 0
NULL[c(3, 4, 5)]
#> NULL
NULL[[1]]
#> NULL
```

## Against

``` r
is.vector(NULL)
#> [1] FALSE

is(NULL, "vector")
#> [1] FALSE
```

## Abstentions

``` r
is.atomic(NULL)
#> [1] TRUE
# documentation states "returns TRUE if x is of an atomic type (or NULL)"
# is "or" exclusive or inclusive?
```

Hadley

-- 
http://hadley.nz


Re: [Rd] Is NULL a vector?

2018-07-23 Thread Hadley Wickham

On Mon, Jul 23, 2018 at 2:17 PM, Duncan Murdoch
 wrote:
> On 23/07/2018 3:03 PM, Hadley Wickham wrote:
>>
>> Hi all,
>>
>> Would you generally consider NULL to be a vector?
>
>
> According to the language definition (in the doc directory), it is not:
> "Vectors can be thought of as contiguous cells containing data. Cells are
> accessed through indexing operations such as x[5]. More details are given in
> Indexing.
>
> R has six basic (‘atomic’) vector types: logical, integer, real, complex,
> string (or character) and raw. The modes and storage modes for the different
> vector types are listed in the following table."
>
> and later
>
> "There is a special object called NULL. It is used whenever there is a need
> to indicate or specify that an object is absent. It should not be confused
> with a vector or list of zero length."

Perfect, thanks!

Also available online at
https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Vector-objects
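
The "should not be confused with a vector or list of zero length" part can be checked directly in base R. A short sketch, with expected values in the comments:

```r
typeof(NULL)                    # "NULL": its own type
typeof(logical(0))              # "logical"
identical(NULL, logical(0))     # FALSE
identical(NULL, list())         # FALSE
is.vector(logical(0))           # TRUE
is.vector(list())               # TRUE
length(NULL) == length(list())  # TRUE: same length, different objects
```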

Hadley

-- 
http://hadley.nz


[Rd] vctrs: a type system for the tidyverse

2018-08-06 Thread Hadley Wickham

Hi all,

I wanted to share with you an experimental package that I’m currently
working on: vctrs, <https://github.com/r-lib/vctrs>. The motivation for
vctrs is to think deeply about the output “type” of functions like
`c()`, `ifelse()`, and `rbind()`, with an eye to implementing one
strategy throughout the tidyverse (i.e. all the functions listed at
<https://github.com/r-lib/vctrs#tidyverse-functions>). Because this is
going to be a big change, I thought it would be very useful to get
comments from a wide audience, so I’m reaching out to R-devel to get
your thoughts.

There is quite a lot already in the readme
(<https://github.com/r-lib/vctrs#vctrs>), so here I’ll try to motivate
vctrs as succinctly as possible by comparing `base::c()` to its
equivalent `vctrs::vec_c()`. I think the drawbacks of `c()` are well
known, but to refresh your memory, I’ve highlighted a few at
<https://github.com/r-lib/vctrs#compared-to-base-r>. I think they arise
because of two main challenges: `c()` has to both combine vectors *and*
strip attributes, and it only dispatches on the first argument.

The design of vctrs is largely driven by a pair of principles:

-   The type of `vec_c(x, y)` should be the same as `vec_c(y, x)`

-   The type of `vec_c(x, vec_c(y, z))` should be the same as
`vec_c(vec_c(x, y), z)`

i.e. the type should be associative and commutative. I think these are
good principles because they make types simpler to understand and to
implement.

Method dispatch for `vec_c()` is quite simple because associativity and
commutativity mean that we can determine the output type only by
considering a pair of inputs at a time. To this end, vctrs provides
`vec_type2()` which takes two inputs and returns their common type
(represented as a zero-length vector):

str(vec_type2(integer(), double()))
#>  num(0)

str(vec_type2(factor("a"), factor("b")))
#>  Factor w/ 2 levels "a","b":

# NB: not all types have a common/unifying type
str(vec_type2(Sys.Date(), factor("a")))
#> Error: No common type for date and factor

(`vec_type2()` currently implements double dispatch through a combination
of S3 dispatch and if-else blocks, but this will change to a pure S3
approach in the near future.)

To find the common type of multiple vectors, we can use `Reduce()`:

vecs <- list(TRUE, 1:10, 1.5)

type <- Reduce(vec_type2, vecs)
str(type)
#>  num(0)
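
To show why a pairwise function is enough, here is a toy stand-in written in base R only. `toy_type2()` is hypothetical and covers just logical, integer and double; it is not how vctrs implements `vec_type2()`. Because it picks the "widest" of the two input types, the result does not depend on argument order or on how `Reduce()` groups the inputs:

```r
# Hypothetical stand-in for vec_type2(); not the vctrs implementation.
toy_type2 <- function(x, y) {
  hierarchy <- c("logical", "integer", "double")
  # Position of the wider of the two types; NA if either type is unknown
  rank <- max(match(typeof(x), hierarchy), match(typeof(y), hierarchy))
  if (is.na(rank)) stop("No common type")
  # Return a zero-length prototype of the common type
  vector(hierarchy[[rank]], 0L)
}

typeof(toy_type2(1L, TRUE))  # "integer"
typeof(toy_type2(TRUE, 1L))  # "integer": argument order does not matter

vecs <- list(TRUE, 1:10, 1.5)
typeof(Reduce(toy_type2, vecs))                # "double"
typeof(Reduce(toy_type2, vecs, right = TRUE))  # "double": grouping does not matter
```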

There’s one other piece of the puzzle: casting one vector to another
type. That’s implemented by `vec_cast()` (which also uses double
dispatch):

str(lapply(vecs, vec_cast, to = type))
#> List of 3
#>  $ : num 1
#>  $ : num [1:10] 1 2 3 4 5 6 7 8 9 10
#>  $ : num 1.5

All up, this means that we can implement the essence of `vec_c()` in
only a few lines:

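vec_c2 <- function(...) {
  args <- list(...)
  # Fold the pairwise common-type function over all inputs
  type <- Reduce(vec_type2, args)

  # Cast every input (not the type itself) to the common type
  cast <- lapply(args, vec_cast, to = type)
  unlist(cast, recursive = FALSE)
}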

vec_c(factor("a"), factor("b"))
#> [1] a b
#> Levels: a b

vec_c(Sys.Date(), Sys.time())
#> [1] "2018-08-06 00:00:00 CDT" "2018-08-06 11:20:32 CDT"

(The real implementation is a little more complex:
)

On top of this foundation, vctrs expands in a few different ways:

-   To consider the “type” of a data frame, and what the common type of
two data frames should be. This leads to a natural implementation of
`vec_rbind()` which includes all columns that appear in any input.

-   To create a new “list\_of” type, a list where every element is of
fixed type (enforced by `[<-`, `[[<-`, and `$<-`)

-   To think a little about the “shape” of a vector, and to consider
recycling as part of the type system. (This thinking is not yet
fully fleshed out)

Thanks for making it to the bottom of this long email :) I would love to
hear your thoughts on vctrs. It’s something that I’ve been having a lot
of fun exploring, and I’d like to make sure it is as robust as possible
(and the motivations are as clear as possible) before we start using it
in other packages.

Hadley


-- 
http://hadley.nz


Re: [Rd] vctrs: a type system for the tidyverse

2018-08-06 Thread Hadley Wickham

> First off, you are using the word "type" throughout this email; You seem to
> mean class (judging by your Date and factor examples, and the fact you
> mention S3 dispatch) as opposed to type in the sense of what is returned by
> R's  typeof() function. I think it would be clearer if you called it class
> throughout unless that isn't actually what you mean (in which case I would
> have other questions...)

I used "type" to hand wave away the precise definition - it's not S3
class or base type (i.e. typeof()) but some hybrid of the two. I do
want to emphasise that it's a type system, not an OO system, in that
coercions are not defined by superclass/subclass relationships.

> More thoughts inline.
>
> On Mon, Aug 6, 2018 at 9:21 AM, Hadley Wickham  wrote:
>>
>> Hi all,
>>
>> I wanted to share with you an experimental package that I’m currently
>> working on: vctrs, <https://github.com/r-lib/vctrs>. The motivation for
>> vctrs is to think deeply about the output “type” of functions like
>> `c()`, `ifelse()`, and `rbind()`, with an eye to implementing one
>> strategy throughout the tidyverse (i.e. all the functions listed at
>> <https://github.com/r-lib/vctrs#tidyverse-functions>). Because this is
>> going to be a big change, I thought it would be very useful to get
>> comments from a wide audience, so I’m reaching out to R-devel to get
>> your thoughts.
>>
>> There is quite a lot already in the readme
>> (<https://github.com/r-lib/vctrs#vctrs>), so here I’ll try to motivate
>> vctrs as succinctly as possible by comparing `base::c()` to its
>> equivalent `vctrs::vec_c()`. I think the drawbacks of `c()` are well
>> known, but to refresh your memory, I’ve highlighted a few at
>> <https://github.com/r-lib/vctrs#compared-to-base-r>. I think they arise
>> because of two main challenges: `c()` has to both combine vectors *and*
>> strip attributes, and it only dispatches on the first argument.
>>
>> The design of vctrs is largely driven by a pair of principles:
>>
>> -   The type of `vec_c(x, y)` should be the same as `vec_c(y, x)`
>>
>> -   The type of `vec_c(x, vec_c(y, z))` should be the same as
>> `vec_c(vec_c(x, y), z)`
>>
>> i.e. the type should be associative and commutative. I think these are
>> good principles because they makes types simpler to understand and to
>> implement.
>>
>> Method dispatch for `vec_c()` is quite simple because associativity and
>> commutativity mean that we can determine the output type only by
>> considering a pair of inputs at a time. To this end, vctrs provides
>> `vec_type2()` which takes two inputs and returns their common type
>> (represented as zero length vector):
>>
>> str(vec_type2(integer(), double()))
>> #>  num(0)
>>
>> str(vec_type2(factor("a"), factor("b")))
>> #>  Factor w/ 2 levels "a","b":
>
>
> What is the reasoning behind taking the union of the levels here? I'm not
> sure that is actually the behavior I would want if I have a vector of
> factors and I try to append some new data to it. I might want/ expect to
> retain the existing levels and get either NAs or an error if the new data
> has (present) levels not in the first data. The behavior as above doesn't
> seem in-line with what I understand the purpose of factors to be (explicit
> restriction of possible values).

Originally (like a week ago 😀), we threw an error if the factors
didn't have the same levels, and provided an optional coercion to
character. I decided that while correct (the factor levels are a
parameter of the type, and hence factors with different levels aren't
comparable), that this fights too much against how people actually use
factors in practice. It also seems like base R is moving more in this
direction, i.e. in 3.4 factor("a") == factor("b") is an error, whereas
in R 3.5 it returns FALSE.

I'm not wedded to the current approach, but it feels like the same
principle should apply in comparisons like x == y (even though == is
outside the scope of vctrs, ideally the underlying principles would be
robust enough to suggest what should happen).

> I guess what I'm saying is that while I agree associativity is good for most
> things, it doesn't seem like the right behavior to me in the case of
> factors.

I think associativity is such a strong and useful principle that it
may be worth making some sacrifices for factors. That said, my claim
of associativity is only on the type, not the values of the type:
vec_c(fa, fb) and vec_c(fb, fa) both return factors, but the levels
a

Re: [Rd] vctrs: a type system for the tidyverse

2018-08-08 Thread Hadley Wickham

>>> Method dispatch for `vec_c()` is quite simple because associativity and
>>> commutativity mean that we can determine the output type only by
>>> considering a pair of inputs at a time. To this end, vctrs provides
>>> `vec_type2()` which takes two inputs and returns their common type
>>> (represented as zero length vector):
>>>
>>> str(vec_type2(integer(), double()))
>>> #>  num(0)
>>>
>>> str(vec_type2(factor("a"), factor("b")))
>>> #>  Factor w/ 2 levels "a","b":
>>
>>
>> What is the reasoning behind taking the union of the levels here? I'm not
>> sure that is actually the behavior I would want if I have a vector of
>> factors and I try to append some new data to it. I might want/ expect to
>> retain the existing levels and get either NAs or an error if the new data
>> has (present) levels not in the first data. The behavior as above doesn't
>> seem in-line with what I understand the purpose of factors to be (explicit
>> restriction of possible values).
>
> Originally (like a week ago 😀), we threw an error if the factors
> didn't have the same level, and provided an optional coercion to
> character. I decided that while correct (the factor levels are a
> parameter of the type, and hence factors with different levels aren't
> comparable), that this fights too much against how people actually use
> factors in practice. It also seems like base R is moving more in this
> direction, i.e. in 3.4 factor("a") == factor("b") is an error, whereas
> in R 3.5 it returns FALSE.

I now have a better argument, I think:

If you squint your brain a little, I think you can see that each set
of automatic coercions is about increasing resolution. Integers are
low resolution versions of doubles, and dates are low resolution
versions of date-times. Logicals are low resolution versions of
integers because there's a strong convention that `TRUE` and `FALSE`
can be used interchangeably with `1` and `0`.

But what is the resolution of a factor? We must take a somewhat
pragmatic approach because base R often converts character vectors to
factors, and we don't want to be burdensome to users. So we say that a
factor `x` has finer resolution than factor `y` if the levels of `y`
are contained in `x`. So to find the common type of two factors, we
take the union of the levels of each factor, giving a factor that has
finer resolution than both. Finally, you can think of a character
vector as a factor with every possible level, so factors and character
vectors are coercible.
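
As a rough base-R sketch of the union-of-levels rule: the helper `combine_factors()` below is hypothetical, written for illustration, and is not the vctrs implementation. Both orderings give a factor containing all levels of both inputs, although the order of the levels follows the order of the arguments:

```r
# Hypothetical helper for illustration; not the vctrs implementation.
combine_factors <- function(x, y) {
  # The common type has every level that appears in either input
  lv <- union(levels(x), levels(y))
  factor(c(as.character(x), as.character(y)), levels = lv)
}

fa <- factor("a")
fb <- factor("b")

levels(combine_factors(fa, fb))  # "a" "b"
levels(combine_factors(fb, fa))  # "b" "a"
```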

(extracted from the in-progress vignette explaining how to extend
vctrs to work with your own vctrs, now that vctrs has been rewritten
to use double dispatch)

Hadley

-- 
http://hadley.nz


Re: [Rd] vctrs: a type system for the tidyverse

2018-08-08 Thread Hadley Wickham

> > I now have a better argument, I think:
>
> > If you squint your brain a little, I think you can see
> > that each set of automatic coercions is about increasing
> > resolution. Integers are low resolution versions of
> > doubles, and dates are low resolution versions of
> > date-times. Logicals are low resolution version of
> > integers because there's a strong convention that `TRUE`
> > and `FALSE` can be used interchangeably with `1` and `0`.
>
> > But what is the resolution of a factor? We must take a
> > somewhat pragmatic approach because base R often converts
> > character vectors to factors, and we don't want to be
> > burdensome to users. So we say that a factor `x` has finer
> > resolution than factor `y` if the levels of `y` are
> > contained in `x`. So to find the common type of two
> > factors, we take the union of the levels of each factor,
> > given a factor that has finer resolution than
> > both. Finally, you can think of a character vector as a
> > factor with every possible level, so factors and character
> > vectors are coercible.
>
> > (extracted from the in-progress vignette explaining how to
> > extend vctrs to work with your own vctrs, now that vctrs
> > has been rewritten to use double dispatch)
>
> I like this argumentation, and find it very nice indeed!
> It confirms my own gut feeling which had lead me to agreeing
> with you, Hadley, that taking the union of all factor levels
> should be done here.

That's great to hear :)

> As Gabe mentioned (and you've explained about) the term "type"
> is really confusing here.  As you know, the R internals are all
> about SEXPs, TYPEOF(), etc, and that's what the R level
> typeof(.) also returns.  As you want to use something slightly
> different, it should be different naming, ideally something not
> existing yet in the R / S world, maybe 'kind' ?

Agreed - I've been using type in the sense of "type system"
(particularly as it relates to algebraic data types), but that's not
obvious from the current presentation, and as you note, is confusing
with existing notions of type in R. I like your suggestion of kind,
but I think it might be possible to just talk about classes, and
instead emphasise that while the components of the system are classes
(and indeed it's implemented using S3), the coercion/casting
relationships do not strictly follow the subclass/superclass
relationships.

A good motivating example is now ordered vs factor - I don't think you
can say that ordered or factor have greater resolution than the other
so:

vec_c(factor("a"), ordered("a"))
#> Error: No common type for factor and ordered

This is not what you'd expect from an _object_ system since ordered is
a subclass of factor.
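
The subclass relationship itself is easy to confirm in base R:

```r
class(ordered("a"))               # "ordered" "factor"
inherits(ordered("a"), "factor")  # TRUE
is.factor(ordered("a"))           # TRUE
```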

Hadley

-- 
http://hadley.nz


Re: [Rd] vctrs: a type system for the tidyverse

2018-08-08 Thread Hadley Wickham

>> So we say that a
>> factor `x` has finer resolution than factor `y` if the levels of `y`
>> are contained in `x`. So to find the common type of two factors, we
>> take the union of the levels of each factor, given a factor that has
>> finer resolution than both.
>
> I'm not so sure. I think a more useful definition of resolution may be
> that it is about increasing the precision of information. In that case,
> a factor with 4 levels each of which is present has a higher resolution
> than the same data with additional-but-absent levels on the factor object.
> Now that may be different when the the new levels are not absent, but
> my point is that its not clear to me that resolution is a useful way of
> talking about factors.

An alternative way of framing factors is that they're about tracking
possible values, particularly possible values that don't exist in the
data that you have. Thinking about factors in that way, makes unioning
the levels more natural.

> If users want unrestricted character type behavior, then IMHO they should
> just be using characters, and it's quite easy for them to do so in any case
> I can easily think of where they have somehow gotten their hands on a factor.
> If, however, they want a factor, it must be - I imagine - because they 
> actually
> want the the semantics and behavior specific to factors.

I think this is true in the tidyverse, which will never give you a
factor unless you explicitly ask for one, but the default in base R
(at least as soon as a data frame is involved) is to turn character
vectors into factors.
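
A minimal sketch of that conversion. `stringsAsFactors = TRUE` is passed explicitly here so the result does not depend on the R version: it was the default at the time of writing, but R 4.0.0 and later default to FALSE.

```r
# Explicit stringsAsFactors so the result is the same on any R version
df <- data.frame(x = c("a", "b"), stringsAsFactors = TRUE)
class(df$x)   # "factor"
levels(df$x)  # "a" "b"
```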

Hadley

-- 
http://hadley.nz


Re: [Rd] vctrs: a type system for the tidyverse

2018-08-09 Thread Hadley Wickham

On Thu, Aug 9, 2018 at 3:57 AM Joris Meys  wrote:
>
>  I sent this to  Iñaki personally by mistake. Thank you for notifying me.
>
> On Wed, Aug 8, 2018 at 7:53 PM Iñaki Úcar  wrote:
>
> >
> > For what it's worth, I always thought about factors as fundamentally
> > characters, but with restrictions: a subspace of all possible strings.
> > And I'd say that a non-negligible number of R users may think about
> > them in a similar way.
> >
>
> That idea has been a common source of bugs and the most important reason
> why I always explain my students that factors are a special kind of
> numeric(integer), not character. Especially people coming from SPSS see
> immediately the link with categorical variables in that way, and understand
> that a factor is a modeling aid rather than an alternative for characters.
> It is a categorical variable and a more readable way of representing a set
> of dummy variables.
>
> I do agree that some of the factor behaviour is confusing at best, but that
> doesn't change the appropriate use and meaning of factors as categorical
> variables.
>
> Even more, I oppose the ideas that :
>
> 1) factors with different levels should be concatenated.
>
> 2) when combining factors, the union of the levels would somehow be a good
> choice.
>
> Factors with different levels are variables with different information, not
> more or less information. If one factor codes low and high and another
> codes low, mid and high, you can't say whether mid in one factor would be
> low or high in the first one. The second has a higher resolution, and
> that's exactly the reason why they should NOT be combined. Different levels
> indicate a different grouping, and hence that data should never be used as
> one set of dummy variables in any model.
>
> Even when combining factors, the union of levels only makes sense to me if
> there's no overlap between levels of both factors. In all other cases, a
> researcher will need to determine whether levels with the same label do
> mean the same thing in both factors, and that's not guaranteed. And when
> we're talking a factor with a higher resolution and a lower resolution, the
> correct thing to do modelwise is to recode one of the factors so they have
> the same resolution and every level the same definition before you merge
> that data.
>
> So imho the combination of two factors with different levels (or even
> levels in a different order) should give an error. Which R currently
> doesn't throw, so I get there's room for improvement.

I 100% agree with you, and this is the behaviour that vctrs used to
have and dplyr currently has (at least in bind_rows()). But
pragmatically, my experience with dplyr is that people find this
behaviour confusing and unhelpful. And when I played the full
expression of this behaviour in vctrs, I found that it forced me to
think about the levels of factors more than I'd otherwise like to: it
made me think like a programmer, not like a data analyst. So in an
ideal world, yes, I think factors would have stricter behaviour, but
my sense is that imposing this strictness now will be onerous to most
analysts.
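
The two behaviours being weighed here can be sketched in base R. This is
only an illustration: combine_factors() and its strict argument are
hypothetical names, not functions from vctrs or dplyr.

```r
# Hypothetical helper contrasting the strict and the lenient behaviour.
combine_factors <- function(x, y, strict = FALSE) {
  if (strict && !identical(levels(x), levels(y))) {
    stop("factors have different levels", call. = FALSE)
  }
  # Lenient path: take the union of the levels, in order of appearance.
  lvls <- union(levels(x), levels(y))
  factor(c(as.character(x), as.character(y)), levels = lvls)
}

f1 <- factor(c("low", "high"), levels = c("low", "high"))
f2 <- factor(c("low", "mid", "high"), levels = c("low", "mid", "high"))

lenient <- combine_factors(f1, f2)
levels(lenient)

# The strict variant refuses to combine factors whose levels differ.
strict_result <- tryCatch(
  combine_factors(f1, f2, strict = TRUE),
  error = function(e) conditionMessage(e)
)
strict_result
```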

Hadley

-- 
http://hadley.nz


Re: [Rd] vctrs: a type system for the tidyverse

2018-08-09 Thread Hadley Wickham

On Thu, Aug 9, 2018 at 7:54 AM Joris Meys  wrote:
>
> Hi Hadley,
>
> my point actually came from a data analyst point of view. A character 
> variable is something used for extra information, eg the "any other ideas?" 
> field of a questionnaire. A categorical variable is a variable describing 
> categories defined by the researcher. If it is made clear that a factor is 
> the object type needed for a categorical variable, there is no confusion. All 
> my students get it. But I agree that in many cases people are taught that a 
> factor is somehow related to character variables. And that does not make 
> sense from a data analyst point of view if you think about variables as 
> continuous, ordinal and nominal in a model context.
>
> So I don't think adding more confusing behaviour and pitfalls is a solution 
> to something that's essentially a misunderstanding. It's something that's 
> only solved by explaining it correctly imho.

I agree with your definition of character and factor variables. It's
an important distinction, and I agree that the blurring of factors and
characters is generally undesirable. However, the merits of respecting
R's existing behaviour, and Martin Mächler's support, mean that I'm
not going to change vctrs' approach at this point in time. However, I
hear from you and Gabe that this is an important issue, and I'll
definitely keep it in mind as I solicit further feedback from users.

Hadley

-- 
http://hadley.nz


Re: [Rd] vctrs: a type system for the tidyverse

2018-08-09 Thread Hadley Wickham

> > As Gabe mentioned (and you've explained about) the term "type"
> > is really confusing here.  As you know, the R internals are all
> > about SEXPs, TYPEOF(), etc, and that's what the R level
> > typeof(.) also returns.  As you want to use something slightly
> > different, it should be different naming, ideally something not
> > existing yet in the R / S world, maybe 'kind' ?
>
> Agreed - I've been using type in the sense of "type system"
> (particularly as it relates to algebraic data types), but that's not
> obvious from the current presentation, and as you note, is confusing
> with existing notions of type in R. I like your suggestion of kind,
> but I think it might be possible to just talk about classes, and
> instead emphasise that while the components of the system are classes
> (and indeed it's implemented using S3), the coercion/casting
> relationships do not strictly follow the subclass/superclass
> relationships.

I've taken another pass through (the first part of) the readme
(https://github.com/r-lib/vctrs#vctrs), and I'm now confident that I
can avoid using "type" by itself, and instead always use it in a
compound phrase (like type system) to avoid confusion. That leaves the
`.type` argument to many vctrs functions. I'm considering changing it to
.prototype, because what you actually give it is a zero-length vector
of the class you want, i.e. a prototype of the desired output. What do
you think of prototype as a name?

Do you have any thoughts on good names for distinguishing vectors without
a class (i.e. logical, integer, double, ...) from vectors with a class
(e.g. factors, dates, etc). I've been thinking bare vector and S3
vector (leaving room to later think about S4 vectors). Do those sound
reasonable to you?
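
For concreteness, both ideas can be shown with base R alone. This is a
small sketch, not vctrs code, and the variable names are arbitrary: a
prototype is simply a zero-length vector that still carries its class and
attributes, and is.object() separates bare vectors from S3 vectors.

```r
# A zero-length factor keeps its class and levels, so it can act as a
# prototype describing the desired output.
proto <- factor(character(), levels = c("a", "b"))
length(proto)
levels(proto)

# Bare vectors have no class attribute; S3 vectors do.
bare <- 1:3
s3 <- as.Date("2018-08-09")
is.object(bare)
is.object(s3)
```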

Hadley

-- 
http://hadley.nz


Re: [Rd] vctrs: a type system for the tidyverse

2018-08-09 Thread Hadley Wickham

On Thu, Aug 9, 2018 at 4:26 PM Jan Vitek  wrote:
>
> > I'm now confident that I
> > can avoid using "type" by itself, and instead always use it in a
> > compound phrase (like type system) to avoid confusion. That leaves the
> > `.type` argument to many vctrs functions. I'm considering changing it to
> > .prototype, because what you actually give it is a zero-length vector
> > of the class you want, i.e. a prototype of the desired output. What do
> > you think of prototype as a name?
>
>
> The term “type system” in computer science is used in very different ways.
> What the note describes is not a type system, but rather a set of
> coercions used by a small number of functions in one package.
>
> Typically it refers to a set of rules (either statically enforced
> by the compiler or dynamically enforced by the runtime) that ensure
> that some particular category of errors can be caught by the
> language.
>
> There is none of that here.

I think there's a bit of that flavour here:

vec_c(factor("a"), Sys.Date())
#> Error: No common type for factor and date

This isn't a type system imposed by the language, but I don't think
that's a reason not to call it a type system.

That said, I agree that calling it a type system is currently
overselling it, and I have made your proposed change to the README
(and added a very-long term goal of making a type system that could be
applied using (e.g.) annotations).

> "The short-term goal of vctrs is to develop a type system for vectors which 
> will help reason about functions that combine different types of input (e.g. 
> c(), ifelse(), rbind()). The vctrs type system encompasses base vectors (e.g. 
> logical, numeric, character, list), S3 vectors (e.g. factor, ordered, Date, 
> POSIXct), and data frames; and can be extended to deal with S3 vectors 
> defined in other packages, as described in vignette("extending-vctrs")."
>
> ==>
>
> The short-term goal of vctrs is to specify the behavior of functions that 
> combine different types of vectors (e.g. c(), ifelse(), rbind()). The 
> specification encompasses base vectors (e.g. logical, numeric, character, 
> list), S3 vectors (e.g. factor, ordered, Date, POSIXct), and data frames; and 
> can be extended to deal with S3 vectors defined in other packages, as 
> described in vignette("extending-vctrs").

Thanks for the nice wording!

Hadley


-- 
http://hadley.nz


Re: [Rd] substitute() on arguments in ellipsis ("dot dot dot")?

2018-08-13 Thread Hadley Wickham

Since you're already using bang-bang ;)

library(rlang)

dots1 <- function(...) as.list(substitute(list(...)))[-1L]
dots2 <- function(...) as.list(substitute(...()))
dots3 <- function(...) match.call(expand.dots = FALSE)[["..."]]
dots4 <- function(...) exprs(...)

bench::mark(
  dots1(1+2, "a", rnorm(3), stop("bang!")),
  dots2(1+2, "a", rnorm(3), stop("bang!")),
  dots3(1+2, "a", rnorm(3), stop("bang!")),
  dots4(1+2, "a", rnorm(3), stop("bang!")),
  check = FALSE
)[1:4]
#> # A tibble: 4 x 4
#>   expression                                           min     mean   median
#> 1 "dots1(1 + 2, \"a\", rnorm(3), stop(\"bang!\"…   3.23µs   4.15µs   3.81µs
#> 2 "dots2(1 + 2, \"a\", rnorm(3), stop(\"bang!\"…   2.72µs   4.48µs   3.37µs
#> 3 "dots3(1 + 2, \"a\", rnorm(3), stop(\"bang!\"…   4.06µs   4.94µs   4.69µs
#> 4 "dots4(1 + 2, \"a\", rnorm(3), stop(\"bang!\"…   3.92µs    4.9µs   4.46µs
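
The benchmark only works because none of these helpers evaluates its
arguments, which is why stop("bang!") never fires. A small base R sketch
of that point, using the dots1() definition from above:

```r
dots1 <- function(...) as.list(substitute(list(...)))[-1L]

# The arguments are captured as unevaluated expressions, so the call to
# stop() is stored rather than run.
exprs <- dots1(1 + 2, "a", stop("bang!"))

length(exprs)
class(exprs[[1]])

# Evaluation happens only when explicitly requested.
eval(exprs[[1]])
```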


On Mon, Aug 13, 2018 at 4:19 AM Henrik Bengtsson
 wrote:
>
> Thanks all, this was very helpful.  Peter's finding - dots2() below -
> is indeed interesting - I'd be curious to learn what goes on there.
>
> The different alternatives perform approximately the same;
>
> dots1 <- function(...) as.list(substitute(list(...)))[-1L]
> dots2 <- function(...) as.list(substitute(...()))
> dots3 <- function(...) match.call(expand.dots = FALSE)[["..."]]
>
> stats <- microbenchmark::microbenchmark(
>   dots1(1+2, "a", rnorm(3), stop("bang!")),
>   dots2(1+2, "a", rnorm(3), stop("bang!")),
>   dots3(1+2, "a", rnorm(3), stop("bang!")),
>   times = 10e3
> )
> print(stats)
> # Unit: microseconds
> #                                        expr  min   lq mean median   uq  max neval
> #  dots1(1 + 2, "a", rnorm(3), stop("bang!")) 2.14 2.45 3.04   2.58 2.73 1110 10000
> #  dots2(1 + 2, "a", rnorm(3), stop("bang!")) 1.81 2.10 2.47   2.21 2.34 1626 10000
> #  dots3(1 + 2, "a", rnorm(3), stop("bang!")) 2.59 2.98 3.36   3.15 3.31 1037 10000
>
> /Henrik
>
> On Mon, Aug 13, 2018 at 7:10 AM Peter Meilstrup
>  wrote:
> >
> > Interestingly,
> >
> >as.list(substitute(...()))
> >
> > also works.
> >
> > On Sun, Aug 12, 2018 at 1:16 PM, Duncan Murdoch
> >  wrote:
> > > On 12/08/2018 4:00 PM, Henrik Bengtsson wrote:
> > >>
> > >> Hi. For any number of *known* arguments, we can do:
> > >>
> > >> one <- function(a) list(a = substitute(a))
> > >> two <- function(a, b) list(a = substitute(a), b = substitute(b))
> > >>
> > >> and so on. But how do I achieve the same when I have:
> > >>
> > >> dots <- function(...) list(???)
> > >>
> > >> I want to implement this such that I can do:
> > >>
> > >>> exprs <- dots(1+2)
> > >>> str(exprs)
> > >>
> > >> List of 1
> > >>   $ : language 1 + 2
> > >>
> > >> as well as:
> > >>
> > >>> exprs <- dots(1+2, "a", rnorm(3))
> > >>> str(exprs)
> > >>
> > >> List of 3
> > >>   $ : language 1 + 2
> > >>   $ : chr "a"
> > >>   $ : language rnorm(3)
> > >>
> > >> Is this possible to achieve using plain R code?
> > >
> > >
> > > I think so.  substitute(list(...)) gives you a single expression
> > > containing a call to list() with the unevaluated arguments; you can
> > > convert that to what you want using something like
> > >
> > > dots <- function (...) {
> > >   exprs <- substitute(list(...))
> > >   as.list(exprs[-1])
> > > }
> > >
> > > Duncan Murdoch
> > >



-- 
http://hadley.nz


[Rd] conflicted: an alternative conflict resolution strategy

2018-08-23 Thread Hadley Wickham

Hi all,

I’d love to get your feedback on the conflicted package, which provides an
alternative strategy for resolving ambiguous function names (i.e. when
multiple packages provide identically named functions). conflicted 0.1.0
is already on CRAN, but I’m currently preparing a revision, and looking
for feedback.

As you are no doubt aware, R’s default approach means that the most
recently loaded package “wins” any conflicts. You do get a message about
conflicts on load, but I see a lot of newer R users experiencing problems
caused by function conflicts. I think there are three primary reasons:

-   People don’t read messages about conflicts. Even if you are
conscientious and do read the messages, it’s hard to notice a single
new conflict caused by a package upgrade.

-   The warning and the problem may be quite far apart. If you load all
your packages at the top of the script, it may potentially be 100s
of lines before you encounter a conflict.

-   The error messages caused by conflicts are cryptic because you end
up calling a function with utterly unexpected arguments.

For these reasons, conflicted takes an alternative approach, forcing the
user to explicitly disambiguate any conflicts:

library(conflicted)
library(dplyr)
library(MASS)

select
#> Error: [conflicted] `select` found in 2 packages.
#> Either pick the one you want with `::`
#> * MASS::select
#> * dplyr::select
#> Or declare a preference with `conflicted_prefer()`
#> * conflict_prefer("select", "MASS")
#> * conflict_prefer("select", "dplyr")

conflicted works by attaching a new “conflicted” environment just after
the global environment. This environment contains an active binding for
any ambiguous bindings. The conflicted environment also contains
bindings for `library()` and `require()` that rebuild the conflicted
environment and suppress default reporting (but are otherwise thin wrappers
around the base equivalents).
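
The active-binding mechanism can be sketched in a few lines of base R.
This is an illustration under simplifying assumptions, not conflicted's
actual code: conflicts_env is a made-up name, and the environment is not
attached to the search path here.

```r
# An environment holding an active binding that errors on access.
conflicts_env <- new.env()

makeActiveBinding(
  "select",
  function() {
    stop("`select` found in 2 packages; pick one with `::`", call. = FALSE)
  },
  conflicts_env
)

# Merely looking the name up runs the function and signals the error.
msg <- tryCatch(
  get("select", envir = conflicts_env),
  error = function(e) conditionMessage(e)
)
msg
```

Placing such an environment just after the global environment means the
binding is found before any package's export of the same name.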

conflicted also provides a `conflict_scout()` helper which you can use
to see what’s going on:

conflict_scout(c("dplyr", "MASS"))
#> 1 conflict:
#> * `select`: dplyr, MASS

conflicted applies a few heuristics to minimise false positives (at the
cost of introducing a few false negatives). The overarching goal is to
ensure that code behaves identically regardless of the order in which
packages are attached.

-   A number of packages provide a function that appears to conflict
with a function in a base package, but they follow the superset
principle (i.e. they only extend the API, as explained to me by
Hervé Pagès).

conflicted assumes that packages adhere to the superset principle,
which appears to be true in most of the cases that I’ve seen. For
example, the lubridate package provides `as.difftime()` and `date()`
which extend the behaviour of base functions, and provides S4
generics for the set operators.

conflict_scout(c("lubridate", "base"))
#> 5 conflicts:
#> * `as.difftime`: [lubridate]
#> * `date`   : [lubridate]
#> * `intersect`  : [lubridate]
#> * `setdiff`: [lubridate]
#> * `union`  : [lubridate]

There are two popular functions that don’t adhere to this principle:
`dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
special cases so they correctly generate conflicts. (I sure wish I’d
known about the superset principle when creating dplyr!)

conflict_scout(c("dplyr", "stats"))
#> 2 conflicts:
#> * `filter`: dplyr, stats
#> * `lag`   : dplyr, stats

-   Deprecated functions should never win a conflict, so conflicted
checks for use of `.Deprecated()`. This rule is very useful when
moving functions from one package to another. For example, many
devtools functions were moved to usethis, and conflicted ensures
that you always get the non-deprecated version, regardless of package
attach order:

head(conflict_scout(c("devtools", "usethis")))
#> 26 conflicts:
#> * `use_appveyor`   : [usethis]
#> * `use_build_ignore`   : [usethis]
#> * `use_code_of_conduct`: [usethis]
#> * `use_coverage`   : [usethis]
#> * `use_cran_badge` : [usethis]
#> * `use_cran_comments`  : [usethis]
#> ...

Finally, as mentioned above, the user can declare preferences:

conflict_prefer("select", "MASS")
#> [conflicted] Will prefer MASS::select over any other package
conflict_scout(c("dplyr", "MASS"))
#> 1 conflict:
#> * `select`: [MASS]

I’d love to hear what people think about the general idea, and if there
are any obviously missing pieces.

Thanks!

Hadley


-- 
http://hadley.nz


Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-24 Thread Hadley Wickham

On Thu, Aug 23, 2018 at 3:46 PM Duncan Murdoch  wrote:
>
> First, some general comments:
>
> This sounds like a useful package.
>
> I would guess it has very little impact on runtime efficiency except
> when attaching a new package; have you checked that?

It adds one extra element to the search path, so the impact on speed
should be equivalent to loading one additional package (i.e.
negligible).

I've also done some benchmarking to see the impact on calls to
library(). These are now a little outdated (because I've added more
heuristics so I should re-do), but previously conflicted added about
100 ms overhead to a library() call when I had ~170 packages loaded
(the most I could load without running out of dlls).

> I am not so sure about your heuristics.  Can they be disabled, so the
> user is always forced to make the choice?  Even when a function is
> intended to adhere to the superset principle, they don't always get it
> right, so a really careful user should always do explicit disambiguation.

That is a good question - my intuition is always to start with less
user control as it makes it easier to get the core ideas right, and
it's easy to add more control later (whereas if you later take it
away, people get unhappy). Maybe it's natural to have a function that
does the opposite of conflict_prefer(), and declare that something
that doesn't appear to be a conflict actually is?

I don't think that an option to suppress the superset principle
altogether will work - my sense is that it will generate too many
false positives, to the point where you'll get frustrated and stop
using conflicted.

> And of course, if users wrote most of their long scripts as packages
> instead of as long scripts, the ambiguity issue would arise far less
> often, because namespaces in packages are intended to solve the same
> problem as your package does.

Agreed.

> One more comment inline about a typo, possibly in an error message.

Thanks for spotting; fixed in devel now.

Hadley

-- 
http://hadley.nz


Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-24 Thread Hadley Wickham

On Fri, Aug 24, 2018 at 4:28 AM Joris Meys  wrote:
>
> Dear Hadley,
>
> There's been some mails from you lately about packages on R-devel. I would 
> argue that the appropriate list for that is R-pkg-devel, as I've been told 
> myself not too long ago. People might get confused and think this is about a 
> change to R itself, which it obviously is not.

The description for R-pkg-devel states:

> This list is to get help about package development in R. The goal of the list 
> is to provide a forum for learning about the package development process. We 
> hope to build a community of R package developers who can help each other 
> solve problems, and reduce some of the burden on the CRAN maintainers. If you 
> are having problems developing a package or passing R CMD check, this is the 
> place to ask!

The description for R-devel states:

> This list is intended for questions and discussion about code development in 
> R. Questions likely to prompt discussion unintelligible to non-programmers or 
> topics that are too technical for R-help's audience should go to R-devel, 
> unless they are specifically about problems in R package development where 
> the R-package-devel list is rather appropriate, see the posting guide 
> section. The main R mailing list is R-help.

My questions are not about how to develop a package, R CMD check, or
how to get it on CRAN, but instead about the semantics of the packages
I am working on. My opinion is supported by the fact that a number of
members of the R core team have responded (both on list and off) and
have not expressed concern about my choice of venue.

That said, I am happy to change venues (or simply not email at all) if
there is widespread concern that my emails are inappropriate.

Hadley

-- 
http://hadley.nz


[Rd] Where does L come from?

2018-08-25 Thread Hadley Wickham

Hi all,

Would someone mind pointing me to the inspiration for the use of
the L suffix to mean "integer"?  This is obviously hard to google for,
and the R language definition
(https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Constants)
is silent.

Hadley

-- 
http://hadley.nz


Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-29 Thread Hadley Wickham

>> conflicted applies a few heuristics to minimise false positives (at the
>> cost of introducing a few false negatives). The overarching goal is to
>> ensure that code behaves identically regardless of the order in which
>> packages are attached.
>>
>> -   A number of packages provide a function that appears to conflict
>> with a function in a base package, but they follow the superset
>> principle (i.e. they only extend the API, as explained to me by
>> Hervé Pagès).
>>
>> conflicted assumes that packages adhere to the superset principle,
>> which appears to be true in most of the cases that I’ve seen.
>
>
> It seems that you may be able to strengthen this heuristic from a blanket 
> assumption to something more narrowly targeted by looking for one or more of 
> the following to confirm likely-superset adherence
>
> -   matching or purely extending formals (ie all the named arguments of
>     base::fun match including order, and there are new arguments in
>     pkg::fun only if base::fun takes ...)
> -   explicit call to base::fun in the body of pkg::fun
> -   UseMethod(funname) and at least one provided S3 method calls base::fun
> -   S4 generic creation using fun or base::fun as the seeding/default
>     method body or called from at least one method

Oooh, nice idea. I'll definitely try it out.

>> For
>> example, the lubridate package provides `as.difftime()` and `date()`
>> which extend the behaviour of base functions, and provides S4
>> generics for the set operators.
>>
>> conflict_scout(c("lubridate", "base"))
>> #> 5 conflicts:
>> #> * `as.difftime`: [lubridate]
>> #> * `date`   : [lubridate]
>> #> * `intersect`  : [lubridate]
>> #> * `setdiff`: [lubridate]
>> #> * `union`  : [lubridate]
>>
>> There are two popular functions that don’t adhere to this principle:
>> `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
>> special cases so they correctly generate conflicts. (I sure wish I’d
>> known about the superset principle when creating dplyr!)
>>
>> conflict_scout(c("dplyr", "stats"))
>> #> 2 conflicts:
>> #> * `filter`: dplyr, stats
>> #> * `lag`   : dplyr, stats
>>
>> -   Deprecated functions should never win a conflict, so conflicted
>> checks for use of `.Deprecated()`. This rule is very useful when
>> moving functions from one package to another. For example, many
>> devtools functions were moved to usethis, and conflicted ensures
>> that you always get the non-deprecated version, regardless of package
>> attach order:
>
>
> I would completely believe this rule is useful for refactoring as you 
> describe, but that is the "same function" case. For an end-user in the 
> "different function same symbol" case it's not at all clear to me that the 
> deprecated function should always win.
>
> People sometimes use deprecated functions. It's not great, and eventually 
> they'll need to fix that for any given case, but imagine if you deprecated 
> the filter verb in dplyr (I know this will never happen, but I think it's 
> illustrative none the less).
>
> Consider a piece of code someone wrote before this hypothetical deprecation 
> of filter. The fact that it's now deprecated certainly doesn't mean that they 
> secretly wanted stats::filter all along, right? Conflicted acting as if it 
> does will lead to them getting the exact kind of error you're looking to 
> protect them from, and with even less ability to understand why because they 
> are already doing "The right thing" to protect themselves by using conflicted 
> in the first place...

Ah yes, good point. I'll add some heuristic to check that the function
name appears in the first argument of the .Deprecated call (assuming
that the call looks something like `.Deprecated("pkg::foo")`)
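
A rough sketch of what such a check might look like, in base R only. The
helper name is_moved_elsewhere() is hypothetical, and the text matching is
deliberately simplistic (a substring test on the deparsed body), so treat
it as an illustration of the idea rather than how conflicted does it.

```r
# Hypothetical helper: does `fun` call .Deprecated() with a string first
# argument that mentions `name`?
is_moved_elsewhere <- function(fun, name) {
  src <- paste(deparse(body(fun)), collapse = "\n")
  # Extract the first `.Deprecated("...` call with a literal string argument.
  hit <- regmatches(src, regexpr('\\.Deprecated\\("[^"]*"', src))
  length(hit) == 1 && grepl(name, hit, fixed = TRUE)
}

# Deprecated because it moved: the replacement has the same name.
old_foo <- function(...) {
  .Deprecated("pkg::foo")
  NULL
}

# Deprecated in favour of something with a different name.
old_bar <- function(...) {
  .Deprecated("somethingelse")
  NULL
}

is_moved_elsewhere(old_foo, "foo")
is_moved_elsewhere(old_bar, "bar")
```

Only the first kind of function would then be treated as automatically
losing a conflict.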

>> Finally, as mentioned above, the user can declare preferences:
>>
>> conflict_prefer("select", "MASS")
>> #> [conflicted] Will prefer MASS::select over any other package
>> conflict_scout(c("dplyr", "MASS"))
>> #> 1 conflict:
>> #> * `select`: [MASS]
>>
>
> I deeply worry about people putting this kind of thing, or even just 
> library(conflicted), in their .Rprofile and thus making their scripts 
> substantially less reproducible. Is that a consequence you have thought about 
> to this kind of functionality?

Yes, and I've already recommended against it in two places :)  I'm not
sure if there's any more I can do - people already put (e.g.)
`library(ggplot2)` in their .Rprofile, which is just as bad from a
reproducibility standpoint.

Thanks for the thoughtful feedback!

Hadley

-- 
http://hadley.nz


Re: [Rd] Where does L come from?

2018-08-29 Thread Hadley Wickham

Thanks for the great discussion everyone!

Hadley

On Sat, Aug 25, 2018 at 8:26 AM Hadley Wickham  wrote:
>
> Hi all,
>
> Would someone mind pointing me to the inspiration for the use of
> the L suffix to mean "integer"?  This is obviously hard to google for,
> and the R language definition
> (https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Constants)
> is silent.
>
> Hadley
>
> --
> http://hadley.nz



-- 
http://hadley.nz


Re: [Rd] ROBUSTNESS: x || y and x && y to give warning/error if length(x) != 1 or length(y) != 1

2018-08-30 Thread Hadley Wickham

I think this is an excellent idea as it eliminates a situation which
is almost certainly user error. Making it an error would break a small
amount of existing code (even if for the better), so perhaps it should
start as a warning, but be optionally upgraded to an error. It would
be nice to have a fixed date (R version) in the future when the
default will change to error.

In an ideal world, I think the following four cases should all return
the same error:

if (logical()) 1
#> Error in if (logical()) 1: argument is of length zero
if (c(TRUE, TRUE)) 1
#> Warning in if (c(TRUE, TRUE)) 1: the condition has length > 1 and only the
#> first element will be used
#> [1] 1

logical() || TRUE
#> [1] TRUE
c(TRUE, TRUE) || TRUE
#> [1] TRUE

i.e. I think that `if`, `&&`, and `||` should all check that their
input is a logical (or numeric) vector of length 1.
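
In user code, the stricter behaviour can be approximated today with a
small wrapper. This is only a sketch: check_scalar() and strict_or() are
hypothetical names, not base R functions.

```r
# Hypothetical helper: insist on a logical (or numeric) vector of length 1.
check_scalar <- function(x) {
  if (!(is.logical(x) || is.numeric(x)) || length(x) != 1L) {
    stop("condition must be a logical (or numeric) vector of length 1",
         call. = FALSE)
  }
  as.logical(x)
}

# Short-circuiting is preserved: `y` is only checked if `x` is not TRUE.
strict_or <- function(x, y) check_scalar(x) || check_scalar(y)

strict_or(FALSE, TRUE)

# Both the zero-length and the length-2 case now give the same error.
err_zero <- tryCatch(strict_or(logical(), TRUE),
                     error = function(e) conditionMessage(e))
err_two <- tryCatch(strict_or(c(TRUE, TRUE), TRUE),
                    error = function(e) conditionMessage(e))
identical(err_zero, err_two)
```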

Hadley

On Tue, Aug 28, 2018 at 10:03 PM Henrik Bengtsson
 wrote:
>
> # Issue
>
> 'x || y' performs 'x[1] || y' for length(x) > 1.  For instance (here
> using R 3.5.1),
>
> > c(TRUE, TRUE) || FALSE
> [1] TRUE
> > c(TRUE, FALSE) || FALSE
> [1] TRUE
> > c(TRUE, NA) || FALSE
> [1] TRUE
> > c(FALSE, TRUE) || FALSE
> [1] FALSE
>
> This property is symmetric in LHS and RHS (i.e. 'y || x' behaves the
> same) and it also applies to 'x && y'.
>
> Note also how the above truncation of 'x' is completely silent -
> there's neither an error nor a warning being produced.
>
>
> # Discussion/Suggestion
>
> Using 'x || y' and 'x && y' with a non-scalar 'x' or 'y' is likely a
> mistake.  Either the code is written assuming 'x' and 'y' are scalars,
> or there is a coding error and vectorized versions 'x | y' and 'x & y'
> were intended.  Should 'x || y' always be considered a mistake if
> 'length(x) != 1' or 'length(y) != 1'?  If so, should it be a warning
> or an error?  For instance,
>
> > x <- c(TRUE, TRUE)
> > y <- FALSE
> > x || y
>
> Error in x || y : applying scalar operator || to non-scalar elements
> Execution halted
>
> What about the case where 'length(x) == 0' or 'length(y) == 0'?  Today
> 'x || y' returns 'NA' in such cases, e.g.
>
> > logical(0) || c(FALSE, NA)
> [1] NA
> > logical(0) || logical(0)
> [1] NA
> > logical(0) && logical(0)
> [1] NA
>
> I don't know the background for this behavior, but I'm sure there is
> an argument behind that one.  Maybe it's simply that '||' and '&&'
> should always return a scalar logical and neither TRUE nor FALSE can
> be returned.
>
> /Henrik
>
> PS. This is in the same vein as
> https://mailman.stat.ethz.ch/pipermail/r-devel/2017-March/073817.html
> - in R (>=3.4.0) we now get that if (1:2 == 1) ... is an error if
> _R_CHECK_LENGTH_1_CONDITION_=true



-- 
http://hadley.nz


Re: [Rd] ROBUSTNESS: x || y and x && y to give warning/error if length(x) != 1 or length(y) != 1

2018-08-30 Thread Hadley Wickham

On Thu, Aug 30, 2018 at 10:58 AM Martin Maechler
 wrote:
>
> > Joris Meys
> > on Thu, 30 Aug 2018 14:48:01 +0200 writes:
>
> > On Thu, Aug 30, 2018 at 2:09 PM Dénes Tóth
> >  wrote:
> >> Note that `||` and `&&` have never been symmetric:
> >>
> >> TRUE || stop() # returns TRUE
> >> stop() || TRUE # returns an error
> >>
> >>
> > Fair point. So the suggestion would be to check whether x
> > is of length 1 and whether y is of length 1 only when
> > needed. I.e.
>
> > c(TRUE,FALSE) || TRUE
>
> > would give an error and
>
> > TRUE || c(TRUE, FALSE)
>
> > would pass.
>
> > Thought about it a bit more, and I can't come up with a
> > use case where the first line must pass. So if the short
> > circuiting remains and the extra check only gives a small
> > performance penalty, adding the error could indeed make
> > some bugs more obvious.
>
> I agree "in theory".
> Thank you, Henrik, for bringing it up!
>
> In practice I think we should start having a warning signalled.
> I have checked the source code in the mean time, and the check
> is really very cheap
> { because it can/should be done after checking isNumber(): so
>   then we know we have an atomic and can use XLENGTH() }
>
>
> The 0-length case I don't think we should change as I do find
> NA (is logical!) to be an appropriate logical answer.

Can you explain your reasoning a bit more here? I'd like to understand
the general principle, because from my perspective it's more
parsimonious to say that the inputs to || and && must be length 1,
rather than to say that inputs could be length 0 or length 1, and in
the length 0 case they are replaced with NA.

Hadley

-- 
http://hadley.nz


Re: [Rd] build package with unicode (farsi) strings

2018-08-30 Thread Hadley Wickham

On Thu, Aug 30, 2018 at 2:11 AM Thierry Onkelinx
 wrote:
>
> Dear Farid,
>
> Try using the ASCII notation. letters_fa <- c("\u0627", "\u0641"). The full
> code table is available at https://www.utf8-chartable.de

It's a little easier to do this with code:

letters_fa <- c('الف','ب','پ','ت','ث','ج','چ','ح','خ','ر','ز','د')
writeLines(stringi::stri_escape_unicode(letters_fa))
#> \u0627\u0644\u0641
#> \u0628
#> \u067e
#> \u062a
#> \u062b
#> \u062c
#> \u0686
#> \u062d
#> \u062e
#> \u0631
#> \u0632
#> \u062f
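
If stringi isn't available, something similar can be done in base R. This
is a sketch with a hypothetical helper name, escape_unicode(); unlike
stri_escape_unicode() it escapes every character, ASCII included, and it
assumes all code points fit in four hex digits.

```r
# Hypothetical base R helper: turn each character into a \uXXXX escape.
escape_unicode <- function(x) {
  vapply(x, function(s) {
    code_points <- utf8ToInt(s)
    paste0(sprintf("\\u%04x", code_points), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

escaped <- escape_unicode(c("\u0627\u0644\u0641", "\u0628"))
writeLines(escaped)
```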

Hadley

-- 
http://hadley.nz


Re: [Rd] True length - length(unclass(x)) - without having to call unclass()?

2018-09-02 Thread Hadley Wickham

For the new vctrs::records class, I implemented length, names, [[, and
[[<- myself in https://github.com/r-lib/vctrs/blob/master/src/fields.c.
That lets me override the default S3 methods while still being able to
access the underlying data that I'm interested in.

Another option (that you should never discuss in public
😉) is temporarily setting the object bit to FALSE.

In the long run, I think an ALTREP vector that exposes the underlying
data of an S3 object (i.e. sans attributes apart from names) is
probably the way forward.
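
As a small illustration of why the "true" length matters, here is a sketch with a made-up record-like class (`my_record` is hypothetical, not the vctrs implementation). The S3 `length()` method reports the number of observations, while the underlying list has one element per field:

```r
# Hypothetical record-style class: a list of equal-length fields.
x <- structure(list(a = 1:3, b = 4:6), class = "my_record")

# length() reports observations; .subset2() bypasses S3 dispatch.
length.my_record <- function(x) length(.subset2(x, 1L))

length(x)
#> [1] 3
length(unclass(x))
#> [1] 2
```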

Hadley
On Fri, Aug 24, 2018 at 1:03 PM Henrik Bengtsson
 wrote:
>
> Is there a low-level function that returns the length of an object 'x'
> - the length that for instance .subset(x) and .subset2(x) see? An
> obvious candidate would be to use:
>
> .length <- function(x) length(unclass(x))
>
> However, I'm concerned that calling unclass(x) may trigger an
> expensive copy internally in some cases.  Is that concern unfounded?
>
> Thxs,
>
> Henrik
>

-- 
http://hadley.nz


[Rd] An update on the vctrs package

2018-11-05 Thread Hadley Wickham

Hi all,

I wanted to give you an update on vctrs ()
since I last brought it up here in August. The biggest change is that I now
have a much clearer idea of what vctrs is! I’ll summarise that here,
and point you to the documentation if you’re interested in learning
more. I’m planning on submitting vctrs to CRAN in the near future, but
it’s very much a 0.1.0 release and I expect it to continue to evolve as
more people try it out and give me feedback. I’d love to hear your
thoughts!

vctrs has three main goals:

  - To define and motivate `vec_size()` and `vec_type()` as alternatives
to `length()` and `class()`.

  - To define type- and size-stability, useful tools for analysing
function interfaces.

  - To make it easier to create new S3 vector classes.

## Size and prototype

`vec_size()` was motivated by my desire to have a function that captures
the number of “observations” in a vector. This is particularly important
for data frames because it’s useful to have a function such that
`f(data.frame(x))` equals `f(x)`. No base function has this property:
`NROW()` comes closest, but because it’s defined in terms of `length()`
for dimensionless objects, it always returns a number, even for types
that can’t go in a data frame, e.g. `data.frame(mean)` errors even
though `NROW(mean)` is `1`.

``` r
vec_size(1:10)
#> [1] 10
vec_size(as.POSIXlt(Sys.time() + 1:10))
#> [1] 10
vec_size(data.frame(x = 1:10))
#> [1] 10
vec_size(array(dim = c(10, 4, 1)))
#> [1] 10
vec_size(mean)
#> Error: `x` is a not a vector
```
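
For comparison, the base R behaviour described above can be checked directly. This sketch uses only base functions; `res` is just a scratch variable:

```r
NROW(1:10)
#> [1] 10
NROW(data.frame(x = 1:10))
#> [1] 10
# NROW() falls back to length() for dimensionless objects ...
NROW(mean)
#> [1] 1
# ... even though a function cannot go in a data frame.
res <- try(data.frame(mean), silent = TRUE)
inherits(res, "try-error")
#> [1] TRUE
```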

`vec_size()` is paired with `vec_slice()` for subsetting, i.e.
`vec_slice()` is to `vec_size()` as `[` is to `length()`;
`vec_slice(data.frame(x), i)` equals `data.frame(vec_slice(x, i))`
(modulo variable/row names).

(I plan to make `vec_size()` and `vec_slice()` generic in the next
release, providing another point of differentiation from `NROW()`.)

Complementary to the size of a vector is its prototype, a
zero-observation slice of the vector. You can compute this using
`vec_type()`, but because many classes don’t have an informative print
method for a zero-length vector, I also provide `vec_ptype()` which
prints a brief summary. As well as the class, the prototype also
captures important attributes:

``` r
vec_ptype(1:10)
#> Prototype: integer
vec_ptype(array(1:40, dim = c(10, 4, 1)))
#> Prototype: integer[,4,1]
vec_ptype(Sys.time())
#> Prototype: datetime
vec_ptype(data.frame(x = 1:10, y = letters[1:10]))
#> Prototype: data.frame<
#>   x: integer
#>   y: factor<5e105>
#> >
```

`vec_size()` and `vec_type()` are accompanied by functions that either
find or enforce a common size (using modified recycling rules) or common
type (by reducing a double-dispatching `vec_type2()` that determines the
common type from a pair of types).

You can read more about `vec_size()` and `vec_type()` at
.

## Stability

The definitions of size and prototype are motivated by my experiences
doing code review. I find that I can often spot problems by running R
code in my head. Obviously my mental R interpreter is much simpler than
the real interpreter, but it seems to focus on prototypes and sizes, and
I’m suspicious of code where I can’t easily predict the class of every
new variable.

This leads me to two definitions. A function is **type-stable** iff:

  - You can predict the output type knowing only the input types.
  - The order of arguments in … does not affect the output type.

Similarly, a function is **size-stable** iff:

  - You can predict the output size knowing only the input sizes, or
there is a single numeric input that specifies the output size.

For example, `ifelse()` is type-unstable because the output type can be
different even when the input types are the same:

``` r
vec_ptype(ifelse(NA, 1L, 1L))
#> Prototype: logical
vec_ptype(ifelse(FALSE, 1L, 1L))
#> Prototype: integer
```
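
To show what a type-stable alternative could look like, here is a minimal base R sketch. `stable_ifelse` is a hypothetical helper, restricted to scalar `yes` and `no` of the same base type; it is not a vctrs function.

```r
# Hypothetical type-stable variant: the output type is always typeof(yes).
stable_ifelse <- function(test, yes, no) {
  stopifnot(is.logical(test), length(yes) == 1L, length(no) == 1L,
            identical(typeof(yes), typeof(no)))
  out <- rep(yes, length(test))
  out[!test & !is.na(test)] <- no
  out[is.na(test)] <- NA   # NA is coerced to the type of `out`
  out
}

typeof(ifelse(NA, 1L, 1L))
#> [1] "logical"
typeof(stable_ifelse(NA, 1L, 1L))
#> [1] "integer"
```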

Size-stability is generally not useful for analysing base R functions
because the definition is a bit too far away from base conventions. The
analogously defined length-stability is a bit better, but the definition
of length for non-vectors means that complete length-stability is rare.
For example, while `length(c(x, y))` usually equals `length(x) +
length(y)`, it does not hold for all possible inputs:

``` r
length(globalenv())
#> [1] 0
length(mean)
#> [1] 1
length(c(mean, globalenv()))
#> [1] 2
```

(I don’t mean to pick on base here; the tidyverse also has many
functions that violate these principles, but I wanted to stick to
functions that all readers would be familiar with.)

Type- and size-stable functions are desirable because they make it
possible to reason about code without knowing the precise values
involved. Of course, not all functions should be type- or size-stable: R
would be incredibly limited if you could predict the type or size of
`[[` and `read.csv()` without knowing the specific inputs! But where
possible, I think using type- and siz

Re: [Rd] Discrepancy between is.list() and is(x, "list")

2019-03-27 Thread Hadley Wickham

I would recommend reading https://adv-r.hadley.nz/base-types.html and
https://adv-r.hadley.nz/s3.html. Understanding the distinction between
base types and S3 classes is very important to make this sort of
question precise, and in my experience, you'll find R easier to
understand if you carefully distinguish between them. (And hence you
shouldn't expect is.x(), inherits(, "x") and is(, "x") to always
return the same results)

Also note that many of the is.*() functions are not testing for types or
classes, but instead often have more complex semantics. For example,
is.vector() tests for objects with an underlying base vector type that
have no attributes (apart from names). is.numeric() tests for objects
with base type integer or double, and that have the same algebraic
properties as numbers.
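
A few base R calls illustrate the difference between base types and what the is.*() functions report. The `units` attribute below is made up for the example:

```r
x <- structure(1:3, units = "cm")  # integer vector with an extra attribute
typeof(x)
#> [1] "integer"
is.vector(x)            # FALSE: has an attribute other than names
#> [1] FALSE
is.vector(c(a = 1))     # names are allowed
#> [1] TRUE

f <- factor(c("a", "b"))
typeof(f)               # factors are built on integers ...
#> [1] "integer"
is.numeric(f)           # ... but do not behave like numbers
#> [1] FALSE
```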

Hadley

On Mon, Mar 25, 2019 at 10:28 PM Abs Spurdle  wrote:
>
> > I have noticed a discrepancy between is.list() and is(x, “list”)
>
> There's a similar problem with inherits().
>
> On R 3.5.3:
>
> > f = function () 1
> > class (f) = "f"
>
> > is.function (f)
> [1] TRUE
> > inherits (f, "function")
> [1] FALSE
>
> I didn't check what happens with:
> > class (f) = c ("f", "function")
>
> However, they should have the same result, regardless.
>
> > Is this discrepancy intentional?
>
> I hope not.
>

-- 
http://hadley.nz


Re: [Rd] Discrepancy between is.list() and is(x, "list")

2019-03-28 Thread Hadley Wickham

On Wed, Mar 27, 2019 at 6:27 PM Abs Spurdle  wrote:
>
> > the prison made by ancient design choices
>
> That prison of ancient design choices isn't so bad.
>
> I have no further comments on object oriented semantics.
> However, I'm planning to follow the following design pattern.
>
> If I set the class of an object, I will append the new class to the
> existing class.
>
> #good
> class (object) = c ("something", class (object) )
>
> #bad
> class (object) = "something"
>
> I encourage others to do the same.

I don't think this is a good pattern. It's better to clearly define a
constructor function that checks that `object` is the correct
underlying base type for your class -
https://adv-r.hadley.nz/s3.html#s3-classes.
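
A minimal sketch of such a constructor, assuming for illustration that the hypothetical class "something" is built on a double vector:

```r
# Hypothetical constructor: checks the base type, then sets the class.
new_something <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "something")
}

class(new_something(c(1, 2, 3)))
#> [1] "something"
```

Calling `new_something("a")` fails immediately, instead of creating an object whose class promises a base type it does not have.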

Hadley

-- 
http://hadley.nz


Re: [Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

2019-05-16 Thread Hadley Wickham

The existing behaviour seems intuitive to me. I would consider these
invariants for n vectors x_i, each of size m:

* nrow(rbind(x_1, x_2, ..., x_n)) equals n
* ncol(rbind(x_1, x_2, ..., x_n)) equals m

Additionally, wouldn't you expect rbind(x_1[i], x_2[i]) to equal
rbind(x_1, x_2)[, i, drop = FALSE] ?
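
These invariants can be checked with a small sketch. The names `x1`, `x2`, `m`, `i` and `empty` are only for illustration, and `deparse.level = 0` is used so that row names don't get in the way of the comparison:

```r
x1 <- 1:3
x2 <- 4:6
m <- rbind(x1, x2, deparse.level = 0)
dim(m)   # n = 2 rows, m = 3 columns
#> [1] 2 3

i <- 2:3
identical(rbind(x1[i], x2[i]), m[, i, drop = FALSE])
#> [1] TRUE

# With an empty index both sides have 2 rows and 0 columns.
empty <- integer(0)
identical(dim(rbind(x1[empty], x2[empty])), dim(m[, empty, drop = FALSE]))
#> [1] TRUE
```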

Hadley

On Thu, May 16, 2019 at 3:26 PM Gabriel Becker  wrote:
>
> Hi all,
>
> Apologies if this has been asked before (a quick google didn't find it for
> me), and I know this is a case of behaving as documented but it's so
> unintuitive (to me at least) that I figured I'd bring it up here anyway. I
> figure it's probably not going to be changed, but I'm happy to submit a
> patch if this is something R-core feels can/should change.
>
> So I recently got bitten by the fact that
>
> > nrow(rbind(character(), character()))
>
> [1] 2
>
>
> I was checking whether the result of an rbind call had more than one row,
> and that unexpectedly returned true, causing all sorts of shenanigans
> downstream as I'm sure you can imagine.
>
> Now I know that from ?rbind
>
> For ‘cbind’ (‘rbind’), vectors of zero length (including ‘NULL’)
> >
> >  are ignored unless the result would have zero rows (columns), for
> >
> >  S compatibility.  (Zero-extent matrices do not occur in S3 and are
> >
> >  not ignored in R.)
> >
>
> But there's a couple of things here. First, for the rbind case this
> reads as "if there would be zero columns, the vectors will not be
> ignored". This wording implies to me that not ignoring the vectors is a
> remedy to the "problem" of the potential for a zero-column return, but
> that's not the case. The result still has 0 columns, it just does not also
> have zero rows. So even if the behavior is not changed, perhaps this
> wording can be massaged for clarity?
>
> The other issue, which I admit is likely a problem with my intuition, but
> which I don't think I'm alone in having, is that even if I can't have a 0x0
> matrix (which is what I'd prefer) I would have expected/preferred a 1x0
> matrix, the reasoning being that if we must avoid a 0x0 return value, we
> would do the minimum required to avoid it, which is to not ignore the first
> length 0 vector, to ensure a non-zero-extent matrix, but then ignore the
> remaining ones as they contain information for 0 new rows.
>
> Of course I can program around this now that I know the behavior, but
> again, it's so unintuitive (even for someone with a fairly well developed
> intuition for R's sometimes "quirky" behavior) that I figured I'd bring it
> up.
>
> Thoughts?
>
> Best,
> ~G
>



-- 
http://hadley.nz


Re: [Rd] Puzzled about a new method for "[".

2019-11-04 Thread Hadley Wickham

For what it's worth, I don't think this strategy can work in general,
because a class might have attributes that depend on its data/contents
(e.g. https://vctrs.r-lib.org/articles/s3-vector.html#cached-sum). I
don't think these are particularly common in practice, but it's
dangerous to assume that you can restore a class simply by restoring
its attributes after subsetting.
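
A small sketch of the failure mode, using a made-up class `cached_sum` that stores the sum of its data in an attribute (loosely modelled on the linked article, not copied from it):

```r
# Hypothetical class whose "sum" attribute depends on the data.
x <- structure(c(1, 2, 3), sum = 6, class = "cached_sum")

# Naive method: subset, then restore all the original attributes.
`[.cached_sum` <- function(x, i) {
  attrs <- attributes(x)
  out <- unclass(x)[i]
  attributes(out) <- attrs
  out
}

y <- x[1:2]
attr(y, "sum")      # stale: still the sum of the original data
#> [1] 6
sum(unclass(y))     # the real sum of the subset
#> [1] 3
```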

Hadley

On Sun, Nov 3, 2019 at 3:11 PM Rolf Turner  wrote:
>
>
> I recently tried to write a new method for "[", to be applied to data
> frames, so that the object returned would retain (all) attributes of the
> columns, including attributes that my code had created.
>
> I thrashed around for quite a while, and then got some help from Rui
> Barradas who showed me how to do it, in the following manner:
>
> `[.myclass` <- function(x, i, j, drop = if (missing(i)) TRUE else
> length(cols) == 1) {
> SaveAt <- lapply(x, attributes)
> x <- NextMethod()
> lX <- lapply(names(x),function(nm, x, Sat){
>   attributes(x[[nm]]) <- Sat[[nm]]
>   x[[nm]]}, x = x, Sat = SaveAt)
> names(lX) <- names(x)
> x <- as.data.frame(lX)
> x
> }
>
> If I set class(X) <- c("myclass",class(X)) and apply "[" to X (e.g.
> something like X[1:42,]) the attributes are retained as desired.
>
> OK.  All good.  Now we finally come to my question!  I want to put this
> new method into a package that I am building.  When I build the package
> and run R CMD check I get a complaint:
>
> ... no visible binding for global variable ‘cols’
>
> And indeed, there is no such variable.  At first I thought that maybe
> the code should be
>
> `[.myclass` <- function(x, i, j, drop = if (missing(i)) TRUE else
>   length(j) == 1) {
>
> But I looked at "[.data.frame" and it has "cols" too; not "j".
>
> So why doesn't "[.data.frame" throw a warning when R gets built?
>
> Can someone please explain to me what's going on here?
>
> cheers,
>
> Rolf
>
> P. S. I amended the code for my method, replacing "cols" by "j", and it
> *seems* to run, and deliver the desired results.  (And the package
> checks, without complaint.) I am nervous, however, that there may be
> some Trap for Young Players that I don't perceive, lurking about and
> waiting to cause problems for me.
>
> R.
>
> --
> Honorary Research Fellow
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>



-- 
http://hadley.nz


Re: [Rd] improving the performance of install.packages

2019-11-09 Thread Hadley Wickham

If this is the behaviour you are looking for, you might like to try
pak (https://pak.r-lib.org)

# Create a temporary library
path <- tempfile()
dir.create(path)
.libPaths(path)

pak::pkg_install("scales")
#> → Will install 8 packages:
#>   colorspace (1.4-1), labeling (0.3), munsell (0.5.0), R6 (2.4.0),
#>   RColorBrewer (1.1-2), Rcpp (1.0.2), scales (1.0.0), viridisLite (0.3.0)
#>
#> → Will download 2 CRAN packages (4.7 MB), cached: 6 (3.69 MB).
#>
#> ✔ Installed colorspace 1.4-1 [139ms]
#> ✔ Installed labeling 0.3 [206ms]
#> ✔ Installed munsell 0.5.0 [288ms]
#> ✔ Installed R6 2.4.0 [375ms]
#> ✔ Installed RColorBrewer 1.1-2 [423ms]
#> ✔ Installed Rcpp 1.0.2 [472ms]
#> ✔ Installed scales 1.0.0 [511ms]
#> ✔ Installed viridisLite 0.3.0 [569ms]
#> ✔ 1 + 7 pkgs | kept 0, updated 0, new 8 | downloaded 2 (4.7 MB) [2.8s]

pak::pkg_install("scales")
#> ✔ No changes needed
#> ✔ 1 + 7 pkgs | kept 7, updated 0, new 0 | downloaded 0 (0 B) [855ms]

remove.packages(c("Rcpp", "munsell"))
pak::pkg_install("scales")
#> → Will install 2 packages:
#>   munsell (0.5.0), Rcpp (1.0.2)
#>
#> → All 2 packages (4.88 MB) are cached.
#>
#> ✔ Installed munsell 0.5.0 [75ms]
#> ✔ Installed Rcpp 1.0.2 [242ms]
#> ✔ 1 + 7 pkgs | kept 6, updated 0, new 2 | downloaded 0 (0 B) [1.5s]
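
For comparison, the "few lines of code to filter out already-installed packages" mentioned in the quoted message below might look something like this in base R. `install_if_missing` is a hypothetical helper and says nothing about how pak works internally; it ignores versions entirely.

```r
# Hypothetical helper: only install packages that are not yet installed.
install_if_missing <- function(pkgs, ...) {
  # system.file() returns "" when a package cannot be found
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  missing_pkgs <- pkgs[!installed]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs, ...)
  }
  invisible(missing_pkgs)
}

# Nothing to do: these ship with R.
install_if_missing(c("stats", "utils"))
```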

On Fri, Nov 8, 2019 at 1:07 AM Joshua Bradley  wrote:
>
> Hello,
>
> Currently if you install a package twice:
>
> install.packages("testit")
> install.packages("testit")
>
> R will build the package from source (depending on what OS you're using)
> twice by default. This becomes especially burdensome when people are using
> big packages (i.e. lots of depends) and someone has a script with:
>
> install.packages("tidyverse")
> ...
> ... later on down the script
> ...
> install.packages("dplyr")
>
> In this case, "dplyr" is part of the tidyverse and will install twice. As
> the primary "package manager" for R, it should not install a package twice
> (by default) when it can be so easily checked. Indeed, many people resort
> to writing a few lines of code to filter out already-installed packages. An
> r-help post from 2010 proposed a solution to improving the default
> behavior, by adding "force=FALSE" as an API addition to install.packages (
> https://stat.ethz.ch/pipermail/r-help/2010-May/239492.html)
>
> Would the R-core devs still consider this proposal?
>
> Josh Bradley
>



-- 
http://hadley.nz


Re: [Rd] class() |--> c("matrix", "arrary") [was "head.matrix ..."]

2019-11-14 Thread Hadley Wickham

On Sun, Nov 10, 2019 at 2:37 AM Martin Maechler
 wrote:
>
> > Gabriel Becker
> > on Sat, 2 Nov 2019 12:37:08 -0700 writes:
>
> > I agree that we can be careful and narrow and still see a
> > nice improvement in behavior. While Herve's point is valid
> > and I understand his frustration, I think staying within
> > the matrix vs c(matrix, array) space is the right scope
> > for this work in terms of fiddling with inheritance.
>
>  [.]
>
>
> > > Also, we seem to have a rule that inherits(x, c)  iff  c %in% class(x),
> >
> > good point, and that's why my usage of  inherits(.,.) was not
> > quite to the point.  [OTOH, it was to the point, as indeed from
> >   the ?class / ?inherits docu, S3 method dispatch and inherits
> >   must be consistent ]
> >
> > > which would break -- unless we change class(x) to return the whole
> > set of inherited classes, which I sense that we'd rather not do
>
>   []
>
> > Note again that both "matrix" and "array" are special [see ?class] as
> > being of  __implicit class__  and I am considering that this
> > implicit class behavior for these two should be slightly
> > changed 
> >
> > And indeed I think you are right on spot and this would mean
> > that indeed the implicit class
> > "matrix" should rather become c("matrix", "array").
>
> I've made up my mind (and not been contradicted by my fellow R
> corers) to try go there for  R 4.0.0   next April.

I can't seem to find the previous thread, so would you mind being a
bit more explicit here? Do you mean adding "array" to the implicit
class? Or adding it to the explicit class? Or adding it to inherits?
i.e. which of the following results are you proposing to change?

is_array <- function(x) UseMethod("is_array")
is_array.array <- function(x) TRUE
is_array.default <- function(x) FALSE

x <- matrix()
is_array(x)
#> [1] FALSE
x <- matrix()
inherits(x, "array")
#> [1] FALSE
class(x)
#> [1] "matrix"

It would be nice to make sure this is consistent with the behaviour of
integers, which have an implicit parent class of numeric:

is_numeric <- function(x) UseMethod("is_numeric")
is_numeric.numeric <- function(x) TRUE
is_numeric.default <- function(x) FALSE

x <- 1L
is_numeric(x)
#> [1] TRUE
inherits(x, "numeric")
#> [1] FALSE
class(x)
#> [1] "integer"

Hadley

-- 
http://hadley.nz


Re: [Rd] Rebuilding and re-checking of downstream dependencies on CRAN Mac build machines

2020-03-26 Thread Hadley Wickham

If I do install.packages("dplyr", type = "source"), I see:

Installing package into ‘/Users/hadley/R’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/dplyr_0.8.5.tar.gz'
Content type 'application/x-gzip' length 1378766 bytes (1.3 MB)
==
downloaded 1.3 MB

* installing *source* package ‘dplyr’ ...
** package ‘dplyr’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
ccache clang++ -Qunused-arguments
 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG
-I../inst/include -DRCPP_DEFAULT_INCLUDE_CALL=false -DCOMPILING_DPLYR
-DRCPP_USING_UTF8_ERROR_STRING -DRCPP_USE_UNWIND_PROTECT
-DBOOST_NO_AUTO_PTR  -I"/Users/hadley/R/BH/include"
-I"/Users/hadley/R/plogr/include" -I"/Users/hadley/R/Rcpp/include"
-isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
-I/usr/local/include  -fPIC  -Wall -g -O2  -c RcppExports.cpp -o
RcppExports.o
In file included from RcppExports.cpp:4:
In file included from ./../inst/include/dplyr.h:4:
In file included from ../inst/include/dplyr/main.h:6:
In file included from ../inst/include/dplyr/workarounds/static_assert.h:17:
In file included from /Users/hadley/R/BH/include/boost/config.hpp:57:
In file included from
/Users/hadley/R/BH/include/boost/config/platform/macos.hpp:28:
In file included from
/Users/hadley/R/BH/include/boost/config/detail/posix_features.hpp:18:
In file included from
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:655:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/gethostuuid.h:39:17:
error: unknown type name 'uuid_t'
int gethostuuid(uuid_t, const struct timespec *)
__OSX_AVAILABLE_STARTING(__MAC_10_5, __IPHONE_NA);
^
In file included from RcppExports.cpp:4:
In file included from ./../inst/include/dplyr.h:4:
In file included from ../inst/include/dplyr/main.h:6:
In file included from ../inst/include/dplyr/workarounds/static_assert.h:17:
In file included from /Users/hadley/R/BH/include/boost/config.hpp:57:
In file included from
/Users/hadley/R/BH/include/boost/config/platform/macos.hpp:28:
In file included from
/Users/hadley/R/BH/include/boost/config/detail/posix_features.hpp:18:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:662:27:
error: unknown type name 'uuid_t'; did you mean 'uid_t'?
int  getsgroups_np(int *, uuid_t);
  ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types/_uid_t.h:31:31:
note: 'uid_t' declared here
typedef __darwin_uid_t        uid_t;
  ^
In file included from RcppExports.cpp:4:
In file included from ./../inst/include/dplyr.h:4:
In file included from ../inst/include/dplyr/main.h:6:
In file included from ../inst/include/dplyr/workarounds/static_assert.h:17:
In file included from /Users/hadley/R/BH/include/boost/config.hpp:57:
In file included from
/Users/hadley/R/BH/include/boost/config/platform/macos.hpp:28:
In file included from
/Users/hadley/R/BH/include/boost/config/detail/posix_features.hpp:18:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:664:27:
error: unknown type name 'uuid_t'; did you mean 'uid_t'?
int  getwgroups_np(int *, uuid_t);
  ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types/_uid_t.h:31:31:
note: 'uid_t' declared here
typedef __darwin_uid_t        uid_t;
  ^
In file included from RcppExports.cpp:4:
In file included from ./../inst/include/dplyr.h:4:
In file included from ../inst/include/dplyr/main.h:6:
In file included from ../inst/include/dplyr/workarounds/static_assert.h:17:
In file included from /Users/hadley/R/BH/include/boost/config.hpp:57:
In file included from
/Users/hadley/R/BH/include/boost/config/platform/macos.hpp:28:
In file included from
/Users/hadley/R/BH/include/boost/config/detail/posix_features.hpp:18:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:727:31:
error: unknown type name 'uuid_t'; did you mean 'uid_t'?
int  setsgroups_np(int, const uuid_t);
  ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types/_uid_t.h:31:31:
note: 'uid_t' declared here
typedef __darwin_uid_t        uid_t;
  ^
In file included from RcppExports.cpp:4:
In file included from ./../inst/include/dplyr.h:4:
In file included from ../inst/include/dplyr/main.h:6:
In file included from ../inst/include/dplyr/workarounds/static_assert.h:17:
In file included from /Users/hadley/R/BH/include/boost/config.hpp:57:
In file included from
/Users/hadley/R/BH/include/boost/config/platform/macos.hpp:28:
In file included from
/Users/hadley/R/BH/include/boost/config/detail/posix_features.hpp:18:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:729:31:
error: unknown type name 'uuid_t'; did you mean 'uid_t'?
int  setwgroups_np(

Re: [Rd] object.size vs lobstr::obj_size

2020-03-27 Thread Hadley Wickham

On Fri, Mar 27, 2020 at 10:39 AM Hervé Pagès  wrote:

> Hi Tomas,
>
> On 3/27/20 07:01, Tomas Kalibera wrote:
> > they provide an over-approximation
>
> They can also provide an "under-approximation" (to say the least) e.g.
> on reference objects where the entire substance of the object is ignored
> which makes object.size() completely meaningless in that case:
>
>setRefClass("A", fields=c(stuff="ANY"))
>object.size(new("A", stuff=raw(0)))  # 680 bytes
>object.size(new("A", stuff=runif(1e8)))  # 680 bytes
>
> Why wouldn't object.size() look at the content of environments?
>

As the author, I'm obviously biased, but I do like lobstr::obj_sizes()
which allows you to see the additional size occupied by one object given
any number of other objects. This is particularly important for reference
classes since individual objects appear quite large:

A <- setRefClass("A", fields=c(stuff="ANY"))
lobstr::obj_size(new("A", stuff=raw(0)))
#> 567,056 B

But the vast majority is shared across all instances of that class:

lobstr::obj_size(A)
#> 719,232 B
lobstr::obj_sizes(A, new("A", stuff=raw(0)))
#> * 719,232 B
#> * 720 B
lobstr::obj_sizes(A, new("A", stuff=runif(1e8)))
#> * 719,232 B
#> * 800,000,720 B
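
The base R behaviour that prompted this thread can be reproduced without reference classes. In this sketch (`e1` and `e2` are just illustrative names), object.size() reports the same size for an empty environment and one holding a large vector, because it does not look inside environments:

```r
e1 <- new.env()
e2 <- new.env()
assign("stuff", runif(1e5), envir = e2)

# The contents of e2 are ignored, so both sizes are the same.
identical(object.size(e1), object.size(e2))
#> [1] TRUE
```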

Hadley
-- 
http://hadley.nz


Re: [Rd] object.size vs lobstr::obj_size

2020-03-27 Thread Hadley Wickham

On Fri, Mar 27, 2020 at 11:08 AM Tomas Kalibera 
wrote:

> On 3/27/20 4:39 PM, Hervé Pagès wrote:
> > Hi Tomas,
> >
> > On 3/27/20 07:01, Tomas Kalibera wrote:
> >> they provide an over-approximation
> >
> > They can also provide an "under-approximation" (to say the least) e.g.
> > on reference objects where the entire substance of the object is
> > ignored which makes object.size() completely meaningless in that case:
> >
> >   setRefClass("A", fields=c(stuff="ANY"))
> >   object.size(new("A", stuff=raw(0)))  # 680 bytes
> >   object.size(new("A", stuff=runif(1e8)))  # 680 bytes
> >
> > Why wouldn't object.size() look at the content of environments?
>
> Yes, the treatment of environments is not "over-approximative". It has
> to be bounded somewhere, you can't traverse all captured environments,
> getting to say package namespaces, global environment, code of all
> functions, that would be too over-approximating. For environments used
> as hash maps that contain data, such as in reference classes, it would
> of course be much better to include them, but you can't differentiate
> programmatically. In principle the same environment can be used for both
> things, say a namespace environment can contain data (not clearly
> related to any user-level R object) as well as code. Not mentioning
> things like source references and parse data.
>
>
I think the heuristic used in lobstr works well in practice: don't traverse
further than the current environment (supplied as an argument so you can
override), and don't ever traverse past the global or base environments.

Hadley

-- 
http://hadley.nz


Re: [Rd] object.size vs lobstr::obj_size

2020-03-27 Thread Hadley Wickham

On Fri, Mar 27, 2020 at 4:01 PM Hervé Pagès  wrote:

>
>
> On 3/27/20 12:00, Hadley Wickham wrote:
> >
> >
> > On Fri, Mar 27, 2020 at 10:39 AM Hervé Pagès wrote:
> >
> > Hi Tomas,
> >
> > On 3/27/20 07:01, Tomas Kalibera wrote:
> >  > they provide an over-approximation
> >
> > They can also provide an "under-approximation" (to say the least)
> e.g.
> > on reference objects where the entire substance of the object is
> > ignored
> > which makes object.size() completely meaningless in that case:
> >
> > setRefClass("A", fields=c(stuff="ANY"))
> > object.size(new("A", stuff=raw(0)))  # 680 bytes
> > object.size(new("A", stuff=runif(1e8)))  # 680 bytes
> >
> > Why wouldn't object.size() look at the content of environments?
> >
> >
> > As the author, I'm obviously biased, but I do like lobstr::obj_sizes()
> > which allows you to see the additional size occupied by one object given
> > any number of other objects. This is particularly important for
> > reference classes since individual objects appear quite large:
> >
> > A <- setRefClass("A", fields=c(stuff="ANY"))
> > lobstr::obj_size(new("A", stuff=raw(0)))
> > #> 567,056 B
> >
> > But the vast majority is shared across all instances of that class:
> >
> > lobstr::obj_size(A)
> > #> 719,232 B
> > lobstr::obj_sizes(A, new("A", stuff=raw(0)))
> > #> * 719,232 B
> > #> * 720 B
> > lobstr::obj_sizes(A, new("A", stuff=runif(1e8)))
> > #> * 719,232 B
> > #> * 800,000,720 B
>
> Nice. Can you clarify the situation with lobstr::obj_size vs
> pryr::object_size? I've heard of the latter before and use it sometimes
> but never heard of the former before seeing Stefan's post. Then I
> checked the authors of both and thought maybe they should talk to each
> other ;-)
>

pryr is basically retired :) TBH I don't know why I gave up on it, except
lobstr is a cooler name 🤣 That's where all active development is
happening. (The underlying code is substantially similar although
lobstr includes bug fixes not present in pryr)

Hadley

-- 
http://hadley.nz


Re: [Rd] Package development process?

2010-06-17 Thread Hadley Wickham

> The creation of a research compendium can be viewed as
> a form of unit testing, and the fact that R has powerful tools
> that support this process (Sweave) could be viewed as one of
> its outstanding features (relating these comments back to
> the topic of this thread).

If anything, a research compendium would be an integration test, not a
unit test.  And many programming languages have something similar to
Sweave.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


Re: [Rd] nchar( NA )

2010-06-18 Thread Hadley Wickham

> Value:
>
>     For ‘nchar’, an integer vector giving the sizes of each element,
>     currently always ‘2’ for missing values (for ‘NA’).
>
> It may be unexpected behavior, but it's *well-documented* unexpected behavior.

Oh, that must make it ok then.

For a more sensible take:

> library(stringr)
> str_length(c("", NA))
[1]  0 NA
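
Without stringr, a similar result can be sketched in base R. `na_aware_nchar` is a hypothetical wrapper that sets the length of missing values to NA explicitly, so it does not depend on what a given version of nchar() does with them:

```r
# Hypothetical wrapper: missing inputs always give NA lengths.
na_aware_nchar <- function(x) {
  out <- nchar(as.character(x))
  out[is.na(x)] <- NA_integer_
  out
}

na_aware_nchar(c("", NA))
#> [1]  0 NA
```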


Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


Re: [Rd] proposed change to 'sample'

2010-06-20 Thread Hadley Wickham

> I would be more inclined to make sampling from a vector the normal case,
> and default x to say 1:max(n, size), forcing users to say sample(n=5) if
> sampling from x=1:5 is desired. This could be a manageable change; the
> deprecation sequence is a bit painful to think through, though.

Don't we already have sample.int for that case?
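
For the other direction (always sampling from the elements of a vector, even when it has length 1), sample.int() can be used to pick indices. This follows the `resample()` idiom shown in the examples of ?sample:

```r
# Always sample from the elements of x, never from 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

# sample(10, 1) would pick a number from 1:10; resample() returns
# the only element there is.
resample(10, 1)
#> [1] 10
```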

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


[Rd] Catching Ctrl-C/Esc at R-level

2010-06-29 Thread Hadley Wickham

Hi all,

Is it possible to catch when the user presses Ctrl+C/Esc and deal with
it at the R level?
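
One approach that seems to fit, as far as base R's condition system goes: a user interrupt is signalled as a condition of class "interrupt", which tryCatch() can handle. The wrapper below is only a sketch, and its name and arguments are invented for illustration:

```r
# Hypothetical wrapper: evaluate `code`, returning `on_interrupt` if the
# user presses Ctrl+C/Esc while it runs.
with_interrupt <- function(code, on_interrupt = NULL) {
  tryCatch(code, interrupt = function(cnd) on_interrupt)
}

# Interactive use: press Ctrl+C/Esc during the sleep to get "stopped early".
# with_interrupt({ Sys.sleep(10); "finished" }, on_interrupt = "stopped early")
```

Because `code` is evaluated lazily inside tryCatch(), the handler is already in place when the expression starts running.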

Hadley


[Rd] Documenting non-exported objects

2010-06-29 Thread Hadley Wickham

If I don't export an object saved in data/, do I still need to document it?

Currently it seems like I do.  My namespace file looks like this:

export(evaluate)
export(parse_all)
export(replay)

But when I run R CMD check I get:

* checking for missing documentation entries ... WARNING
Undocumented code objects:
  empty_plot

Hadley


[Rd] Tips for debugging: R CMD check examples

2010-06-29 Thread Hadley Wickham

Hi all,

Does anyone have any suggestions for debugging the execution of
examples by R CMD check?  The examples work fine when I run them from
a live R prompt, but I get errors when they are run by R CMD check.

Thanks,

Hadley


Re: [Rd] Tips for debugging: R CMD check examples

2010-06-30 Thread hadley wickham

> R CMD check produces a foo-Ex.R file where foo is the package name. You
> could start by sourcing that file in R --vanilla and see where it fails
> and also use standard debugging tools in R from there (i.e. drop into a
> debugger on error).

I knew about the foo-Ex.R file, but unfortunately running that
produced no errors. I ended up resorting to inserting print statements
every few lines to narrow down the exact location of the problem,
which revealed that my code assumed options(keep.source = TRUE) but
during R CMD check, options(keep.source = FALSE).
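
To see what the keep.source difference means in practice, here is a minimal sketch that parses the same text both ways. It passes keep.source to parse() directly instead of changing the global option; the variable names are illustrative only:

```r
code <- "function(x) x + 1  # add one"

# Parse the same text with and without source references, mimicking
# options(keep.source = TRUE) (interactive) vs FALSE (batch / R CMD check).
f_kept    <- eval(parse(text = code, keep.source = TRUE))
f_dropped <- eval(parse(text = code, keep.source = FALSE))

is.null(attr(f_kept, "srcref"))     # FALSE: original text, comment included
is.null(attr(f_dropped, "srcref"))  # TRUE: only the parsed code remains
```

Code that relies on the "srcref" attribute therefore behaves differently in the two settings.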

Hopefully this type of problem will be easier to sort out with the new
pure R R CMD check in 2.11.

Hadley



Re: [Rd] Tips for debugging: R CMD check examples

2010-06-30 Thread Hadley Wickham

> Not a real tip, but when it occurs I immediately
> check for namespace issues (which often turn out
> to be the origin). Things I do are

I agree that namespace issues are often a problem, and are never easy
to diagnose - I often find that I've forgotten to export a method
definition, so other code silently falls back on the default method and
leads to head-scratching bugs.

Hadley



Re: [Rd] Tips for debugging: R CMD check examples

2010-06-30 Thread hadley wickham

> How did you 'run' it?   I suspect you source()d it, not at all the same
> thing.  I have yet to encounter a problem which
>
> R --vanilla < foo-Ex.R
>
> or perhaps
>
> env LANGUAGE=en R --vanilla --encoding=latin1 < foo-Ex.R
>
> did not reproduce.

Thanks, that was exactly what I was looking for.  I hadn't tried
running it in batch mode.

> Rather, the default (along with many others) differs in interactive use and
> batch running.   If you had used a command like those a few lines above you
> would have had 'keep.source = FALSE' set.

I think the root of the problem is the way that source is written - it
uses a global option in an internal branch, which is difficult to
notice unless you read the source code. It seems like a good principle
that global options should only affect program behaviour by modifying
the function arguments.
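
The principle can be sketched as follows. The function describe_source is hypothetical; the point is only that the option is read once, as the default of a visible argument:

```r
# Hypothetical function: the global option only supplies the default
# value of an argument, so the dependency is visible in the signature
# and callers can override it.
describe_source <- function(keep.source = getOption("keep.source")) {
  if (isTRUE(keep.source)) "keeping source" else "discarding source"
}

describe_source()                     # follows the global option
describe_source(keep.source = FALSE)  # explicit argument wins
```

Written this way, a reader of the help page or the argument list can see that the option matters, without reading the function body.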

>> Hopefully this type of problem will be easier to sort out with the new
>> pure R R CMD check in 2.11.
>
> Maybe (if you meant the one in R-devel, 2.12.0-to-be). But the examples are
> run in a sub-process just as before, so your very rare mis-assumption still
> needs an understanding of how R code is run.

Ooops, yes, I meant 2.12.0-to-be.  I was thinking it would make the
problem easier for me to understand because I'm more inclined to read
R code rather than perl code.

Hadley


Re: [Rd] Attributes of 1st argument in ...

2010-07-03 Thread Hadley Wickham

Hi Dan,

Is there a reason you can't change the function to

f <- function(x, ...) {}

?
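
Filled in, that suggestion might look like the sketch below. The body is an assumption based on the question quoted underneath, not code from the thread:

```r
f <- function(x, ...) {
  foo <- attr(x, "foo")
  # identical() also copes with a missing attribute (NULL), unlike ==
  if (identical(foo, "bar")) cbind(x, ...) else foo
}

a <- structure(1:3, foo = "bar")
f(a, 4:6)   # a 3 x 2 matrix
```

Naming the first argument avoids building list(...) or calling match.call() just to look at one attribute.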

Hadley

On Fri, Jul 2, 2010 at 4:26 PM, Daniel Murphy  wrote:
> R-Devel:
>
> I am trying to get an attribute of the first argument in a call to a
> function whose formal arguments consist of dots only and do something, e.g.,
> call 'cbind', based on the attribute
> f<- function(...) {get first attribute; maybe or maybe not call 'cbind'}
>
> I thought of (ignoring "deparse.level" for the moment)
>
> f<-function(...) {x <- attr(list(...)[[1L]], "foo"); if (x=="bar")
> cbind(...) else x}
>
> but I feared my solution might do some extra copying, with a performance
> penalty if the dotted objects in the actual call to "f' are very large.
>
> I thought the following alternative might avoid a potential performance hit
> by evaluating the attribute in the parent.frame (and therefore avoid extra
> copying?):
>
> f<-function(...)
> {
>   L<-match.call(expand.dots=FALSE)[[2L]]
>   x <- eval(substitute(attr(x,"foo"), list(x=L[[1L]])))
>   if (x=="bar") cbind(...) else x
> }
>
> system.time tests showed this second form to be only marginally faster.
>
> Is my fear about extra copying unwarranted? If not, is there a better way to
> get the "foo" attribute of the first argument other than my two
> alternatives?
>
> Thanks,
> Dan Murphy




Re: [Rd] Creating an environment with attributes in a package

2010-07-16 Thread Hadley Wickham

On Fri, Jul 16, 2010 at 2:08 PM, Jon Clayden  wrote:
> Dear all,
>
> I am trying to create an environment object with additional attributes, viz.
>
> Foo <- structure(new.env(), name="Foo")
>
> Doing this in a standard session works fine: I get the environment
> with attr(,"name") set as expected. But if the same code appears
> inside a package source file, I get just the plain environment with no
> attributes set. Using a non-environment object works as I would expect
> within the package (i.e. the attributes remain).
>
> I've looked through the documentation for reasons for this, and the
> only thing I've found is the mention in the language definition that
> "assigning attributes to an environment can lead to surprises". I'm
> not sure if this is one of the surprises that the author(s) had in
> mind! Could someone tell me whether this is expected, please?

You'll be much less surprised if you do:

Foo <- structure(list(new.env()), name="Foo")

Attributes on reference objects are also passed by reference, and
surprises will result.
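
A short sketch of the kind of surprise meant here, using only base R (the variable names are illustrative):

```r
# Attributes set on an environment are visible through every reference to it
e1 <- new.env()
e2 <- e1
attr(e2, "name") <- "Foo"
attr(e1, "name")             # "Foo": e1 changed too

# Wrapping the environment in a list gives the attributes copy semantics
w1 <- structure(list(new.env()), name = "Foo")
w2 <- w1
attr(w2, "name") <- "Bar"
attr(w1, "name")             # still "Foo"
identical(w1[[1]], w2[[1]])  # TRUE: both wrap the same environment
```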

Hadley


Re: [Rd] parent.frame(1) of a S4 method is not a calling environment.

2010-08-17 Thread Hadley Wickham

On Tuesday, August 17, 2010, Vitaly S.  wrote:
> Duncan Murdoch  writes:
>
>> Vitaly S. wrote:
>>> Martin Morgan  writes:
>>>
>>>>> So, can I be sure that for such functions parent.frame(2) will always
>>>>> work? What are the additional rules?
>>>>
>>>> callNextMethod() will cause additional problems; the idea that you'll
>>>> grab things from somewhere other than function arguments doesn't seem
>>>> like a robust design, even if it's used in some important parts of R.
>>>>
>>>> Martin
>>>
>>> That makes it difficult to handle unevaluated expressions in methods. A
>>> solution would be to explicitly require the users to use quote() or
>>> expression(), and then to use the "expression" in the signature.
>>> Slightly unpleasant, though.
>>>
>>>
>>
>> You could use formulas for that.  If you pass in
>>
>> formula = ~ x + y*z
>>
>> then environment(formula) will be the right evaluation environment, and
>> formula[[2]] will be the unevaluated x + y*z.
>>
>> Duncan Murdoch
>
> Thank you Duncan, I didn't know that.
>
> For programmatic use though, formula interface is slightly inconvenient. A
> specialized function and class would be desirable. With the advent of more and
> more complex S4 classes, unevaluated expressions in methods calls will become
> a necessity, that's my feeling.

I probably should move the quoting-related code out of plyr into its own
package to facilitate this type of reuse. I think the current
structure is quite general.
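
For reference, the formula approach from the quoted reply can be sketched in base R like this (make_spec and the values of x, y and z are made up for the example):

```r
make_spec <- function() {
  x <- 1; y <- 2; z <- 3
  ~ x + y * z            # the formula remembers this function's environment
}

spec <- make_spec()
expr <- spec[[2]]        # the unevaluated call x + y * z
env  <- environment(spec)
eval(expr, env)          # 7, using x, y and z from make_spec()
```

The expression and the environment it should be evaluated in travel together in one object, which is the property a dedicated quoting class would also need.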

Hadley


Re: [Rd] How do you make a formal "feature" request?

2010-08-21 Thread Hadley Wickham

> A report() function analogous to the plot() function that makes it easy to 
> generate a report from a table of data. This should not be in some auxiliary 
> package, but part of the core R just like plot(). As a long time SAS user I 
> cannot believe R does not have this. Please don't give me any crap about 
> Sweave, LaTex, and the "power" of R to roll your own. You don't have to "roll 
> your own" plot do you? Reports are no different. If you don't agree do not 
> bother me. If you agree then please bring this request to the appropriate 
> authorities for consideration or tell me how to do it.

I know it's frustrating when a program doesn't do what you want to do,
and I agree with you that R's reporting capabilities are not (yet) as
easy to use as SAS (although they can be more powerful and flexible).
You may have noticed that when you downloaded R that you didn't have
to pay anyone any money to use it - this has both advantages and
disadvantages. The advantages are obvious (it's free!), but a
disadvantage is that no-one is funded to work on R full time. This
means that most contributors to R (even the core developers!) work on
R either as part of another job or in their free time, and so usually
work on the aspects of statistics and data analysis that most interest
them. If you want a core developer to work on something, you either
need to get them interested in your problem or find money to buy some
of their time.

Another disadvantage from your perspective is that while SAS is a
business and therefore cares (at least a tiny amount) about your
business, R does not. Everyone who provides help on R-help does so out
of the goodness of their heart, not because they get paid. Replies can
sometimes be rather irascible, and even rude, and while I personally
don't like the lack of manners, you still get far more than what you
pay for.

At the end of the day, if R doesn't meet your needs, why not continue using SAS?

Hadley


Re: [Rd] No RTFM?

2010-08-21 Thread Hadley Wickham

> previous suggestion by a regular contributor.  I still think a better
> response is not to escalate:  Either ignore the post or say something like,
> "I don't understand your question.  Please provide a self-contained minimal
> example as suggested in the Posting Guide ... ."

I agree wholeheartedly. I have tried to do this with the ggplot2
mailing list, and I think it has been extremely successful in
fostering a community that is friendly and polite, yet still provides
excellent technical support (and these days, most of it doesn't come
from me!).

I know it's frustrating when you see the same "stupid" question asked
over and over and over again, and it's so tempting to reply harshly,
but I think you're far better off just letting it go, and doing
something fun instead.

Hadley


Re: [Rd] No RTFM?

2010-08-21 Thread Hadley Wickham

> Regarding length, the portion at the end of every r-help message (but
> this does not appear at the end of r-devel messages or the messages
> of other lists concerning R):
>
>   "provide commented, minimal, self-contained, reproducible code."
>
> It was intended to provide a one line synopsis of the key part of the posting
> guide that could be readily pointed to.  Although we have to be careful about
> making that too verbose, as well, it might not be too onerous to add

But no one reads email footers...

Hadley


[Rd] require is to suggests as what is to imports?

2010-08-24 Thread Hadley Wickham

Hi all,

If a package suggests another package in its description, you can
check it at runtime with require().  How do you check if a package
is available without loading it, if you only want to access one
function in the package namespace?

Thanks,

Hadley


Re: [Rd] require is to suggests as what is to imports?

2010-08-25 Thread Hadley Wickham

On Tue, Aug 24, 2010 at 3:50 PM, Prof Brian Ripley wrote:
> On Tue, 24 Aug 2010, Hadley Wickham wrote:
>
>> Hi all,
>>
>> If a package suggests another package in its description, you can
>> check it at runtime with requires.  How do you do check if a package
>
> Well, not really as requires() can give an error, at least until 2.12.0 is
> out.  So you need to wrap it in a try/tryCatch construct.
>
>> is available without loading it, if you only want to access one
>> function in the package namespace.
>
> You could use try/tryCatch on pkg::fun (which is what you need to do with
> require).  It is difficult (and would be fragile since the details of
> metadata are definitely subject to change without notice) to ascertain what
> a namespace will contain/export without loading it.

Ok, thanks.
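
The try/tryCatch idea from the quoted reply can be sketched as below. The helper name optional_fun is hypothetical, and it uses getExportedValue() as the programmatic form of pkg::fun; note that this loads the namespace (without attaching the package) when it succeeds:

```r
# Hypothetical helper: fetch pkg::name if the package can be loaded,
# otherwise return NULL instead of throwing an error.
optional_fun <- function(pkg, name) {
  tryCatch(getExportedValue(pkg, name), error = function(e) NULL)
}

med <- optional_fun("stats", "median")
if (!is.null(med)) med(c(1, 5, 9))
```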

Hadley


Re: [Rd] require is to suggests as what is to imports?

2010-08-25 Thread Hadley Wickham

> But thereby clobber your users with the run-time cost of
> installed.packages() (which can take several minutes on some Windows
> systems, and just took ca 12secs on my fastest Linux server with 3000
> packages installed).  If you want to take this route (is a package
> installed?), see the 'Note' on ?installed.packages for better alternatives.

On that note, I wrote a version of installed.packages() which runs
quite a bit faster on my computer:

installed_packages <- function() {
  # Every directory inside each library path is a candidate package
  paths <- unlist(lapply(.libPaths(), dir, full.names = TRUE))
  desc <- file.path(paths, "DESCRIPTION")
  desc <- desc[file.exists(desc)]

  # Read only the fields needed from each DESCRIPTION file
  dcf <- lapply(desc, read.dcf, fields = c("Package", "Title", "Version"))
  packages <- as.data.frame(do.call("rbind", dcf), stringsAsFactors = FALSE)

  # .packages() lists the currently attached packages
  packages$status <- ifelse(packages$Package %in% .packages(),
    "loaded", "installed")
  class(packages) <- c("packages", class(packages))
  packages[order(packages$Package), ]
}

It probably runs faster because I've eliminated some features, and
it's probably not worth spending much time optimising such a rarely
used function, but there it is for what it's worth.

Hadley


Re: [Rd] require is to suggests as what is to imports?

2010-08-25 Thread Hadley Wickham

On Tue, Aug 24, 2010 at 6:55 PM, Henrik Bengtsson wrote:
> isPackageInstalled <- function(package, ...) {
>  path <- system.file(package=package);
>  (path != "");
> }
>
> taken from R.utils (which also has a isPackageLoaded()).

Nice quick hack (subject to caveats Brian mentions).  Thanks!

Hadley


[Rd] NEWS and readNEWS

2010-08-27 Thread Hadley Wickham

readNEWS() states:

 Read R's ‘NEWS’ file or a similarly formatted one.  This is an
 experimental feature, new in R 2.4.0 and may change in several
 ways

and news() also indicates that this tool is supposed to work with
non-R news files.  However, I've not been able to get readNEWS to read
a package news file, even when following the format indicated in
news().  Looking at the code for readNEWS() it seems there are a couple
of places where it assumes it's working with the main R NEWS file:

  * s.post <- " SERIES NEWS"
  * s.pre <- "^[\t ]*CHANGES IN R VERSION "

Is this a bug or is the documentation incorrect?

Hadley


Re: [Rd] NEWS and readNEWS

2010-08-30 Thread Hadley Wickham

>>   * s.post <- " SERIES NEWS"
>>   * s.pre <- "^[\t ]*CHANGES IN R VERSION "
>
>> Is this a bug or is the documentation incorrect?
>
> readNEWS() is really for such 3-level files, but then R itself will move
> to an 2-level Rd format in R 2.12.0.
>
> Use news() to read package news files.

Ah, ok, this is rather unclear from the current documentation.  So if
I want to read a NEWS file off disk to check that I have the right
format (without installing the package first), I should use
tools:::.build_news_db ?

Hadley


[Rd] S3 method for package listed in suggest/enhance

2010-09-01 Thread Hadley Wickham

Hi all,

The profr package provides a method for displaying its output with
ggplot: ggplot.profr.  You don't need ggplot2 to use profr, so
ggplot2 is listed under enhances in the DESCRIPTION file.

If I have just S3method(ggplot, profr) in my NAMESPACE, then I get:

** testing if installed package can be loaded
Error : object 'ggplot' not found whilst loading namespace 'profr'
ERROR: loading failed

If I have both S3method(ggplot, profr) and importFrom(ggplot2,
ggplot), then I get:

* checking package dependencies ... ERROR
Namespace dependency not required: ggplot2

What's the correct way of exporting an S3 method for a generic in a
suggested package?

Thanks,

Hadley




[Rd] License of R manuals

2010-09-29 Thread Hadley Wickham

Hi all,

Under what license are the R manuals (R language definition etc)
released?  They are not mentioned explicitly in license() and have no
license information in the individual documents.  Does this mean that
they are released under GPL-2?  If so, what does that mean, given that
they aren't software?

Hadley


Re: [Rd] License of R manuals

2010-09-29 Thread Hadley Wickham

> Hmm, well... I have always understood it so that: (a) yes, it's GPL-2 (what 
> else could it be) and (b) it means that the restrictions of GPL apply insofar 
> as they make sense, e.g., you can pick it apart and reuse it in other GPL-2 
> or compatible products, but not take it proprietary. Upon request, 
> distributors should probably be prepared to deliver a machine-readable 
> version of the source code. However, there is no requirement of attribution, 
> as with some of the CC licenses.
>
> By and large, I think this makes sense for technical documentation files. 
> E.g., the help file for poisson.test has stretches of text copied verbatim 
> from binom.test, and it would be ridiculous if such cross-pollination would 
> require that Peter, the author of poisson.test should put in a statement that 
> some of the text was borrowed from binom.test, by Kurt. (In this particular 
> case, both are (c) R Foundation, but you get the point.)
>
> For more extensive free-standing documents, there might be a point in using a 
> CC/FDL-style license instead. However, these licenses appear to be GPL 
> INcompatible, so some care is required.  Until now, the GPL plus Common 
> Courtesy has worked well enough.

Ok - great.  I ask because I've been working on a brief introduction
to S3 that has been adapted from the R language definition -
http://github.com/hadley/devtools/wiki/S3. I've included a note giving
the source and stating that it's licensed under GPL-2.  Does that sound
sufficient?

Hadley


[Rd] Eval and the enclos argument

2010-10-02 Thread Hadley Wickham

Hi all,

I'm trying to understand the default value of the enclos argument of eval:

  enclos = if(is.list(envir) || is.pairlist(envir)) parent.frame()
else baseenv()

Why isn't it just

  enclos = parent.frame()

given that enclos is only meaningful (given my reading of the
documentation) when envir is not an environment already?
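
To make the role of enclos concrete, here is a small sketch with envir given as a list. The function and variable names are invented for the example; local_offset is deliberately unusual so that it is unlikely to exist anywhere else on the search path:

```r
demo_default <- function() {
  local_offset <- 10
  # envir is a list, so enclos defaults to parent.frame(): this function
  eval(quote(local_offset + y), list(y = 1))
}

demo_baseenv <- function() {
  local_offset <- 10
  # with enclos = baseenv(), local_offset is not visible from the list
  tryCatch(
    eval(quote(local_offset + y), list(y = 1), enclos = baseenv()),
    error = function(e) "not found"
  )
}

demo_default()   # 11
demo_baseenv()   # "not found", unless local_offset exists globally
```

The caveat in the last comment is there because the enclosure of baseenv() is the global environment, so a global variable of the same name would still be found.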

Hadley


Re: [Rd] Eval and the enclos argument

2010-10-02 Thread Hadley Wickham

On Sat, Oct 2, 2010 at 8:18 AM, Duncan Murdoch  wrote:
> On 02/10/2010 7:57 AM, Hadley Wickham wrote:
>>
>> Hi all,
>>
>> I'm trying to understand the default value of the enclos argument of eval:
>>
>>  enclos = if(is.list(envir) || is.pairlist(envir)) parent.frame()
>> else baseenv()
>>
>> Why isn't it just
>>
>>  enclos = parent.frame()
>>
>> given that enclos is only meaningful (given my reading of the
>> documentation) when envir is not an environment already.
>>
>> Hadley
>>
>
>
> I think that handles the case of envir=NULL.

So that makes eval(expr, NULL) equivalent to eval(expr, baseenv()), right?

Hadley



[Rd] What do you call the value that represents a missing argument?

2010-10-08 Thread Hadley Wickham

Hi all,

What's the official name for the value that represents a missing argument?

e.g.
formals(plot)$x
str(formals(plot)$x)
deparse(formals(plot)$x)
is.symbol(formals(plot)$x)

What's the correct way to create an object like this?  (for example if
you are manipulating the formals of a function to add an argument with
no default value, as in http://stackoverflow.com/questions/3892580/).
as.symbol("") returns an error.  Both substitute() and bquote() return
that object, but it's not obvious if this is on purpose.
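
As a sketch of one way to do this in base R (not necessarily the sanctioned one): alist() keeps empty arguments, so it can supply an argument with no default when editing formals. The function f here is just a placeholder:

```r
f <- function(a) a

# alist() keeps empty arguments, so b = yields the empty symbol
formals(f) <- c(formals(f), alist(b = ))

names(formals(f))          # "a" "b"
is.symbol(formals(f)$b)    # TRUE
deparse(formals(f)$b)      # ""
```

quote(expr = ) gives the same object, as far as I can tell, and reads a little more deliberately than relying on substitute() or bquote() with no arguments.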

Hadley



Re: [Rd] What do you call the value that represents a missing argument?

2010-10-09 Thread Hadley Wickham

> It is a 'dotted pair list'

But:

> is.pairlist(formals(plot)$x)
[1] FALSE

Hadley


Re: [Rd] Rbuildignore and mercurial

2010-10-27 Thread Hadley Wickham

>  I've changed to Mercurial for my working copies of survival for a number of
> reasons not relevant to this post.  When I do R CMD check, I get some warnings
> about certain files in the .hg directory with odd names.  I've added the 
> following 2 lines to my .Rbuildignore file without effect
> ^\.hg$
> ^\.hg.*

Unfortunately that only affects building, not checking.  You have two
options: (1) ignore the warning (2) check the built package.

Hadley


Re: [Rd] Roxygen: @example tag does not work for me

2010-11-04 Thread Hadley Wickham

> I thought that @example would take the R code in "tests/foo.R" (this file
> also exists) and append it to the .Rd-file. However, there is no
> \examples{...} section in my roxygen-processed .Rd-file after running
> roxygenize(). It just seems as if @example is just neglected. Should I put
> the file in another directory?

I would suspect that the path would be relative to either man/ or R/ -
so you probably want ../tests/...

But including your unit tests as examples seems like a pretty odd
thing to do - they do serve rather different purposes.

Hadley


[Rd] col2rgb feature request

2010-11-08 Thread Hadley Wickham

Small feature request for col2rgb: could it preserve missing values? Currently:

> col2rgb(NA)
      [,1]
red    255
green  255
blue   255

Which is frustrating when you're working with vectors of colors that
might include transparent colours.
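
In the meantime, a wrapper along these lines can work around it. The name col2rgb_na is hypothetical; it simply blanks out the columns that came from missing inputs:

```r
# Hypothetical wrapper: like col2rgb(), but columns for missing colours
# are NA instead of white.
col2rgb_na <- function(col, alpha = FALSE) {
  out <- grDevices::col2rgb(col, alpha = alpha)
  out[, is.na(col)] <- NA
  out
}

col2rgb_na(c("red", NA))
```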

Hadley


Re: [Rd] SEXPs and slots

2010-11-15 Thread Hadley Wickham

> 2.  Any good references/resources for developing R?  Nearly all the
> documents I've found are for programming R as a user, not as a developer.  I
> have copies of the documentation, which are very helpful, but it'd be
> helpful to have additional resources to fill in their gaps.

The best advice I've received is to use Rcpp:
http://stackoverflow.com/questions/4106174/where-can-i-learn-to-how-to-write-c-code-to-speed-up-slow-r-functions
It provides a consistent API for C <-> R conversions and so is much
easier to learn.  Dirk and Romain are also churning out the
documentation, so there are lots of examples to learn from.

Hadley


Re: [Rd] Create NAMESPACE file as 'package.skeleton()' would do

2010-11-15 Thread Hadley Wickham

If you're using roxygenize(), explicitly tag functions that you want to
export with @export.

Hadley
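
A minimal sketch of what such a tag looks like in a source file. The function add is a made-up example; roxygen reads the #' comments and writes a matching export(add) line to NAMESPACE:

```r
#' Add two numbers (hypothetical example function)
#'
#' @param x,y Numeric vectors.
#' @return The sum of x and y.
#' @export
add <- function(x, y) {
  x + y
}
```

Functions without the @export tag stay internal to the package.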

On Mon, Nov 15, 2010 at 5:11 PM, Janko Thyson
 wrote:
> Hi there,
>
>
>
> is there a way to create a NAMESPACE file based on Rd-files (or whatever is
> needed in order to apply the regular expression "^[[:alpha:]]+" without(!)
> resorting to package.skeleton() (as this kind of interferes with
> roxygenize() pretty often)?
>
>
>
> Thanks a lot,
>
> Janko



