> -----Original Message-----
> From: Peter Dalgaard [mailto:pda...@gmail.com]
> Sent: Sunday, June 20, 2010 2:12 PM
> To: William Dunlap
> Cc: Patrick Burns; r-devel@r-project.org
> Subject: Re: [Rd] proposed change to 'sample'
>
> William Dunlap wrote:
> >> -----Original Message-----
> >> From: r-devel-boun...@r-project.org
> >> [mailto:r-devel-boun...@r-project.org] On Behalf Of Patrick Burns
> ....
> >>
> >> I propose adding an argument that allows
> >> the user (programmer) to avoid that
> >> ambiguity:
> >>
> >> function (x, size, replace = FALSE, prob = NULL,
> >>     max = length(x) == 1L && is.numeric(x) && x >= 1)
> >
> > S+'s sample() has an argument 'n' to achieve
> > the same result. It has been there since at
> > least 2005 (S+ 7.0.6). sample(n=n) means to
> > return a sample from seq_len(n), where n must
> > be a scalar nonnegative integer. sample(x=x)
> > retains its old ambiguous meaning.
> >    sample(x, size = n, replace = F, prob = NULL, n = NULL, ...)
>
> Hmm, that doesn't really solve the issue, does it? I.e., you
> still cannot conveniently sample from a vector that is possibly
> of size 1.
>
> I would be more inclined to make sampling from a vector the
> normal case, and default x to say 1:max(n, size), forcing users
> to say sample(n=5) if sampling from x=1:5 is desired. This could
> be a manageable change; the deprecation sequence is a bit painful
> to think through, though.

I think that breaking old code was the reason we let the user
write an unambiguous sample(n=n) but did not change how
sample(x=scalar) worked. Internally, we had long discouraged
using sample(x=vector) because of the ambiguity problem,
preferring x[sample(length(x),...)].

I notice that S+'s rsample() does not allow sampling from a
vector, only from seq_len(n). I think that is because sampling
rows from a data.frame (or the bigdata equivalent, bdframe) was
felt to be a more common operation, and the code was
simpler/faster if rsample didn't have to call out to possible
subscripting methods. Relaxing the requirement that the output
be a randomly permuted sample mattered more when dealing with
long datasets.

In any case, I was just stating that if sample were changed
to allow disambiguation of its first argument, using 'n' instead
of 'max' would be compatible with S+.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> --
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Phone: (+45)38153501
> Email: pd....@cbs.dk  Priv: pda...@gmail.com
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
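
For readers of the archive, the ambiguity discussed in this thread and the index-based idiom mentioned above can be sketched in R roughly as follows. This is only an illustration: safe_sample is a hypothetical helper name, not part of R or S+, and it is not the change proposed in the thread.

```r
# Illustration only: safe_sample is a hypothetical name, not an R or S+ function.
set.seed(1)

x <- 10   # a "vector" that happens to have length 1

# sample(x) treats a single number >= 1 as 1:x, so this draws three
# values from 1:10 rather than from the one-element vector c(10).
ambiguous <- sample(x, 3)

# Index-based idiom: sample positions, then subscript.
# A length-1 input is handled like any other vector.
safe_sample <- function(x, size = length(x), replace = FALSE, prob = NULL) {
  x[sample.int(length(x), size = size, replace = replace, prob = prob)]
}

safe_sample(x, 1)               # always 10
safe_sample(c(2.5, 7, 10), 2)   # two of the three values, in random order
```

The helper assumes that the caller always wants to sample elements of x, never from 1:x, which is the case that sample(x=scalar) cannot express.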