I don't like the idea of having a length-1 dim attribute trigger special behavior in sample(). (Should a length-2 dim cause it to sample rows of a matrix, as unique() and duplicated() do?)
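To make the concern concrete, here is a minimal sketch of what the proposal quoted below would amount to. The wrapper name sample_1d is hypothetical and not part of R; it only imitates the proposed rule on top of the existing sample() and sample.int(), and it assumes 'size' is always supplied.

```r
# Current behaviour: a length-one numeric x >= 1 is expanded to 1:x,
# whether or not it carries a dim attribute.
x <- array(10)
length(dim(x))        # 1, i.e. a one-dimensional array
sample(x, 1)          # a draw from 1:10, not necessarily 10

# Hypothetical wrapper (not part of R) imitating the proposed rule:
# a one-dimensional array is always treated as the population itself.
sample_1d <- function(x, size, replace = FALSE, prob = NULL) {
  if (length(dim(x)) == 1L) {
    v <- as.vector(x)  # drop the dim attribute before indexing
    # Index explicitly so a length-one population is never expanded.
    v[sample.int(length(v), size, replace, prob)]
  } else {
    sample(x, size, replace, prob)
  }
}

sample_1d(array(10), 1)   # always 10
sample_1d(10, 1)          # still a draw from 1:10

# The length-2 dim question: unique() treats a matrix row by row,
# while sample() treats it as a plain vector of elements.
m <- rbind(c(1, 2), c(1, 2), c(3, 4))
nrow(unique(m))           # 2 distinct rows
length(sample(m, 1))      # 1: a single element, not a row
```

Under this sketch the result depends on an attribute that is invisible in most printed output, which is the part I find uncomfortable.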

S+'s sample() had another argument, 'n', that could be used to specify the size of the population to sample from. It had to be a single nonnegative integral number, and only one of the 'x' and 'n' arguments could be supplied. This was not optimal, but the help file discouraged the use of the 'x' argument and encouraged subscripting with sample()'s output instead of having sample() do the subscripting.

S+'s rsample() (called by sample()) only had the 'n' argument; you could not supply the population to sample from. It also separated sampling from shuffling, which is handy when taking large samples from huge populations, since shuffling the output often took most of the time.

The S+ argument lists are:
    sample(x, size = n, replace = F, prob = NULL, n = NULL, ...)
    rsample(n, size = n, replace = F, prob = NULL, bigdata = F, minimal = NULL, ..., order = T)

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jun 17, 2015 at 7:18 AM, Radford Neal <radf...@cs.toronto.edu> wrote:

> > Then the question would be if this test could be replaced with a new
> > argument to sample, e.g. expandSingle, which has TRUE as default for
> > backward compatibility, but FALSE if you don't want population to be
> > expanded to 1:population. It could certainly be useful in some cases,
> > but you still need to know about the expansion to use it. I think most
> > of these bugs occur because users did not think about the expansion in
> > the first place or did not realize that their population could be of
> > length 1 in some situations. These users would therefore not think about
> > changing the argument either.
>
> I think the solution might be to make sample always treat the first
> argument as the vector to sample from if it has a "dim" attribute that
> explicitly specifies that it is a one-dimensional array.
> The effect of this would be that sample(10,1) would sample from 1:10,
> as at present, but sample(array(10),1) would sample from the length-one
> vector with element 10 (and hence always return 10).
>
> With this change, you can easily ensure that sample(v,1) always samples
> from v even when it has length one by rewriting it to sample(array(v),1).
>
> It's of course possible that some existing code relies on the current
> behaviour, but probably not much existing code, since one-dimensional
> arrays are (I think) not very common at present.
>
> A bigger gain would come if one also introduced a new sequence operator
> that creates a sequence that is marked as a one-dimensional array, which
> would be part of a solution to several other problems as well, as I
> propose at http://www.cs.utoronto.ca/~radford/ftp/R-lang-ext.pdf
>
> Radford Neal
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel