On 6/16/2015 1:32 PM, Peter Meissner wrote:
On .06.2015, at 14:55, Millot Gael <gael.mil...@curie.fr> wrote:
Hi.
I have a problem with the default behavior of sample(), which performs
sample(1:x) when x is a single value.
This behavior is well explained in ?sample.
However, this behavior is annoying when the number of values is not
predictable. Would it be possible to add an argument that deactivates
it, so that the sampling is performed on the single value itself?
Examples of the behavior I would like:
sample(10, size = 1, replace = FALSE)
10
sample(10, size = 3, replace = TRUE)
10 10 10
sample(10, size = 3, replace = FALSE)
Error
I think the problem here is that the function actually does what you
would expect it to do from a statistical perspective. A sample of size
three from a population of one, without allowing elements that were
already drawn to be drawn again, is simply not defined. What should the
function return?
If I understand correctly, this error is exactly what the poster would
like to see, but which you don't get currently. If length(population) == 1,
you currently sample from 1:population, not from the population itself. So:
> sample(8:10, 3, replace = FALSE)
[1] 10 8 9
> sample(9:10, 3, replace = FALSE)
Error in sample.int(length(x), size, replace, prob) :
cannot take a sample larger than the population when 'replace = FALSE'
> sample(10:10, 3, replace = FALSE)
[1] 8 10 2
I have to admit that I also find this behaviour inconsistent, even if it
is well described already on the first line of the details in the
documentation. It is definitely a feature which can cause some trouble,
and where the tests might end up more complicated than you would first
think.
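For reference, the examples section of ?sample suggests a small wrapper that always indexes into x, so that a length-one vector is never expanded to 1:x. A minimal sketch follows; the name resample is taken from the help page, and the calls below it are only illustrations:

```r
# Index into x instead of passing x itself to sample(), so that a
# length-one x is treated as a population of exactly one element.
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(10, 1)                   # always 10
resample(10, 3, replace = TRUE)   # 10 10 10
# resample(10, 3)                 # error: sample larger than the population
```

With this wrapper the three cases from the original post behave as requested, including the error for size 3 without replacement.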
... You can always wrap your code in a try() like this to prevent errors
from breaking loops or functions:
try(sample(...))
No error is given when length(population) == 1, and the result may look
perfectly valid if population varies from run to run. So this can easily
stay in the script as an undetected bug.
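To illustrate the point, here is a small sketch in which try() has nothing to catch. The seed and the value of population are arbitrary choices made only for this example:

```r
set.seed(1)        # arbitrary seed, only for reproducibility
population <- 10   # intended as a population of one element

res <- try(sample(population, 3, replace = FALSE), silent = TRUE)

inherits(res, "try-error")   # FALSE: no error was raised
all(res == population)       # FALSE: the values were drawn from 1:10
```

The call succeeds, so the loop or function continues with three distinct values from 1:10 instead of stopping.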
... or you might check your arguments before execution:
if (!replace && length(population) >= size) {
  sample(population, size = size, replace = replace)
} else {
  ...
}
This test is not sufficient if length(population) == size == 1, so you
will also need to check for this special case:
if (length(population) == 1 && size == 1) {
  population
} else if (!replace && length(population) >= size) {
  sample(population, size = size, replace = replace)
} else {
  ...
}
Then the question would be if this test could be replaced with a new
argument to sample, e.g. expandSingle, which has TRUE as default for
backward compatibility, but FALSE if you don't want population to be
expanded to 1:population. It could certainly be useful in some cases,
but you still need to know about the expansion to use it. I think most
of these bugs occur because users did not think about the expansion in
the first place or did not realize that their population could be of
length 1 in some situations. These users would therefore not think about
changing the argument either.
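As a rough sketch of what such an argument could look like, here is a user-level wrapper. Both the function name sample2 and the expandSingle argument are hypothetical; sample() itself has no such argument:

```r
# Hypothetical wrapper: sample() has no expandSingle argument.
sample2 <- function(x, size, replace = FALSE, prob = NULL,
                    expandSingle = TRUE) {
  if (expandSingle) {
    # Default: defer to sample(), including the expansion of a
    # single number x to 1:x.
    if (missing(size)) {
      return(sample(x, replace = replace, prob = prob))
    }
    return(sample(x, size, replace = replace, prob = prob))
  }
  # expandSingle = FALSE: always treat x as the population itself.
  if (missing(size)) size <- length(x)
  x[sample.int(length(x), size, replace = replace, prob = prob)]
}

sample2(10, 3, replace = TRUE, expandSingle = FALSE)   # 10 10 10
# sample2(10, 3, expandSingle = FALSE)                 # error
```

The default keeps the existing behaviour, so the caller still has to know about the expansion in order to switch it off.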
Cheers,
Jon
Many thanks for your help.
Best wishes,
Gael Millot.
Gael Millot
UMR 3244 (IC-CNRS-UPMC) et Universite Pierre et Marie Curie
Equipe Recombinaison et instabilite genetique
Pav Trouillet Rossignol 5eme etage
Institut Curie
26 rue d'Ulm
75248 Paris Cedex 05
FRANCE
tel : 33 1 56 24 66 34
fax : 33 1 56 24 66 44
Email : gael.mil...@curie.fr
http://perso.curie.fr/Gael.Millot/index.html
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Best, Peter