Hello,
Thanks, I hadn't thought of that.
But why? Is it evaluated once before the assignment and a second time when
the assignment occurs?
Tracing both sample and `[<-` shows two calls to sample:
trace(sample)
trace(`[<-`)
df[sample(nrow(df), 3),]$treated <- TRUE
trace: sample(nrow(df), 3)
trace: `[<-`(`*tmp*`, sample(nrow(df), 3), , value = list(unit = c(7L,
6L, 8L), treated = c(TRUE, TRUE, TRUE)))
trace: sample(nrow(df), 3)
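
A minimal sketch of the double evaluation without any randomness. It relies on the way the R Language Definition describes nested replacement: `x[i, ]$name <- value` is treated roughly as `` `[<-`(`*tmp*`, i, , value = `$<-`(`*tmp*`[i, ], name, value)) ``, so the index expression `i` appears twice. The helper `idx()` and the counter `calls` are hypothetical names used only for this illustration.

```r
# Hypothetical helper: returns fixed row indices and counts its own calls.
calls <- 0L
idx <- function() {
  calls <<- calls + 1L
  c(2L, 5L, 7L)
}

df <- data.frame(unit = 1:10)
df$treated <- FALSE

# Nested replacement: the index expression is used once to extract the
# rows (`[`) and once more to write them back (`[<-`).
df[idx(), ]$treated <- TRUE

calls  # number of times idx() was evaluated
df     # units stay intact here only because idx() is deterministic
```

Because `idx()` returns the same rows on both evaluations, the data frame comes out as intended; with `sample()` the two evaluations return different rows, which is where the trouble starts.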
Regards,
Rui Barradas
At 17:20 on 19/06/2020, William Dunlap wrote:
The first subscript argument is getting evaluated twice.
> trace(sample)
> set.seed(2020); df[i<-sample(10,3), ]$Treated <- TRUE
trace: sample(10, 3)
trace: sample(10, 3)
> i
[1] 1 10 4
> set.seed(2020); sample(10,3)
trace: sample(10, 3)
[1] 7 6 8
> sample(10,3)
trace: sample(10, 3)
[1] 1 10 4
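
To make the consequence concrete, here is a sketch that replays the two evaluations by hand. The index vectors are hard-coded from the output above (7 6 8 for the first draw, 1 10 4 for the second) so the result does not depend on the RNG; the names `first`, `second` and `tmp` are made up for this illustration, and the three steps only approximate what R does internally.

```r
df <- data.frame(unit = 1:10)
df$treated <- FALSE

first  <- c(7L, 6L, 8L)   # rows read by the inner `[` (first draw)
second <- c(1L, 10L, 4L)  # rows written by the outer `[<-` (second draw)

tmp <- df[first, ]        # extract rows 7, 6, 8
tmp$treated <- TRUE       # mark the extracted copies as treated
df[second, ] <- tmp       # write them over rows 1, 10, 4

df
```

Rows 1, 10 and 4 are overwritten with copies of units 7, 6 and 8, so three units vanish and three others appear twice, once treated and once not.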
Bill Dunlap
TIBCO Software
wdunlap tibco.com <http://tibco.com>
On Fri, Jun 19, 2020 at 8:46 AM Rui Barradas <ruipbarra...@sapo.pt> wrote:
Hello,
I don't have an answer as to why this happens, but it seems like a bug.
If so, where? In `[<-.data.frame` or in `[<-.default`?
A solution is to subset and assign the vector:
set.seed(2020)
df2 <- data.frame(unit = 1:10)
df2$treated <- FALSE
df2$treated[sample(nrow(df2), 3)] <- TRUE
df2
#    unit treated
#1      1   FALSE
#2      2   FALSE
#3      3   FALSE
#4      4   FALSE
#5      5   FALSE
#6      6    TRUE
#7      7    TRUE
#8      8    TRUE
#9      9   FALSE
#10    10   FALSE
Or
set.seed(2020)
df3 <- data.frame(unit = 1:10)
df3$treated <- FALSE
df3[sample(nrow(df3), 3), "treated"] <- TRUE
df3
# result as expected
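
A quick check, under the same assumption about how R expands replacement calls, that both forms above evaluate the index expression only once. `idx()` and `calls` are hypothetical names introduced just to count evaluations.

```r
# Hypothetical helper: returns fixed row indices and counts its own calls.
calls <- 0L
idx <- function() {
  calls <<- calls + 1L
  c(2L, 5L, 7L)
}

# Form 1: index the column vector, not the data frame rows.
df2 <- data.frame(unit = 1:10)
df2$treated <- FALSE
df2$treated[idx()] <- TRUE
calls_form1 <- calls

# Form 2: a single `[<-` call with row and column subscripts.
calls <- 0L
df3 <- data.frame(unit = 1:10)
df3$treated <- FALSE
df3[idx(), "treated"] <- TRUE
calls_form2 <- calls

c(form1 = calls_form1, form2 = calls_form2)
```

In both forms the index appears in only one position of the expanded call, so a random index such as `sample(nrow(df), 3)` is drawn a single time.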
Hope this helps,
Rui Barradas
At 13:49 on 19/06/2020, Sébastien Lahaie wrote:
> I ran into some strange behavior in R when trying to assign a treatment to
> rows in a data frame. I'm wondering whether any R experts can explain
> what's going on.
>
> First, let's assign a treatment to 3 out of 10 rows as follows.
>
>> df <- data.frame(unit = 1:10)
>> df$treated <- FALSE
>> s <- sample(nrow(df), 3)
>> df[s,]$treated <- TRUE
>> df
>    unit treated
> 1     1   FALSE
> 2     2    TRUE
> 3     3   FALSE
> 4     4   FALSE
> 5     5    TRUE
> 6     6   FALSE
> 7     7    TRUE
> 8     8   FALSE
> 9     9   FALSE
> 10   10   FALSE
>
> This is as expected. Now we'll just skip the intermediate step of saving
> the sampled indices, and apply the treatment directly as follows.
>
>> df <- data.frame(unit = 1:10)
>> df$treated <- FALSE
>> df[sample(nrow(df), 3),]$treated <- TRUE
>> df
>    unit treated
> 1     6    TRUE
> 2     2   FALSE
> 3     3   FALSE
> 4     9    TRUE
> 5     5   FALSE
> 6     6   FALSE
> 7     7   FALSE
> 8     5    TRUE
> 9     9   FALSE
> 10   10   FALSE
>
> Now the data frame still has 10 rows with 3 assigned to the treatment. But
> the units are garbled. Units 1 and 4 have disappeared, for instance, and
> there are duplicates for 6 and 9, one assigned to treatment and the other
> to control. Why would this happen?
>
> Thanks,
> Sebastien
>
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
This e-mail has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus