Hi Rui,
I am sorry for asking you several questions.
In the given example, randomizations (reshuffle) were done 1000 times,
and its 1000 proportion values (results) are stored and it can be seen
using b$t; but I was wondering how the table was randomized (which rows
have been missed/or repeated in each randomizing procedure?).
Is there any way we can see the randomized table and its associated
results? Here in this example, I randomized (or bootstrapped) the table
into three times (R=3) so I would like to store these three tables and
look at them later to know which rows were repeated/missed. Is there any
possibility?
The example data and the code is given below.
Thank you for your help.
####
library(boot)
dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1",
"id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
row.names = c(NA,
-19L))
daT<-data.frame(dat %>%
mutate(Time1.but.not.in.Time2 = case_when(
Time1 %in% "1" & Time2 %in% "0" ~ "1"),
Time2.but.not.in.Time1 = case_when(
Time1 %in% "0" & Time2 %in% "1" ~ "1"),
BothTimes = case_when(
Time1 %in% "1" & Time2 %in% "1" ~ "1")))
cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
"BothTimes")
daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)
bootprop <- function(data, index){
d <- data[index, ]
sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}
R <- 3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
b$t
sd(b$t) # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)
str(b)
b$data
b$seed
b$sim
b$strata
################
On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley <marna.wag...@gmail.com
<mailto:marna.wag...@gmail.com>> wrote:
Yes Rui, I can see we don't need to divide by square root of sample
size. The example is great to understand it.
Thank you.
Marna
On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarra...@sapo.pt
<mailto:ruipbarra...@sapo.pt>> wrote:
Hello,
Inline.
Às 07:47 de 23/01/21, Marna Wagley escreveu:
> Dear Rui,
> I was wondering whether we have to square root of SD to find
SE, right?
No, we don't. var already divides by n, don't divide again.
This is the code, that can be seen by running the function name
at a
command line.
sd
#function (x, na.rm = FALSE)
#sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
# na.rm = na.rm))
#<bytecode: 0x55f3ce900848>
#<environment: namespace:stats>
>
> bootprop <- function(data, index){
> d <- data[index, ]
> sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
na.rm = TRUE)
> }
>
> R <- 1e3
> set.seed(2020)
> b <- boot(daT, bootprop, R)
> b
> b$t0 # original
> sd(b$t) # bootstrapped estimate of the SE of the sample prop.
> sd(b$t)/sqrt(1000)
> pandit*(1-pandit)
>
> hist(b$t, freq = FALSE)
Try plotting the normal densities for both cases, the red line is
clearly wrong.
f <- function(x, xbar, s){
dnorm(x, mean = xbar, sd = s)
}
hist(b$t, freq = FALSE)
curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
"blue",
add = TRUE)
curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1,
col =
"red", add = TRUE)
Hope this helps,
Rui Barradas
>
>
>
>
> On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
<ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>
> <mailto:ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>>>
wrote:
>
> Hello,
>
> Something like this, using base package boot?
>
>
> library(boot)
>
> bootprop <- function(data, index){
> d <- data[index, ]
> sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
na.rm = TRUE)
> }
>
> R <- 1e3
> set.seed(2020)
> b <- boot(daT, bootprop, R)
> b
> b$t0 # original
> sd(b$t) # bootstrapped estimate of the SE of the sample
prop.
> hist(b$t, freq = FALSE)
>
>
> Hope this helps,
>
> Rui Barradas
>
> Às 21:57 de 22/01/21, Marna Wagley escreveu:
> > Hi All,
> > I was trying to estimate standard error (SE) for the
proportion
> value using
> > some kind of randomization process (bootstrapping or
jackknifing)
> in R, but
> > I could not figure it out.
> >
> > Is there any way to generate SE for the proportion?
> >
> > The example of the data and the code I am using is
attached for your
> > reference. I would like to generate the value of
proportion with
> a SE using
> > a 1000 times randomization.
> >
> > dat<-structure(list(Sample = structure(c(1L, 12L, 13L,
14L, 15L, 16L,
> > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L), .Label
> = c("id1",
> > "id10", "id11", "id12", "id13", "id14", "id15",
"id16", "id17",
> > "id18", "id19", "Id2", "id3", "id4", "id5", "id6",
"id7", "id8",
> > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L,
0L, 0L,
> > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L),
Time2 = c(1L,
> > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
0L, 1L, 0L,
> > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
> "data.frame",
> > row.names = c(NA,
> > -19L))
> > daT<-data.frame(dat %>%
> > mutate(Time1.but.not.in.Time2 = case_when(
> > Time1 %in% "1" & Time2 %in% "0" ~ "1"),
> > Time2.but.not.in.Time1 = case_when(
> > Time1 %in% "0" & Time2 %in% "1" ~ "1"),
> > BothTimes = case_when(
> > Time1 %in% "1" & Time2 %in% "1" ~ "1")))
> > daT
> > summary(daT)
> >
> > cols.num <-
c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> > "BothTimes")
> > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> > summary(daT)
> > ProportionValue<-sum(daT$BothTimes,
na.rm=T)/sum(daT$Time1, na.rm=T)
> > ProportionValue
> > standard error??
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org <mailto:R-help@r-project.org>
<mailto:R-help@r-project.org <mailto:R-help@r-project.org>>
mailing list
> -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
reproducible code.
> >
>