Re: [R] How to generate SE for the proportion value using a randomization process in R?

Rui Barradas Thu, 28 Jan 2021 12:22:05 -0800

Hello,

I don't know why you would need to see the indices but rewrite thefunction bootprop as


bootprop_ind <- function(data, index){
  d <- data[index, ]
  #sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
  index
}

and call in the same way. It will now return a matrix of indices with R= 1000 rows and 19 columns.


Hope this helps,

Rui Barradas


Às 19:29 de 28/01/21, Marna Wagley escreveu:

Hi Rui,
I am sorry for asking you several questions.

In the given example, randomizations (reshuffle) were done 1000 times,and its 1000 proportion values (results) are stored and it can be seenusing b$t; but I was wondering how the table was randomized (which rowshave been missed/or repeated in each randomizing procedure?).

Is there any way we can see the randomized table and its associatedresults? Here in this example, I randomized (or bootstrapped) the tableinto three times (R=3) so I would like to store these three tables andlook at them later to know which rows were repeated/missed. Is there anypossibility?

The example data and the code is given below.

Thank you for your help.

####
library(boot)
dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1",
"id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,

1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",row.names = c(NA,

-19L))
daT<-data.frame(dat %>%
   mutate(Time1.but.not.in.Time2 = case_when(
             Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
Time2.but.not.in.Time1 = case_when(
             Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
  BothTimes = case_when(
             Time1 %in% "1" & Time2 %in% "1"  ~ "1")))

cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1","BothTimes")

daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)

bootprop <- function(data, index){
    d <- data[index, ]
    sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}

R <- 3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0     # original
b$t
sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)

str(b)
b$data
b$seed
b$sim
b$strata
################

On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley <marna.wag...@gmail.com<mailto:marna.wag...@gmail.com>> wrote:


    Yes Rui, I can see we don't need to divide by square root of sample
    size. The example is great to understand it.
    Thank you.
    Marna


    On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarra...@sapo.pt
    <mailto:ruipbarra...@sapo.pt>> wrote:

        Hello,

        Inline.

        Às 07:47 de 23/01/21, Marna Wagley escreveu:
         > Dear Rui,
         > I was wondering whether we have to square root of SD to find
        SE, right?

        No, we don't. var already divides by n, don't divide again.
        This is the code, that can be seen by running the function name
        at a
        command line.


        sd
        #function (x, na.rm = FALSE)
        #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
        #    na.rm = na.rm))
        #<bytecode: 0x55f3ce900848>
        #<environment: namespace:stats>



         >
         > bootprop <- function(data, index){
         >     d <- data[index, ]
         >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
        na.rm = TRUE)
         > }
         >
         > R <- 1e3
         > set.seed(2020)
         > b <- boot(daT, bootprop, R)
         > b
         > b$t0     # original
         > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
         > sd(b$t)/sqrt(1000)
         > pandit*(1-pandit)
         >
         > hist(b$t, freq = FALSE)


        Try plotting the normal densities for both cases, the red line is
        clearly wrong.


        f <- function(x, xbar, s){
            dnorm(x, mean = xbar, sd = s)
        }

        hist(b$t, freq = FALSE)
        curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
        "blue",
        add = TRUE)
        curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1,
        col =
        "red", add = TRUE)


        Hope this helps,

        Rui Barradas

         >
         >
         >
         >
         > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
        <ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>
         > <mailto:ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>>>
        wrote:
         >
         >     Hello,
         >
         >     Something like this, using base package boot?
         >
         >
         >     library(boot)
         >
         >     bootprop <- function(data, index){
         >         d <- data[index, ]
         >         sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
        na.rm = TRUE)
         >     }
         >
         >     R <- 1e3
         >     set.seed(2020)
         >     b <- boot(daT, bootprop, R)
         >     b
         >     b$t0     # original
         >     sd(b$t)  # bootstrapped estimate of the SE of the sample
        prop.
         >     hist(b$t, freq = FALSE)
         >
         >
         >     Hope this helps,
         >
         >     Rui Barradas
         >
         >     Às 21:57 de 22/01/21, Marna Wagley escreveu:
         >      > Hi All,
         >      > I was trying to estimate standard error (SE) for the
        proportion
         >     value using
         >      > some kind of randomization process (bootstrapping or
        jackknifing)
         >     in R, but
         >      > I could not figure it out.
         >      >
         >      > Is there any way to generate SE for the proportion?
         >      >
         >      > The example of the data and the code I am using is
        attached for your
         >      > reference. I would like to generate the value of
        proportion with
         >     a SE using
         >      > a 1000 times randomization.
         >      >
         >      > dat<-structure(list(Sample = structure(c(1L, 12L, 13L,
        14L, 15L, 16L,
         >      > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
        11L), .Label
         >     = c("id1",
         >      > "id10", "id11", "id12", "id13", "id14", "id15",
        "id16", "id17",
         >      > "id18", "id19", "Id2", "id3", "id4", "id5", "id6",
        "id7", "id8",
         >      > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L,
        0L, 0L,
         >      > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L),
        Time2 = c(1L,
         >      > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
        0L, 1L, 0L,
         >      > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
         >     "data.frame",
         >      > row.names = c(NA,
         >      > -19L))
         >      > daT<-data.frame(dat %>%
         >      >    mutate(Time1.but.not.in.Time2 = case_when(
         >      >              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
         >      > Time2.but.not.in.Time1 = case_when(
         >      >              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
         >      >   BothTimes = case_when(
         >      >              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
         >      >   daT
         >      >   summary(daT)
         >      >
         >      > cols.num <-
        c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
         >      > "BothTimes")
         >      > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
         >      > summary(daT)
         >      > ProportionValue<-sum(daT$BothTimes,
        na.rm=T)/sum(daT$Time1, na.rm=T)
         >      > ProportionValue
         >      > standard error??
         >      >
         >      >       [[alternative HTML version deleted]]
         >      >
         >      > ______________________________________________
         >      > R-help@r-project.org <mailto:R-help@r-project.org>
        <mailto:R-help@r-project.org <mailto:R-help@r-project.org>>
        mailing list
         >     -- To UNSUBSCRIBE and more, see
         >      > https://stat.ethz.ch/mailman/listinfo/r-help
         >      > PLEASE do read the posting guide
         > http://www.R-project.org/posting-guide.html
         >      > and provide commented, minimal, self-contained,
        reproducible code.
         >      >
         >


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to generate SE for the proportion value using a randomization process in R?

Reply via email to