Re: [R] How to generate SE for the proportion value using a randomization process in R?

Rui Barradas Thu, 28 Jan 2021 14:47:39 -0800

Hello,

Yes, sorry for my previous post, I had forgotten about boot.array.
That's a much better solution for your problem.


Rui Barradas

Às 20:29 de 28/01/21, Marna Wagley escreveu:

Thank you Rui,
This is great. How about the following?

SimilatedData<-boot.array(b, indices=T)

seems it is giving the rows ID which are used in the calculation, isn't it?

On Thu, Jan 28, 2021 at 12:21 PM Rui Barradas <ruipbarra...@sapo.pt<mailto:ruipbarra...@sapo.pt>> wrote:


    Hello,

    I don't know why you would need to see the indices but rewrite the
    function bootprop as

    bootprop_ind <- function(data, index){
        d <- data[index, ]
        #sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
        index
    }


    and call in the same way. It will now return a matrix of indices with R
    = 1000 rows and 19 columns.

    Hope this helps,

    Rui Barradas


    Às 19:29 de 28/01/21, Marna Wagley escreveu:
     > Hi Rui,
     > I am sorry for asking you several questions.
     >
     > In the given example, randomizations (reshuffle) were done 1000
    times,
     > and its 1000 proportion values (results) are stored and it can be
    seen
     > using b$t; but I was wondering how the table was randomized
    (which rows
     > have been missed/or repeated in each randomizing procedure?).
     >
     > Is there any way we can see the randomized table and its associated
     > results? Here in this example, I randomized (or bootstrapped) the
    table
     > into three times (R=3) so I would like to store these three
    tables and
     > look at them later to know which rows were repeated/missed. Is
    there any
     > possibility?
     > The example data and the code is given below.
     >
     > Thank you for your help.
     >
     > ####
     > library(boot)
     > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
     > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
    = c("id1",
     > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
     > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
     > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
     > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
     > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
     > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
    "data.frame",
     > row.names = c(NA,
     > -19L))
     > daT<-data.frame(dat %>%
     >    mutate(Time1.but.not.in.Time2 = case_when(
     >              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
     > Time2.but.not.in.Time1 = case_when(
     >              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
     >   BothTimes = case_when(
     >              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
     > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
     > "BothTimes")
     > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
     > summary(daT)
     >
     > bootprop <- function(data, index){
     >     d <- data[index, ]
     >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm =
    TRUE)
     > }
     >
     > R <- 3
     > set.seed(2020)
     > b <- boot(daT, bootprop, R)
     > b
     > b$t0     # original
     > b$t
     > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
     > hist(b$t, freq = FALSE)
     >
     > str(b)
     > b$data
     > b$seed
     > b$sim
     > b$strata
     > ################
     >
     >
     > On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley
    <marna.wag...@gmail.com <mailto:marna.wag...@gmail.com>
     > <mailto:marna.wag...@gmail.com <mailto:marna.wag...@gmail.com>>>
    wrote:
     >
     >     Yes Rui, I can see we don't need to divide by square root of
    sample
     >     size. The example is great to understand it.
     >     Thank you.
     >     Marna
     >
     >
     >     On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas
    <ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>
     >     <mailto:ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>>>
    wrote:
     >
     >         Hello,
     >
     >         Inline.
     >
     >         Às 07:47 de 23/01/21, Marna Wagley escreveu:
     >          > Dear Rui,
     >          > I was wondering whether we have to square root of SD
    to find
     >         SE, right?
     >
     >         No, we don't. var already divides by n, don't divide again.
     >         This is the code, that can be seen by running the
    function name
     >         at a
     >         command line.
     >
     >
     >         sd
     >         #function (x, na.rm = FALSE)
     >         #sqrt(var(if (is.vector(x) || is.factor(x)) x else
    as.double(x),
     >         #    na.rm = na.rm))
     >         #<bytecode: 0x55f3ce900848>
     >         #<environment: namespace:stats>
     >
     >
     >
     >          >
     >          > bootprop <- function(data, index){
     >          >     d <- data[index, ]
     >          >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
     >         na.rm = TRUE)
     >          > }
     >          >
     >          > R <- 1e3
     >          > set.seed(2020)
     >          > b <- boot(daT, bootprop, R)
     >          > b
     >          > b$t0     # original
     >          > sd(b$t)  # bootstrapped estimate of the SE of the
    sample prop.
     >          > sd(b$t)/sqrt(1000)
     >          > pandit*(1-pandit)
     >          >
     >          > hist(b$t, freq = FALSE)
     >
     >
     >         Try plotting the normal densities for both cases, the red
    line is
     >         clearly wrong.
     >
     >
     >         f <- function(x, xbar, s){
     >             dnorm(x, mean = xbar, sd = s)
     >         }
     >
     >         hist(b$t, freq = FALSE)
     >         curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
     >         "blue",
     >         add = TRUE)
     >         curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0,
    to = 1,
     >         col =
     >         "red", add = TRUE)
     >
     >
     >         Hope this helps,
     >
     >         Rui Barradas
     >
     >          >
     >          >
     >          >
     >          >
     >          > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
     >         <ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>
    <mailto:ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>>
     >          > <mailto:ruipbarra...@sapo.pt
    <mailto:ruipbarra...@sapo.pt> <mailto:ruipbarra...@sapo.pt
    <mailto:ruipbarra...@sapo.pt>>>>
     >         wrote:
     >          >
     >          >     Hello,
     >          >
     >          >     Something like this, using base package boot?
     >          >
     >          >
     >          >     library(boot)
     >          >
     >          >     bootprop <- function(data, index){
     >          >         d <- data[index, ]
     >          >         sum(d[["BothTimes"]], na.rm =
    TRUE)/sum(d[["Time1"]],
     >         na.rm = TRUE)
     >          >     }
     >          >
     >          >     R <- 1e3
     >          >     set.seed(2020)
     >          >     b <- boot(daT, bootprop, R)
     >          >     b
     >          >     b$t0     # original
     >          >     sd(b$t)  # bootstrapped estimate of the SE of the
    sample
     >         prop.
     >          >     hist(b$t, freq = FALSE)
     >          >
     >          >
     >          >     Hope this helps,
     >          >
     >          >     Rui Barradas
     >          >
     >          >     Às 21:57 de 22/01/21, Marna Wagley escreveu:
     >          >      > Hi All,
     >          >      > I was trying to estimate standard error (SE)
    for the
     >         proportion
     >          >     value using
     >          >      > some kind of randomization process
    (bootstrapping or
     >         jackknifing)
     >          >     in R, but
     >          >      > I could not figure it out.
     >          >      >
     >          >      > Is there any way to generate SE for the proportion?
     >          >      >
     >          >      > The example of the data and the code I am using is
     >         attached for your
     >          >      > reference. I would like to generate the value of
     >         proportion with
     >          >     a SE using
     >          >      > a 1000 times randomization.
     >          >      >
     >          >      > dat<-structure(list(Sample = structure(c(1L,
    12L, 13L,
     >         14L, 15L, 16L,
     >          >      > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
     >         11L), .Label
     >          >     = c("id1",
     >          >      > "id10", "id11", "id12", "id13", "id14", "id15",
     >         "id16", "id17",
     >          >      > "id18", "id19", "Id2", "id3", "id4", "id5", "id6",
     >         "id7", "id8",
     >          >      > "id9"), class = "factor"), Time1 = c(0L, 1L,
    1L, 1L,
     >         0L, 0L,
     >          >      > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
    0L),
     >         Time2 = c(1L,
     >          >      > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
     >         0L, 1L, 0L,
     >          >      > 1L, 1L)), .Names = c("Sample", "Time1",
    "Time2"), class =
     >          >     "data.frame",
     >          >      > row.names = c(NA,
     >          >      > -19L))
     >          >      > daT<-data.frame(dat %>%
     >          >      >    mutate(Time1.but.not.in.Time2 = case_when(
     >          >      >              Time1 %in% "1" & Time2 %in% "0"  ~
    "1"),
     >          >      > Time2.but.not.in.Time1 = case_when(
     >          >      >              Time1 %in% "0" & Time2 %in% "1"  ~
    "1"),
     >          >      >   BothTimes = case_when(
     >          >      >              Time1 %in% "1" & Time2 %in% "1"  ~
    "1")))
     >          >      >   daT
     >          >      >   summary(daT)
     >          >      >
     >          >      > cols.num <-
     >         c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
     >          >      > "BothTimes")
     >          >      > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
     >          >      > summary(daT)
     >          >      > ProportionValue<-sum(daT$BothTimes,
     >         na.rm=T)/sum(daT$Time1, na.rm=T)
     >          >      > ProportionValue
     >          >      > standard error??
     >          >      >
     >          >      >       [[alternative HTML version deleted]]
     >          >      >
     >          >      > ______________________________________________
     >          >      > R-help@r-project.org
    <mailto:R-help@r-project.org> <mailto:R-help@r-project.org
    <mailto:R-help@r-project.org>>
     >         <mailto:R-help@r-project.org
    <mailto:R-help@r-project.org> <mailto:R-help@r-project.org
    <mailto:R-help@r-project.org>>>
     >         mailing list
     >          >     -- To UNSUBSCRIBE and more, see
     >          >      > https://stat.ethz.ch/mailman/listinfo/r-help
     >          >      > PLEASE do read the posting guide
     >          > http://www.R-project.org/posting-guide.html
     >          >      > and provide commented, minimal, self-contained,
     >         reproducible code.
     >          >      >
     >          >
     >


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to generate SE for the proportion value using a randomization process in R?

Reply via email to