Hi,

After running the bootstrap, I would like to see the output of the bootstrapped samples. How can I view the bootstrapped samples of each variable?
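Here is a minimal sketch of where boot() keeps the bootstrapped values. It assumes the boot package and a simulated stand-in for n_data. The thread's real data are not available, so every number it prints is hypothetical. The names reps, bias, std_err, idx and first_resample are also illustrative. The returned object stores the statistic on the original data in t0 and the R bootstrap replicates in the matrix t. Column i of t holds the values printed as ti*, so t1* to t28* are simply the 28 columns of t. boot.array(..., indices = TRUE) gives the row indices drawn for each resample, so the resampled data themselves can be rebuilt:

```r
library(boot)

# Hypothetical stand-in for n_data (the thread's data are not available):
# 100 simulated rows with columns NIC and NAR.
set.seed(123)
n_data <- data.frame(NIC = runif(100, min = 0, max = 15))
n_data$NAR <- 1.7 + 0.55 * n_data$NIC + rnorm(100, sd = 1.9)

# Statistic from the thread: OLS intercept and slope of NAR ~ NIC
OLSCoef_NAR_NIC <- function(df, indices) {
  sample <- df[indices, ]
  coef(lm(NAR ~ NIC, data = sample))
}

boot.out <- boot(n_data, statistic = OLSCoef_NAR_NIC, R = 100)

# Statistic on the original data (the "original" column of print(boot.out))
boot.out$t0

# Bootstrap replicates: an R x 2 matrix; column 1 is t1*, column 2 is t2*
reps <- boot.out$t
colnames(reps) <- names(boot.out$t0)
head(reps)

# The printed "bias" and "std. error" columns are computed from these replicates
bias    <- colMeans(reps) - boot.out$t0
std_err <- apply(reps, 2, sd)
rbind(original = boot.out$t0, bias = bias, std.error = std_err)

# Row indices used by each resample (one row per replicate); the first
# bootstrap data set itself is n_data[idx[1, ], ]
idx <- boot.array(boot.out, indices = TRUE)
first_resample <- n_data[idx[1, ], ]
head(first_resample)
```

The std_err values reproduce the "std. error" column printed by boot(): each one is the standard deviation of a column of t. So a value like the 0.21 in the reply quoted below measures the spread of the replicates. It is not a guaranteed +/- range around the original estimate.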
Bryan Mac
bryanmac...@gmail.com

> On Oct 18, 2016, at 3:57 AM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
>
> It means that the standard deviation of the bootstrap replicates is 0.21.
> See function ?boot.ci for confidence intervals.
> You should also start a new thread in R-Help; you will get more and better
> answers.
>
> On 18-10-2016 08:15, Bryan Mac wrote:
>> Hi Rui,
>>
>> I am having trouble understanding what this means exactly. Does this
>> mean that the bootstrapped number is +/-0.21 from the original?
>>
>> How would I show all of the t’s in the bootstrap? I have about t1 to t28
>> so far. Would it be possible to show all of them?
>
> I don't understand what you mean by this. All of the results are returned and
> printed by boot().
>
> Rui Barradas
>>
>> Bryan Mac
>> bryanmac...@gmail.com
>>
>>> On Oct 7, 2016, at 1:41 PM, ruipbarra...@sapo.pt wrote:
>>>
>>> Hello,
>>>
>>> That's just the definition of a function; you have to actually call
>>> it, in a call to boot(), for instance.
>>>
>>> OLSCoef_NAR_NIC <- function(df, indices){
>>>     sample <- df[indices, ]
>>>     OLS_NAR_NIC_relation <- lm(NAR ~ NIC, data = sample)
>>>     coef_ols_nar_nic <- coef(OLS_NAR_NIC_relation)
>>>     coef_ols_nar_nic
>>> }
>>>
>>> boot(n_data, statistic = OLSCoef_NAR_NIC, R = 100)
>>>
>>> ORDINARY NONPARAMETRIC BOOTSTRAP
>>>
>>>
>>> Call:
>>> boot(data = n_data, statistic = OLSCoef_NAR_NIC, R = 100)
>>>
>>>
>>> Bootstrap Statistics :
>>>      original        bias    std. error
>>> t1* 1.8788189 -0.013771706  0.59596631
>>> t2* 0.5003911  0.002478478  0.09016857
>>>
>>>
>>> As for the output in the format you want, I suggest you call lm() with
>>> your entire df; since it is big, there's no reason to bootstrap it.
>>> Something like this:
>>>
>>> > model <- lm(NAR ~ NIC, data = data)
>>> > summary(model)
>>>
>>> Call:
>>> lm(formula = NAR ~ NIC, data = data)
>>>
>>> Residuals:
>>>     Min      1Q  Median      3Q     Max
>>> -6.0459 -1.1916  0.2126  1.3424  4.8094
>>>
>>> Coefficients:
>>>             Estimate Std. Error t value Pr(>|t|)
>>> (Intercept)  1.66395    0.18859   8.823   <2e-16 ***
>>> NIC          0.56384    0.02588  21.783   <2e-16 ***
>>> ---
>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>
>>> Residual standard error: 1.886 on 1267 degrees of freedom
>>> Multiple R-squared:  0.2725,  Adjusted R-squared:  0.2719
>>> F-statistic: 474.5 on 1 and 1267 DF,  p-value: < 2.2e-16
>>>
>>> Rui Barradas
>>>
>>> Quoting Bryan Mac <bryanmac...@gmail.com>:
>>>
>>>> By the way, when I ran the code, I didn’t see any output of results.
>>>>
>>>> This is what I got.
>>>> <OLS.PNG>
>>>> Bryan Mac
>>>> bryanmac...@gmail.com
>>>>> On Oct 6, 2016, at 3:48 AM, ruipbarra...@sapo.pt wrote:
>>>
>>> Hello,
>>>
>>> I believe that your code is correct; I don't understand what you mean
>>> by "not showing up".
>>> If you want the coefficients, residuals, etc., your bootstrap statistic
>>> function needs to return those values. You can, for instance, use a
>>> separate function for each: one to return the R-squared, another to return
>>> the coefficients or t values, etc.
>>>
>>> This function would return the coefficients. Note that if you use the
>>> argument data = ... you don't need the name of the df in your formula.
>>> It makes the code more readable.
>>>
>>> OLSCoef_NAR_NIC <- function(df, indices){
>>>     sample <- df[indices, ]
>>>     OLS_NAR_NIC_relation <- lm(NAR ~ NIC, data = sample)
>>>     coef_ols_nar_nic <- coef(OLS_NAR_NIC_relation)
>>>     coef_ols_nar_nic
>>> }
>>>
>>> Rui Barradas
>>>
>>> Quoting Bryan Mac <bryanmac...@gmail.com>:
>>>
>>>> Hi Rui,
>>>>
>>>> My next step is to run both Least Median of Squares (LMS) regression
>>>> and Ordinary Least Squares (OLS) regression after the bootstrap.
>>>> My colleague and I wrote the code for it, but I doubt that it is
>>>> correct. Is this how you compute the OLS and LMS regressions?
>>>> Shouldn't my output look like the sample below? I believe I have the
>>>> code that should produce it, but nothing shows up: I do not see
>>>> the residuals or the coefficients (estimate, std. error, t value, etc.).
>>>> Sample output:
>>>> ## Call:
>>>> ## lm(formula = crime ~ poverty + single, data = cdata)
>>>> ##
>>>> ## Residuals:
>>>> ##    Min     1Q Median     3Q    Max
>>>> ## -811.1 -114.3  -22.4  121.9  689.8
>>>> ##
>>>> ## Coefficients:
>>>> ##             Estimate Std. Error t value Pr(>|t|)
>>>> ## (Intercept) -1368.19     187.21   -7.31  2.5e-09 ***
>>>> ## poverty         6.79       8.99    0.76     0.45
>>>> ## single        166.37      19.42    8.57  3.1e-11 ***
>>>> ## ---
>>>> ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>> ##
>>>> ## Residual standard error: 244 on 48 degrees of freedom
>>>> ## Multiple R-squared:  0.707,  Adjusted R-squared:  0.695
>>>> ## F-statistic:   58 on 2 and 48 DF,  p-value: 1.58e-13
>>>> This is our code:
>>>> OLSRegression <- function(df, indices){
>>>>   sample <- df[indices, ]
>>>>   OLS_NAR_NIC_relation <- lm(sample$NAR~sample$NIC, data = sample)
>>>>   rsquared_ols_nar_nic <- summary(OLS_NAR_NIC_relation)$r.square
>>>>
>>>>   OLS_SQRTNAR_SQRTNIC_relation <- lm(sample$SQRTNAR~sample$SQRTNIC,
>>>>                                      data = sample)
>>>>   rsquared_ols_sqrtnar_sqrtnic <-
>>>>     summary(OLS_SQRTNAR_SQRTNIC_relation)$r.square
>>>>
>>>>   out <- c(rsquared_ols_nar_nic, rsquared_ols_sqrtnar_sqrtnic)
>>>>   return(out)
>>>> }
>>>> LMSRegression <- function(df, indices){
>>>>   sample <- df[indices, ]
>>>>   # Note: lm() has no LMS fit; method = "lms" only triggers a warning
>>>>   # and an ordinary least-squares ("qr") fit, so these are OLS results.
>>>>   LMS_NAR_NIC_relation <- lm(sample$NAR~sample$NIC, data = sample,
>>>>                              method = "lms")
>>>>   rsquared_lms_nar_nic <- summary(LMS_NAR_NIC_relation)$r.square
>>>>
>>>>   LMS_SQRTNAR_SQRTNIC_relation <- lm(sample$SQRTNAR~sample$SQRTNIC,
>>>>                                      data = sample, method = "lms")
>>>>   rsquared_lms_sqrtnar_sqrtnic <-
>>>>     summary(LMS_SQRTNAR_SQRTNIC_relation)$r.square
>>>>
>>>>   out <- c(rsquared_lms_nar_nic, rsquared_lms_sqrtnar_sqrtnic)
>>>>   return(out)
>>>> }
>>>> boot.out.ols <- boot(n_data, statistic = OLSRegression, R = 100)
>>>> boot.out.ols
>>>> plot(boot.out.ols, index = 1)
>>>> title(sub = "Histogram and Q-Q plot for relation between NAR-NIC (OLS; R-Squared Value)",
>>>>       line = 4)
>>>> plot(boot.out.ols, index = 2)
>>>> title(sub = "Histogram and Q-Q plot for relation between SQRTNAR-SQRTNIC (OLS; R-Squared Value)",
>>>>       line = 4)
>>>> ci_ols_1 <- boot.ci(boot.out.ols, index = 1, type = "all")
>>>> ci_ols_1
>>>> ci_ols_filtered_1 <- ci_ols_1$bca[, c(4,5)]
>>>> ci_ols_filtered_1
>>>> hist(boot.out.ols$t[,1], main = 'Coefficient of Determination: NAR-NIC',
>>>>      xlab = 'R-Squared', col = 'LightBlue', probability = TRUE)
>>>> lines(density(boot.out.ols$t[,1]), col = 'Red')
>>>> abline(v = ci_ols_filtered_1, col = 'brown')
>>>> ci_ols_2 <- boot.ci(boot.out.ols, index = 2, type = "all")
>>>> ci_ols_2
>>>> ci_ols_filtered_2 <- ci_ols_2$bca[, c(4,5)]
>>>> ci_ols_filtered_2
>>>> hist(boot.out.ols$t[,2], main = 'Coefficient of Determination: SQRTNAR-SQRTNIC',
>>>>      xlab = 'R-Squared', col = 'LightBlue', probability = TRUE)
>>>> lines(density(boot.out.ols$t[,2]), col = 'Red')
>>>> abline(v = ci_ols_filtered_2, col = 'brown')
>>>> boot.out.lms <- boot(n_data, statistic = LMSRegression, R = 100)
>>>> boot.out.lms
>>>> plot(boot.out.lms, index = 1)
>>>> title(sub = "Histogram and Q-Q plot for relation between NAR-NIC (LMS; R-Squared Value)",
>>>>       line = 4)
>>>> plot(boot.out.lms, index = 2)
>>>> title(sub = "Histogram and Q-Q plot for relation between SQRTNAR-SQRTNIC (LMS; R-Squared Value)",
>>>>       line = 4)
>>>> ci_lms_1 <- boot.ci(boot.out.lms, index = 1, type = "all")
>>>> ci_lms_1
>>>> ci_lms_filtered_1 <- ci_lms_1$bca[, c(4,5)]
>>>> ci_lms_filtered_1
>>>> hist(boot.out.lms$t[,1], main = 'Coefficient of Determination: NAR-NIC',
>>>>      xlab = 'R-Squared', col = 'LightBlue', probability = TRUE)
>>>> lines(density(boot.out.lms$t[,1]), col = 'Red')
>>>> abline(v = ci_lms_filtered_1, col = 'brown')
>>>> ci_lms_2 <- boot.ci(boot.out.lms, index = 2, type = "all")
>>>> ci_lms_2
>>>> ci_lms_filtered_2 <- ci_lms_2$bca[, c(4,5)]
>>>> ci_lms_filtered_2
>>>> hist(boot.out.lms$t[,2], main = 'Coefficient of Determination: SQRTNAR-SQRTNIC',
>>>>      xlab = 'R-Squared', col = 'LightBlue', probability = TRUE)
>>>> lines(density(boot.out.lms$t[,2]), col = 'Red')
>>>> abline(v = ci_lms_filtered_2, col = 'brown')
>>>> Bryan Mac
>>>> bryanmac...@gmail.com
>>>>> On Oct 5, 2016, at 3:27 AM, ruipbarra...@sapo.pt wrote:
>>>
>>> Hello,
>>>
>>> Inline.
>>>
>>> Quoting Bryan Mac <bryanmac...@gmail.com>:
>>>
>>>> Hi Rui, Thanks.
>>>>
>>>> About this part of the code: I thought that because we are bootstrapping,
>>>> which is random sampling WITH replacement, it would be replace = TRUE.
>>>> Or is it replace = FALSE because it's not trying to replace the values
>>>> in the columns, but just trying to randomly select 100 cases out of the
>>>> total?
>>>>
>>>> Yes, you said that you want to select a sub-df and _then_ bootstrap
>>>> it, so you should choose it without replacement; it's the bootstrap
>>>> that uses sampling with replacement.
>>>>
>>>> Rui Barradas
>>>> ix <- sample(1269, 100, replace = FALSE)
>>>> n_data <- data[ix, cols]
>>>> Also, I got no errors either. Thanks.
>>>> Best,
>>>> Bryan Mac
>>>> bryanmac...@gmail.com
>>>>> On Oct 4, 2016, at 3:56 AM, ruipbarra...@sapo.pt wrote:
>>>
>>> Two more things.
>>>
>>> 1) Don't call your df data or df; those are names of R functions.
>>> 2) I've just run boot(data, statistic = DataSummary, R = 100) with
>>> the 1269 rows, and it gave me no error.
>>>
>>> Rui Barradas
>>>
>>> Quoting Bryan Mac <bryanmac...@gmail.com>:
>>>
>>>> Hi Rui,
>>>>
>>>> It's for a project that I am working on at work. It has to do with
>>>> estimating advertisement performance.
>>>> What the code has to accomplish is to randomly select 100 cases each
>>>> time it is run and bootstrap them 100 times.
>>>> It can't be only the first 100 cases of the 1269 rows; the cases can
>>>> come from anywhere between the first row and row 1269.
>>>> I think what I am asking for help on right now is: is there working
>>>> code that will randomly select 100 rows out of my total (1269), so
>>>> that each time it is run, you get a different df/DataSummary and
>>>> bootstrap sample?
>>>> I think I need to edit this to achieve my purpose of randomly
>>>> selecting 100 rows out of my total:
>>>> cols <- c('NAR','SQRTNAR','NIC','SQRTNIC')
>>>> # convert the variables to numeric if they are not already
>>>> data[,cols] <- lapply(data[,cols], as.numeric)
>>>> n_data <- data[(1:100),cols]
>>>> I wanted to look at the trend as I increased the number of
>>>> bootstrap samples (i.e., 100, 200, 300, etc.). When I increased the
>>>> number of bootstrap samples, the distribution got exponentially larger.
>>>> I thought that due to random sampling/bootstrapping you would get a
>>>> variation of scores.
>>>> I ran the df through DataSummary and through the bootstrap and
>>>> compared the results; they are identical.
>>>> By the way, I kept getting errors when I did 100 bootstrap samples
>>>> and had 1269 rows. It said that the sample was too small.
>>>> Bryan Mac
>>>> bryanmac...@gmail.com
>>>> P.S. I am attaching an Excel file to show you what I mean. I
>>>> essentially randomly choose 100 cases out of the total in the NAR
>>>> column. Once those 100 cases are randomly selected, I bootstrap them
>>>> 100 times. That's what I am looking to do.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
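The workflow described in this thread could be sketched as below: draw 100 of the 1269 rows without replacement, then bootstrap that subset with both OLS and LMS fits. lm() has no least-median-of-squares fit. Its method argument only supports "qr" (plus "model.frame"), and any other value gives a warning and an ordinary least-squares fit. This sketch therefore swaps in lqs(method = "lms") from the MASS package. It bootstraps the intercept and slope rather than R-squared because, as far as I know, summary() does not report an R-squared for lqs fits. The data are simulated, and full_data, OLSCoef and LMSCoef are hypothetical names:

```r
library(boot)
library(MASS)   # lqs() provides least median of squares via method = "lms"

# Hypothetical stand-in for the 1269-row data frame; named full_data
# because data and df are names of R functions.
set.seed(2016)  # drop this line to get a different subset on every run
full_data <- data.frame(NIC = runif(1269, min = 0, max = 15))
full_data$NAR <- 1.7 + 0.55 * full_data$NIC + rnorm(1269, sd = 1.9)

# Step 1: choose 100 distinct rows at random (sampling WITHOUT replacement)
ix <- sample(nrow(full_data), 100, replace = FALSE)
n_data <- full_data[ix, c("NAR", "NIC")]

# Step 2: bootstrap the subset; boot() itself resamples WITH replacement
OLSCoef <- function(df, indices) {
  coef(lm(NAR ~ NIC, data = df[indices, ]))
}
LMSCoef <- function(df, indices) {
  # Least median of squares fit (not available through lm())
  coef(lqs(NAR ~ NIC, data = df[indices, ], method = "lms"))
}

boot.out.ols <- boot(n_data, statistic = OLSCoef, R = 100)
boot.out.lms <- boot(n_data, statistic = LMSCoef, R = 100)

boot.out.ols
boot.out.lms

# Percentile interval for the LMS slope (column 2 of boot.out.lms$t)
boot.ci(boot.out.lms, index = 2, type = "perc")
```

Keeping the subset selection (without replacement) separate from boot() (with replacement) mirrors the distinction made in the replies about replace = FALSE.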