Re: [R] Multiple regressions with changing dependent variable and time span

nooldor Sun, 01 Dec 2013 19:31:05 -0800

Yes. Thank you once again. With this:

> res2 <- do.call(cbind,lapply(lst2,
>  function(x)
> rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z);
> if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1);
> c(coef(l1), pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared) }
> else rep(NA,9)},by.column=FALSE,align="right")))
> dim(res2)
> #[1]  123 3006
>  334*9
> #[1] 3006
>


I can delete unnecessary columns and have only those I need.
BIG thank!
Regards,

T.S.



On 2 December 2013 04:14, arun <[email protected]> wrote:

> Hi,
> I guess you wanted something like this:
>
> res2 <- do.call(cbind,lapply(lst2,
>  function(x)
> rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z);
> if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1);
> c(coef(l1), pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared) }
> else rep(NA,9)},by.column=FALSE,align="right")))
> dim(res2)
> #[1]  123 3006
>  334*9
> #[1] 3006
>
> lst2New <- lapply(split(seq(3006),((seq(3006)-1)%%9)+1),function(x) {x1 <-
> res2[,x]; colnames(x1) <- paste(colnames(x1),1:334,sep="_");x1})
> sapply(lst2New,ncol)
> #  1   2   3   4   5   6   7   8   9
> #334 334 334 334 334 334 334 334 334
> lst2New[[1]][1:4,1:4]
> #     (Intercept)_1 (Intercept)_2 (Intercept)_3 (Intercept)_4
> #[1,]            NA    -0.3295765    -0.7025259            NA
> #[2,]            NA     0.1217360    -1.5221660            NA
> #[3,]            NA     0.3192466    -1.3586341            NA
> #[4,]            NA     0.2618476    -0.9698798            NA
>
>
> A.K.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Sunday, December 1, 2013 9:33 PM, nooldor <[email protected]> wrote:
>
> Hi,
>
> actually, do you remember yesterday code you build for me?
> it was:
>
> dat1<-read.csv("Book2.csv", header=T)
> >###same as previous
> >
> >
> >lst1 <- lapply(paste("r",1:334,sep="."),function(x)
> cbind(dat1[,c(1:3)],dat1[x]))
> >lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> > sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
> >library(zoo)
> >res1 <- do.call(rbind,lapply(lst2,
> >function(x) rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z);
> if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1);
> c(coef(l1),
> pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared) } else
> rep(NA,9)},by.column=FALSE,align="right")))
> >row.names(res1) <- rep(paste("r",1:334,sep="."),each=123)
> > dim(res1)
> >#[1] 41082     9
>
> (boo2.xls is  attached in this e-mail previously, just need to open it in
> excel and save as csv)
> then we have result res1 as matrix 41082 x 9
> (this 41082 = 123observations in time x 334objects)
> now I need to separate this matrix res1 into 9 different matrices each one
> containing 123 obs x 334 variables
>
>
>
> maybe reshape() would be useful ...
>
> ... let me know if I described it clearly
>
>
>
>
>
> On 2 December 2013 03:17, arun <[email protected]> wrote:
>
> #or
> >
> >data.frame(split(dat1,dat1$A))
> >
> >
> >
> >
> >
> >On Sunday, December 1, 2013 9:16 PM, arun <[email protected]> wrote:
> >Hi,
> >Try:
> >dat1 <- read.table(text="A    B     C     D
> >r.1  x1    x2   x3
> >r.1  x4    x5    x6
> >r.2  x7    x8    x9
> >r.2  x10  x11 x12
> >r.3  x13  x14 x15
> >r.3  x16  x17 x18",header=TRUE,stringsAsFactors=FALSE)
> >
> > dat2 <- do.call(cbind,split(dat1,dat1$A))
> >colnames(dat2) <- gsub(".*\\.","",colnames(dat2))
> >A.K.
> >
> >
> >
> >
> >
> >
> >
> >On Sunday, December 1, 2013 6:32 PM, nooldor <[email protected]> wrote:
> >
> >Hi,
> >
> >could you also tell me how to reshape the res1 matrix like that:
> >
> >[now]
> >
> >A    B     C     D
> >r.1  x1    x2   x3
> >r.1  x4    x5    x6
> >r.2  x7    x8    x9r.2  x10  x11 x12r.3  x13  x14 x15r.3  x16  x17 x18
> >
> >
> >[after]:
> >A    B     C     D      A    B     C     D       A    B     C     D
> >r.1  x1    x2   x3      r.2  x7    x8    x9      r.3  x13  x14 x15
> >r.1  x4    x5   x6      r.2  x10  x11 x12      r.3  x16  x17 x18
> >
> >
> >
> >big thanks!
> >
> >
> >
> >
> >
> >
> >
> >On 30 November 2013 23:28, arun <[email protected]> wrote:
> >
> >Hi,
> >>No problem.
> >>
> >>In that case, each column will be a list.  For example if I take the
> first element of `lst2`
> >>dW1 <- rollapply(lst2[[1]],width=32,FUN=function(z) {z1 <-
> as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1
> <-lm(r~F.1+F.2+F.3,data=z1); durbinWatsonTest(l1,max.lag=3) } else
> rep(NA,4)},by.column=FALSE,align="right")
> >>
> >> tail(dW1[,1],1)
> >>#[[1]]
> >>#[1] -0.3602936  0.1975667 -0.1740797
> >>
> >>
> >>You can store it by:
> >>resdW1 <- do.call(cbind,lapply(seq_len(ncol(dW1)),function(i)
> do.call(rbind,dW1[,i]))[1:3])
> >>
> >>
> >>Similarly, for more than one elements (using a subset of lst2- as it
> takes time)
> >>
> >>
> >>lst3 <- lapply(lst2[1:2],function(x)
> rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z);
> if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1);
> durbinWatsonTest(l1,max.lag=3) } else
> rep(NA,4)},by.column=FALSE,align="right"))
> >>
> >>lst3New <- lapply(lst3,function(x)
> do.call(cbind,lapply(seq_len(ncol(x)),function(i)
> do.call(rbind,x[,i]))[1:3]))
> >>
> >>lst3New <- lapply(lst3New, function(x) {colnames(x) <-
> paste0(rep(c("r","dw","p"),each=3),1:3); x})
> >>
> >>A.K.
> >>
> >>
> >>On Saturday, November 30, 2013 5:03 PM, nooldor <[email protected]>
> wrote:
> >>
> >>Hey!
> >>
> >>
> >>Yes,
> >>only the D-W test takes so much time, did not check it yet
> >>
> >>I checked results (estimates) with manually run regressions (in excel)
> and they are correct.
> >>
> >>
> >>I only change the "width" to 31 and "each=123" to 124, cause it should
> be ((154-31)+1) x 334 = 41416 matrix
> >>
> >>
> >>with the lag in D-W test I was wondering how to have table when I use
> durbinWatsonTest(l1,3) - with three lags instead of default 1.
> >>
> >>but I can manage it - just need to learn about functions used by you.
> >>
> >>
> >>Any way: BIG THANK to you!
> >>
> >>
> >>Best wishes,
> >>T.S.
> >>
> >>
> >>
> >>
> >>
> >>On 30 November 2013 21:12, arun <[email protected]> wrote:
> >>
> >>Hi,
> >>>
> >>>I was able to read the file after saving it as .csv.  It seems to work
> without any errors.
> >>>
> >>>dat1<-read.csv("Book2.csv", header=T)
> >>>###same as previous
> >>>
> >>>
> >>>lst1 <- lapply(paste("r",1:334,sep="."),function(x)
> cbind(dat1[,c(1:3)],dat1[x]))
> >>>lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> >>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
> >>>library(zoo)
> >>>
> >>>res1 <- do.call(rbind,lapply(lst2,function(x)
> rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z);
> if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1);
> c(coef(l1), pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared) }
> else rep(NA,9)},by.column=FALSE,align="right")))
> >>>row.names(res1) <- rep(paste("r",1:334,sep="."),each=123)
> >>> dim(res1)
> >>>#[1] 41082     9
> >>>
> >>>#vif
> >>> library(car)
> >>>
> >>>res2 <- do.call(rbind,lapply(lst2,function(x)
> rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z);
> if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); vif(l1) }
> else rep(NA,3)},by.column=FALSE,align="right")))
> >>>row.names(res2) <- rep(paste("r",1:334,sep="."),each=123)
> >>>dim(res2)
> >>>#[1] 41082     3
> >>>
> >>>#DW statistic:
> >>> lst3 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z)
> {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1
> <-lm(r~F.1+F.2+F.3,data=z1); durbinWatsonTest(l1) } else
> rep(NA,4)},by.column=FALSE,align="right"))
> >>> res3 <- do.call(rbind,lapply(lst3,function(x) x[,-4]))
> >>>row.names(res3) <- rep(paste("r",1:334,sep="."),each=123)
> >>> dim(res3)
> >>>#[1] 41082     3
> >>>##ncvTest()
> >>>f4 <- function(meanmod, dta, varmod) {
> >>>assign(".dta", dta, envir=.GlobalEnv)
> >>>assign(".meanmod", meanmod, envir=.GlobalEnv)
> >>>m1 <- lm(.meanmod, .dta)
> >>>ans <- ncvTest(m1, varmod)
> >>>remove(".dta", envir=.GlobalEnv)
> >>>remove(".meanmod", envir=.GlobalEnv)
> >>>ans
> >>>}
> >>>
> >>> lst4 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z)
> {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-f4(r~.,z1)
> } else NA},by.column=FALSE,align="right"))
> >>>names(lst4) <- paste("r",1:334,sep=".")
> >>>length(lst4)
> >>>#[1] 334
> >>>
> >>>
> >>>###jarque.bera.test
> >>>library(tseries)
> >>>res5 <- do.call(rbind,lapply(lst2,function(x)
> rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z);
> if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); resid <-
> residuals(l1); unlist(jarque.bera.test(resid)[1:3]) } else
> rep(NA,3)},by.column=FALSE,align="right")))
> >>> dim(res5)
> >>>#[1] 41082     3
> >>>
> >>>A.K.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>On Saturday, November 30, 2013 1:44 PM, nooldor <[email protected]>
> wrote:
> >>>
> >>>here is in .xlsx should be easy to open and eventually find&replace
> commas according to you excel settings (or maybe it will do it
> automatically)
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>On 30 November 2013 19:15, arun <[email protected]> wrote:
> >>>
> >>>I tried that, but:
> >>>>
> >>>>
> >>>>
> >>>>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
> >>>>> str(dat1)
> >>>>'data.frame':    154 obs. of  1 variable:
> >>>>
> >>>>Then I changed to:
> >>>>dat1<-read.table("Book2.csv", head=T, sep="\t", dec=",")
> >>>>> str(dat1)
> >>>>'data.frame':    154 obs. of  661 variables:
> >>>>Both of them are wrong as the number of variables should be 337.
> >>>>A.K.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>On Saturday, November 30, 2013 12:53 PM, nooldor <[email protected]>
> wrote:
> >>>>
> >>>>Thank you,
> >>>>
> >>>>I got your reply. I am just testing your script. I will let you know
> how is it soon.
> >>>>
> >>>>.csv could be problematic as commas are used as dec separator (Eastern
> Europe excel settings) ... I read it in R with this:
> >>>>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
> >>>>
> >>>>Thank you very much !!!
> >>>>
> >>>>T.S.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>On 30 November 2013 18:39, arun <[email protected]> wrote:
> >>>>
> >>>>I couldn't read the "Book.csv" as the format is completely messed up.
> Anyway, I hope the solution works on your dataset.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>On Saturday, November 30, 2013 10:34 AM, nooldor <[email protected]>
> wrote:
> >>>>>
> >>>>>
> >>>>>ok.
> >>>>>
> >>>>>
> >>>>>> dat1<-read.table("Book2.csv", head=T, sep=";", dec=",") >
> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep=".")) > lst1
> <- lapply(paste("r",1:2,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
> > lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} ) >
> sum(!!rowSums(is.na(lst2[[1]]))) [1] 57 > #[1] 40 >
> sapply(lst2,function(x) sum(!!rowSums(is.na(x)))) [1] 57  0 > #[1] 40 46
> >>>>>in att you have the data file
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>On 30 November 2013 16:22, arun <[email protected]> wrote:
> >>>>>
> >>>>>Hi,
> >>>>>>The first point is not that clear.
> >>>>>>
> >>>>>>Could you show the expected results in this case?
> >>>>>>
> >>>>>>set.seed(432)
> >>>>>>dat1 <-
> as.data.frame(matrix(sample(c(1:10,NA),154*5,replace=TRUE),ncol=5))
> >>>>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
> >>>>>>lst1 <- lapply(paste("r",1:2,sep="."),function(x)
> cbind(dat1[,c(1:3)],dat1[x]))
> >>>>>>
> >>>>>>
> >>>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> >>>>>> sum(!!rowSums(is.na(lst2[[1]])))
> >>>>>>#[1] 40
> >>>>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
> >>>>>>#[1] 40 46
> >>>>>>
> >>>>>>
> >>>>>>A.K.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>On Saturday, November 30, 2013 10:09 AM, nooldor <[email protected]>
> wrote:
> >>>>>>
> >>>>>>Hi,
> >>>>>>
> >>>>>>Thanks for reply!
> >>>>>>
> >>>>>>
> >>>>>>Three things:
> >>>>>>1.
> >>>>>>I did not write that some of the data has more then 31 NA in the
> column and then it is not possible to run lm()
> >>>>>>
> >>>>>>Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok,
> ...) :  0 (non-NA) casesIn this case program should return "NA" symbol and
> go further, in the case when length of the observations is shorter then 31
> program should always return "NA" but go further .
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>2. in your result matrix there are only 4 columns (for estimates of
> the coefficients), is it possible to put there 4 more columns with p-values
> and one column with R squared
> >>>>>>
> >>>>>>
> >>>>>>3. basic statistical test for the regressions:
> >>>>>>
> >>>>>>inflation factors can be captured by:
> >>>>>>res2 <- do.call(rbind,lapply(lst2,function(x)
> rollapply(x,width=32,FUN=function(z)
> >>>>>>  vif(lm(r~
> F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
> >>>>>>
> >>>>>>and DW statistic:
> >>>>>>res3 <- do.call(rbind,lapply(lst2,function(x)
> rollapply(x,width=32,FUN=function(z)
> >>>>>>  durbinWatsonTest(lm(r~
> F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
> >>>>>>
> >>>>>>
> >>>>>>3a)is that right?
> >>>>>>
> >>>>>>3b) how to do and have in user-friendly form durbinWatsonTest for
> more then 1 lag?
> >>>>>>
> >>>>>>3c) how to apply: jarque.bera.test from library(tseries) and ncvTest
> from library(car) ???
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>Pozdrowienia,
> >>>>>>
> >>>>>>Tomasz Schabek
> >>>>>>
> >>>>>>
> >>>>>>On 30 November 2013 07:42, arun <[email protected]> wrote:
> >>>>>>
> >>>>>>Hi,
> >>>>>>>The link seems to be not working.  From the description, it looks
> like:
> >>>>>>>set.seed(432)
> >>>>>>>dat1 <-
> as.data.frame(matrix(sample(200,154*337,replace=TRUE),ncol=337))
> >>>>>>> colnames(dat1) <-
> c(paste("F",1:3,sep="."),paste("r",1:334,sep="."))
> >>>>>>>lst1 <- lapply(paste("r",1:334,sep="."),function(x)
> cbind(dat1[,c(1:3)],dat1[x]))
> >>>>>>>
> >>>>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> >>>>>>>library(zoo)
> >>>>>>>
> >>>>>>>res <- do.call(rbind,lapply(lst2,function(x)
> rollapply(x,width=32,FUN=function(z) coef(lm(r~
> F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
> >>>>>>>
> >>>>>>>row.names(res) <- rep(paste("r",1:334,sep="."),each=123)
> >>>>>>> dim(res)
> >>>>>>>#[1] 41082     4
> >>>>>>>
> >>>>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[1:32,]) )
> >>>>>>>#(Intercept)         F.1         F.2         F.3
> >>>>>>>#109.9168150  -0.1705361  -0.1028231   0.2027911
> >>>>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[2:33,]) )
> >>>>>>>#(Intercept)         F.1         F.2         F.3
> >>>>>>>#119.3718949  -0.1660709  -0.2059830   0.1338608
> >>>>>>>res[1:2,]
> >>>>>>>#    (Intercept)        F.1        F.2       F.3
> >>>>>>>#r.1    109.9168 -0.1705361 -0.1028231 0.2027911
> >>>>>>>#r.1    119.3719 -0.1660709 -0.2059830 0.1338608
> >>>>>>>
> >>>>>>>A.K.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>On Friday, November 29, 2013 6:43 PM, nooldor <[email protected]>
> wrote:
> >>>>>>>Hi all!
> >>>>>>>
> >>>>>>>
> >>>>>>>I am just starting my adventure with R, so excuse me naive
> questions.
> >>>>>>>
> >>>>>>>My data look like that:
> >>>>>>>
> >>>>>>><http://r.789695.n4.nabble.com/file/n4681391/data_descr_img.jpg>
> >>>>>>>
> >>>>>>>I have 3 independent variables (F.1, F.2 and F.3) and 334 other
> variables
> >>>>>>>(r.1, r.2, ... r.334) - each one of these will be dependent
> variable in my
> >>>>>>>regression.
> >>>>>>>
> >>>>>>>Total span of the time is 154 observations. But I would like to
> have rolling
> >>>>>>>window regression with length of 31 observations.
> >>>>>>>
> >>>>>>>I would like to run script like that:
> >>>>>>>
> >>>>>>>summary(lm(r.1~F.1+F.2+F.3, data=data))
> >>>>>>>vif(lm(r.1~F.1+F.2+F.3, data=data))
> >>>>>>>
> >>>>>>>But for each of 334 (r.1 to r.334) dependent variables separately
> and with
> >>>>>>>rolling-window of the length 31obs.
> >>>>>>>
> >>>>>>>Id est:
> >>>>>>>summary(lm(r.1~F.1+F.2+F.3, data=data)) would be run 123 (154 total
> obs -
> >>>>>>>31. for the first regression) times for rolling-fixed period of 31
> obs.
> >>>>>>>
> >>>>>>>The next regression would be:
> >>>>>>>summary(lm(r.2~F.1+F.2+F.3, data=data)) also 123 times ... and so
> on till
> >>>>>>>summary(lm(r.334~F.1+F.2+F.3, data=data))
> >>>>>>>
> >>>>>>>It means it would be 123 x 334 regressions (=41082 regressions)
> >>>>>>>
> >>>>>>>I would like to save results (summary + vif test) of all those 41082
> >>>>>>>regressions in one read-user-friendly file like this given by e.g
> command
> >>>>>>>capture.output()
> >>>>>>>
> >>>>>>>Could you help with it?
> >>>>>>>
> >>>>>>>Regards,
> >>>>>>>
> >>>>>>>T.S.
> >>>>>>>
> >>>>>>>    [[alternative HTML version deleted]]
> >>>>>>>
> >>>>>>>______________________________________________
> >>>>>>>[email protected] mailing list
> >>>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >>>>>>>and provide commented, minimal, self-contained, reproducible code.
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

        [[alternative HTML version deleted]]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Multiple regressions with changing dependent variable and time span

Reply via email to