Re: [R] Multiple regressions with changing dependent variable and time span

arun Sun, 01 Dec 2013 20:34:44 -0800

Hi,
I guess you wanted something like this:

res2 <- do.call(cbind,lapply(lst2,
 function(x)
rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); 
if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); 
c(coef(l1), pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared) } 
else rep(NA,9)},by.column=FALSE,align="right")))
dim(res2)
#[1]  123 3006
 334*9
#[1] 3006


lst2New <- lapply(split(seq(3006),((seq(3006)-1)%%9)+1),function(x) {x1 <- 
res2[,x]; colnames(x1) <- paste(colnames(x1),1:334,sep="_");x1})
sapply(lst2New,ncol)
#  1   2   3   4   5   6   7   8   9 
#334 334 334 334 334 334 334 334 334 
lst2New[[1]][1:4,1:4]
#     (Intercept)_1 (Intercept)_2 (Intercept)_3 (Intercept)_4
#[1,]            NA    -0.3295765    -0.7025259            NA
#[2,]            NA     0.1217360    -1.5221660            NA
#[3,]            NA     0.3192466    -1.3586341            NA
#[4,]            NA     0.2618476    -0.9698798            NA


A.K.














On Sunday, December 1, 2013 9:33 PM, nooldor <[email protected]> wrote:

Hi,

actually, do you remember yesterday code you build for me?
it was:

dat1<-read.csv("Book2.csv", header=T)
>###same as previous
>
>
>lst1 <- lapply(paste("r",1:334,sep="."),function(x) 
>cbind(dat1[,c(1:3)],dat1[x]))
>lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>library(zoo)
>res1 <- do.call(rbind,lapply(lst2,
>function(x) rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); 
>if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); c(coef(l1), 
pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared) } else 
rep(NA,9)},by.column=FALSE,align="right")))
>row.names(res1) <- rep(paste("r",1:334,sep="."),each=123)
> dim(res1)
>#[1] 41082     9

(boo2.xls is  attached in this e-mail previously, just need to open it in excel 
and save as csv)
then we have result res1 as matrix 41082 x 9
(this 41082 = 123observations in time x 334objects)
now I need to separate this matrix res1 into 9 different matrices each one 
containing 123 obs x 334 variables 



maybe reshape() would be useful ...

... let me know if I described it clearly 





On 2 December 2013 03:17, arun <[email protected]> wrote:

#or
>
>data.frame(split(dat1,dat1$A))
>
>
>
>
>
>On Sunday, December 1, 2013 9:16 PM, arun <[email protected]> wrote:
>Hi,
>Try:
>dat1 <- read.table(text="A    B     C     D
>r.1  x1    x2   x3
>r.1  x4    x5    x6
>r.2  x7    x8    x9
>r.2  x10  x11 x12
>r.3  x13  x14 x15
>r.3  x16  x17 x18",header=TRUE,stringsAsFactors=FALSE)
>
> dat2 <- do.call(cbind,split(dat1,dat1$A))
>colnames(dat2) <- gsub(".*\\.","",colnames(dat2))
>A.K.
>
>
>
>
>
>
>
>On Sunday, December 1, 2013 6:32 PM, nooldor <[email protected]> wrote:
>
>Hi,
>
>could you also tell me how to reshape the res1 matrix like that:
>
>[now]
>
>A    B     C     D
>r.1  x1    x2   x3
>r.1  x4    x5    x6
>r.2  x7    x8    x9r.2  x10  x11 x12r.3  x13  x14 x15r.3  x16  x17 x18
>
>
>[after]:
>A    B     C     D      A    B     C     D       A    B     C     D
>r.1  x1    x2   x3      r.2  x7    x8    x9      r.3  x13  x14 x15
>r.1  x4    x5   x6      r.2  x10  x11 x12      r.3  x16  x17 x18
>
>
>
>big thanks!
>
>
>
>
>
>
>
>On 30 November 2013 23:28, arun <[email protected]> wrote:
>
>Hi,
>>No problem.
>>
>>In that case, each column will be a list.  For example if I take the first 
>>element of `lst2`
>>dW1 <- rollapply(lst2[[1]],width=32,FUN=function(z) {z1 <- as.data.frame(z); 
>>if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); 
>>durbinWatsonTest(l1,max.lag=3) } else 
>>rep(NA,4)},by.column=FALSE,align="right")
>>
>> tail(dW1[,1],1)
>>#[[1]]
>>#[1] -0.3602936  0.1975667 -0.1740797
>>
>>
>>You can store it by:
>>resdW1 <- do.call(cbind,lapply(seq_len(ncol(dW1)),function(i) 
>>do.call(rbind,dW1[,i]))[1:3])
>>
>>
>>Similarly, for more than one elements (using a subset of lst2- as it takes 
>>time)
>>
>>
>>lst3 <- lapply(lst2[1:2],function(x) rollapply(x,width=32,FUN=function(z) {z1 
>><- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 
>><-lm(r~F.1+F.2+F.3,data=z1); durbinWatsonTest(l1,max.lag=3) } else 
>>rep(NA,4)},by.column=FALSE,align="right"))
>>
>>lst3New <- lapply(lst3,function(x) 
>>do.call(cbind,lapply(seq_len(ncol(x)),function(i) do.call(rbind,x[,i]))[1:3]))
>>
>>lst3New <- lapply(lst3New, function(x) {colnames(x) <- 
>>paste0(rep(c("r","dw","p"),each=3),1:3); x})
>>
>>A.K.
>>
>>
>>On Saturday, November 30, 2013 5:03 PM, nooldor <[email protected]> wrote:
>>
>>Hey!
>>
>>
>>Yes,
>>only the D-W test takes so much time, did not check it yet
>>
>>I checked results (estimates) with manually run regressions (in excel) and 
>>they are correct.
>>
>>
>>I only change the "width" to 31 and "each=123" to 124, cause it should be 
>>((154-31)+1) x 334 = 41416 matrix
>>
>>
>>with the lag in D-W test I was wondering how to have table when I use 
>>durbinWatsonTest(l1,3) - with three lags instead of default 1.
>>
>>but I can manage it - just need to learn about functions used by you.
>>
>>
>>Any way: BIG THANK to you!
>>
>>
>>Best wishes,
>>T.S.
>>
>>
>>
>>
>>
>>On 30 November 2013 21:12, arun <[email protected]> wrote:
>>
>>Hi,
>>>
>>>I was able to read the file after saving it as .csv.  It seems to work 
>>>without any errors.
>>>
>>>dat1<-read.csv("Book2.csv", header=T)
>>>###same as previous
>>>
>>>
>>>lst1 <- lapply(paste("r",1:334,sep="."),function(x) 
>>>cbind(dat1[,c(1:3)],dat1[x]))
>>>lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>>>library(zoo)
>>>
>>>res1 <- do.call(rbind,lapply(lst2,function(x) 
>>>rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); 
>>>if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); c(coef(l1), 
>>>pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared) } else 
>>>rep(NA,9)},by.column=FALSE,align="right")))
>>>row.names(res1) <- rep(paste("r",1:334,sep="."),each=123)
>>> dim(res1)
>>>#[1] 41082     9
>>>
>>>#vif
>>> library(car)
>>>
>>>res2 <- do.call(rbind,lapply(lst2,function(x) 
>>>rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); 
>>>if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); vif(l1) } 
>>>else rep(NA,3)},by.column=FALSE,align="right")))
>>>row.names(res2) <- rep(paste("r",1:334,sep="."),each=123)
>>>dim(res2)
>>>#[1] 41082     3
>>>
>>>#DW statistic:
>>> lst3 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- 
>>>as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 
>>><-lm(r~F.1+F.2+F.3,data=z1); durbinWatsonTest(l1) } else 
>>>rep(NA,4)},by.column=FALSE,align="right"))
>>> res3 <- do.call(rbind,lapply(lst3,function(x) x[,-4]))
>>>row.names(res3) <- rep(paste("r",1:334,sep="."),each=123)
>>> dim(res3)
>>>#[1] 41082     3
>>>##ncvTest()
>>>f4 <- function(meanmod, dta, varmod) {
>>>assign(".dta", dta, envir=.GlobalEnv)
>>>assign(".meanmod", meanmod, envir=.GlobalEnv)
>>>m1 <- lm(.meanmod, .dta)
>>>ans <- ncvTest(m1, varmod)
>>>remove(".dta", envir=.GlobalEnv)
>>>remove(".meanmod", envir=.GlobalEnv)
>>>ans
>>>}
>>>
>>> lst4 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- 
>>>as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-f4(r~.,z1) } else 
>>>NA},by.column=FALSE,align="right"))
>>>names(lst4) <- paste("r",1:334,sep=".")
>>>length(lst4)
>>>#[1] 334
>>>
>>>
>>>###jarque.bera.test
>>>library(tseries)
>>>res5 <- do.call(rbind,lapply(lst2,function(x) 
>>>rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); 
>>>if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); resid <- 
>>>residuals(l1); unlist(jarque.bera.test(resid)[1:3]) } else 
>>>rep(NA,3)},by.column=FALSE,align="right")))
>>> dim(res5)
>>>#[1] 41082     3
>>>
>>>A.K.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On Saturday, November 30, 2013 1:44 PM, nooldor <[email protected]> wrote:
>>>
>>>here is in .xlsx should be easy to open and eventually find&replace commas 
>>>according to you excel settings (or maybe it will do it automatically)
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 30 November 2013 19:15, arun <[email protected]> wrote:
>>>
>>>I tried that, but:
>>>>
>>>>
>>>>
>>>>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>>>>> str(dat1)
>>>>'data.frame':    154 obs. of  1 variable:
>>>>
>>>>Then I changed to:
>>>>dat1<-read.table("Book2.csv", head=T, sep="\t", dec=",")
>>>>> str(dat1)
>>>>'data.frame':    154 obs. of  661 variables:
>>>>Both of them are wrong as the number of variables should be 337.
>>>>A.K.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On Saturday, November 30, 2013 12:53 PM, nooldor <[email protected]> wrote:
>>>>
>>>>Thank you,
>>>>
>>>>I got your reply. I am just testing your script. I will let you know how is 
>>>>it soon.
>>>>
>>>>.csv could be problematic as commas are used as dec separator (Eastern 
>>>>Europe excel settings) ... I read it in R with this:
>>>>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>>>>
>>>>Thank you very much !!!
>>>>
>>>>T.S.
>>>>
>>>>
>>>>
>>>>
>>>>On 30 November 2013 18:39, arun <[email protected]> wrote:
>>>>
>>>>I couldn't read the "Book.csv" as the format is completely messed up.  
>>>>Anyway, I hope the solution works on your dataset.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>On Saturday, November 30, 2013 10:34 AM, nooldor <[email protected]> wrote:
>>>>>
>>>>>
>>>>>ok.
>>>>>
>>>>>
>>>>>> dat1<-read.table("Book2.csv", head=T, sep=";", dec=",") > colnames(dat1) 
>>>>>> <- c(paste("F",1:3,sep="."),paste("r",1:2,sep=".")) > lst1 <- 
>>>>>> lapply(paste("r",1:2,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x])) 
>>>>>> > lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} ) > 
>>>>>> sum(!!rowSums(is.na(lst2[[1]]))) [1] 57 > #[1] 40 > 
>>>>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x)))) [1] 57  0 > #[1] 40 46
>>>>>in att you have the data file
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>On 30 November 2013 16:22, arun <[email protected]> wrote:
>>>>>
>>>>>Hi,
>>>>>>The first point is not that clear.
>>>>>>
>>>>>>Could you show the expected results in this case?
>>>>>>
>>>>>>set.seed(432)
>>>>>>dat1 <- 
>>>>>>as.data.frame(matrix(sample(c(1:10,NA),154*5,replace=TRUE),ncol=5))
>>>>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
>>>>>>lst1 <- lapply(paste("r",1:2,sep="."),function(x) 
>>>>>>cbind(dat1[,c(1:3)],dat1[x]))
>>>>>>
>>>>>>
>>>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>>>>> sum(!!rowSums(is.na(lst2[[1]])))
>>>>>>#[1] 40
>>>>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>>>>>>#[1] 40 46
>>>>>>
>>>>>>
>>>>>>A.K.
>>>>>>
>>>>>>
>>>>>>
>>>>>>On Saturday, November 30, 2013 10:09 AM, nooldor <[email protected]> 
>>>>>>wrote:
>>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>Thanks for reply!
>>>>>>
>>>>>>
>>>>>>Three things:
>>>>>>1.
>>>>>>I did not write that some of the data has more then 31 NA in the column 
>>>>>>and then it is not possible to run lm()
>>>>>>
>>>>>>Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :  
>>>>>>0 (non-NA) casesIn this case program should return "NA" symbol and go 
>>>>>>further, in the case when length of the observations is shorter then 31 
>>>>>>program should always return "NA" but go further .
>>>>>>
>>>>>>
>>>>>>
>>>>>>2. in your result matrix there are only 4 columns (for estimates of the 
>>>>>>coefficients), is it possible to put there 4 more columns with p-values 
>>>>>>and one column with R squared
>>>>>>
>>>>>>
>>>>>>3. basic statistical test for the regressions:
>>>>>>
>>>>>>inflation factors can be captured by:
>>>>>>res2 <- do.call(rbind,lapply(lst2,function(x) 
>>>>>>rollapply(x,width=32,FUN=function(z)
>>>>>>  vif(lm(r~ 
>>>>>>F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>>>
>>>>>>and DW statistic:
>>>>>>res3 <- do.call(rbind,lapply(lst2,function(x) 
>>>>>>rollapply(x,width=32,FUN=function(z)
>>>>>>  durbinWatsonTest(lm(r~ 
>>>>>>F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>>>
>>>>>>
>>>>>>3a)is that right?
>>>>>>
>>>>>>3b) how to do and have in user-friendly form durbinWatsonTest for more 
>>>>>>then 1 lag?
>>>>>>
>>>>>>3c) how to apply: jarque.bera.test from library(tseries) and ncvTest from 
>>>>>>library(car) ???
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>Pozdrowienia,
>>>>>>
>>>>>>Tomasz Schabek
>>>>>>
>>>>>>
>>>>>>On 30 November 2013 07:42, arun <[email protected]> wrote:
>>>>>>
>>>>>>Hi,
>>>>>>>The link seems to be not working.  From the description, it looks like:
>>>>>>>set.seed(432)
>>>>>>>dat1 <- as.data.frame(matrix(sample(200,154*337,replace=TRUE),ncol=337))
>>>>>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:334,sep="."))
>>>>>>>lst1 <- lapply(paste("r",1:334,sep="."),function(x) 
>>>>>>>cbind(dat1[,c(1:3)],dat1[x]))
>>>>>>>
>>>>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>>>>>>library(zoo)
>>>>>>>
>>>>>>>res <- do.call(rbind,lapply(lst2,function(x) 
>>>>>>>rollapply(x,width=32,FUN=function(z) coef(lm(r~ 
>>>>>>>F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>>>>
>>>>>>>row.names(res) <- rep(paste("r",1:334,sep="."),each=123)
>>>>>>> dim(res)
>>>>>>>#[1] 41082     4
>>>>>>>
>>>>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[1:32,]) )
>>>>>>>#(Intercept)         F.1         F.2         F.3
>>>>>>>#109.9168150  -0.1705361  -0.1028231   0.2027911
>>>>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[2:33,]) )
>>>>>>>#(Intercept)         F.1         F.2         F.3
>>>>>>>#119.3718949  -0.1660709  -0.2059830   0.1338608
>>>>>>>res[1:2,]
>>>>>>>#    (Intercept)        F.1        F.2       F.3
>>>>>>>#r.1    109.9168 -0.1705361 -0.1028231 0.2027911
>>>>>>>#r.1    119.3719 -0.1660709 -0.2059830 0.1338608
>>>>>>>
>>>>>>>A.K.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>On Friday, November 29, 2013 6:43 PM, nooldor <[email protected]> wrote:
>>>>>>>Hi all!
>>>>>>>
>>>>>>>
>>>>>>>I am just starting my adventure with R, so excuse me naive questions.
>>>>>>>
>>>>>>>My data look like that:
>>>>>>>
>>>>>>><http://r.789695.n4.nabble.com/file/n4681391/data_descr_img.jpg>
>>>>>>>
>>>>>>>I have 3 independent variables (F.1, F.2 and F.3) and 334 other variables
>>>>>>>(r.1, r.2, ... r.334) - each one of these will be dependent variable in 
>>>>>>>my
>>>>>>>regression.
>>>>>>>
>>>>>>>Total span of the time is 154 observations. But I would like to have 
>>>>>>>rolling
>>>>>>>window regression with length of 31 observations.
>>>>>>>
>>>>>>>I would like to run script like that:
>>>>>>>
>>>>>>>summary(lm(r.1~F.1+F.2+F.3, data=data))
>>>>>>>vif(lm(r.1~F.1+F.2+F.3, data=data))
>>>>>>>
>>>>>>>But for each of 334 (r.1 to r.334) dependent variables separately and 
>>>>>>>with
>>>>>>>rolling-window of the length 31obs.
>>>>>>>
>>>>>>>Id est:
>>>>>>>summary(lm(r.1~F.1+F.2+F.3, data=data)) would be run 123 (154 total obs -
>>>>>>>31. for the first regression) times for rolling-fixed period of 31 obs.
>>>>>>>
>>>>>>>The next regression would be:
>>>>>>>summary(lm(r.2~F.1+F.2+F.3, data=data)) also 123 times ... and so on till
>>>>>>>summary(lm(r.334~F.1+F.2+F.3, data=data))
>>>>>>>
>>>>>>>It means it would be 123 x 334 regressions (=41082 regressions)
>>>>>>>
>>>>>>>I would like to save results (summary + vif test) of all those 41082
>>>>>>>regressions in one read-user-friendly file like this given by e.g command
>>>>>>>capture.output()
>>>>>>>
>>>>>>>Could you help with it?
>>>>>>>
>>>>>>>Regards,
>>>>>>>
>>>>>>>T.S.
>>>>>>>
>>>>>>>    [[alternative HTML version deleted]]
>>>>>>>
>>>>>>>______________________________________________
>>>>>>>[email protected] mailing list
>>>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>PLEASE do read the posting guide 
>>>>>>>http://www.R-project.org/posting-guide.html
>>>>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Multiple regressions with changing dependent variable and time span

Reply via email to