Re: [R] Multiple regressions with changing dependent variable and time span

arun Sun, 01 Dec 2013 20:33:05 -0800

Hi,
Try:
dat1 <- read.table(text="A    B     C     D
r.1  x1    x2   x3
r.1  x4    x5    x6
r.2  x7    x8    x9
r.2  x10  x11 x12
r.3  x13  x14 x15
r.3  x16  x17 x18",header=TRUE,stringsAsFactors=FALSE)


 dat2 <- do.call(cbind,split(dat1,dat1$A))
colnames(dat2) <- gsub(".*\\.","",colnames(dat2))
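
A quick sanity check of the reshape above, as a minimal sketch in base R. `do.call(cbind, ...)` only lines up cleanly when every group in `A` has the same number of rows, so the helper `widen_by_group` (a hypothetical name, not part of the original answer) stops early otherwise. The toy data frame reproduces the six rows from the question.

```r
# Hypothetical helper (not from the thread): place the groups of `key` side by side.
widen_by_group <- function(df, key) {
  parts <- split(df, df[[key]])
  sizes <- vapply(parts, nrow, integer(1))
  # cbind needs equally sized groups, so check before binding
  if (length(unique(sizes)) != 1L) stop("groups differ in size")
  out <- do.call(cbind, parts)
  # names arrive as "r.1.A", "r.1.B", ...; keep only the part after the last dot
  colnames(out) <- sub(".*\\.", "", colnames(out))
  rownames(out) <- NULL
  out
}

dat1 <- data.frame(A = rep(c("r.1", "r.2", "r.3"), each = 2),
                   B = paste0("x", seq(1, 16, by = 3)),
                   C = paste0("x", seq(2, 17, by = 3)),
                   D = paste0("x", seq(3, 18, by = 3)),
                   stringsAsFactors = FALSE)
dat2 <- widen_by_group(dat1, "A")
dim(dat2)   # 2 rows, 12 columns
```

Note that the result has repeated column names (A, B, C, D three times), so columns are safer to address by position than by name.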
A.K.






On Sunday, December 1, 2013 6:32 PM, nooldor <[email protected]> wrote:

Hi,

could you also tell me how to reshape the res1 matrix like that:

[now]

A    B     C     D
r.1  x1    x2   x3
r.1  x4    x5    x6
r.2  x7    x8    x9
r.2  x10  x11 x12
r.3  x13  x14 x15
r.3  x16  x17 x18


[after]:
A    B     C     D      A    B     C     D       A    B     C     D
r.1  x1    x2   x3      r.2  x7    x8    x9      r.3  x13  x14 x15
r.1  x4    x5   x6      r.2  x10  x11 x12      r.3  x16  x17 x18



big thanks!







On 30 November 2013 23:28, arun <[email protected]> wrote:

Hi,
>No problem.
>
>In that case, each column will be a list.  For example if I take the first 
>element of `lst2`
>dW1 <- rollapply(lst2[[1]],width=32,FUN=function(z) {z1 <- as.data.frame(z); 
>if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); 
>durbinWatsonTest(l1,max.lag=3) } else rep(NA,4)},by.column=FALSE,align="right")
>
> tail(dW1[,1],1)
>#[[1]]
>#[1] -0.3602936  0.1975667 -0.1740797
>
>
>You can store it by:
>resdW1 <- do.call(cbind,lapply(seq_len(ncol(dW1)),function(i) 
>do.call(rbind,dW1[,i]))[1:3])
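
The flattening step above can be tried without running any regressions. This is only a sketch: `dW_like` is a simulated stand-in for `dW1` (a list-matrix with one row per window, where each cell holds one value per lag), and the numbers in it are made up purely to show the structure.

```r
# Simulated stand-in for dW1: 5 windows, 3 lags per statistic (values are made up).
set.seed(1)
fake_window <- function() list(r = runif(3), dw = runif(3), p = runif(3),
                               alternative = "two.sided")
dW_like <- do.call(rbind, lapply(1:5, function(i) fake_window()))

# Stack each of the first three list-columns into a 5 x 3 numeric block,
# then put the blocks side by side.
blocks <- lapply(1:3, function(i) do.call(rbind, dW_like[, i]))
flat <- do.call(cbind, blocks)
colnames(flat) <- paste0(rep(c("r", "dw", "p"), each = 3), 1:3)
dim(flat)   # 5 rows, 9 columns
```

The fourth column (`alternative`) is dropped because it is text and would turn the whole matrix into character.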
>
>
>Similarly, for more than one element (using a subset of lst2, as it takes 
>time):
>
>
>lst3 <- lapply(lst2[1:2],function(x) rollapply(x,width=32,FUN=function(z) {z1 
><- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 
><-lm(r~F.1+F.2+F.3,data=z1); durbinWatsonTest(l1,max.lag=3) } else 
>rep(NA,4)},by.column=FALSE,align="right"))
>
>lst3New <- lapply(lst3,function(x) 
>do.call(cbind,lapply(seq_len(ncol(x)),function(i) do.call(rbind,x[,i]))[1:3]))
>
>lst3New <- lapply(lst3New, function(x) {colnames(x) <- 
>paste0(rep(c("r","dw","p"),each=3),1:3); x})
>
>A.K.
>
>
>On Saturday, November 30, 2013 5:03 PM, nooldor <[email protected]> wrote:
>
>Hey!
>
>
>Yes,
>only the D-W test takes so much time; I did not check it yet.
>
>I checked results (estimates) with manually run regressions (in excel) and 
>they are correct.
>
>
>I only changed the "width" to 31 and "each=123" to 124, because the result 
>should be a matrix with ((154-31)+1) x 334 = 41416 rows
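
The row counts quoted in this thread follow from the usual window-count formula, n - width + 1. A small sketch (the helper name `n_windows` is made up for illustration):

```r
# Number of rolling windows of a given width over n observations.
n_windows <- function(n, width) n - width + 1

n_windows(154, 31)         # 124 windows per series
n_windows(154, 31) * 334   # 41416 rows in total
n_windows(154, 32) * 334   # 41082 rows, as in the width=32 runs
```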
>
>
>with the lag in D-W test I was wondering how to have table when I use 
>durbinWatsonTest(l1,3) - with three lags instead of default 1.
>
>but I can manage it - just need to learn about functions used by you.
>
>
>Any way: BIG THANK to you!
>
>
>Best wishes,
>T.S.
>
>
>
>
>
>On 30 November 2013 21:12, arun <[email protected]> wrote:
>
>Hi,
>>
>>I was able to read the file after saving it as .csv.  It seems to work 
>>without any errors.
>>
>>dat1<-read.csv("Book2.csv", header=T)
>>###same as previous
>>
>>
>>lst1 <- lapply(paste("r",1:334,sep="."),function(x) 
>>cbind(dat1[,c(1:3)],dat1[x]))
>>lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>>library(zoo)
>>
>>res1 <- do.call(rbind,lapply(lst2,function(x) 
>>rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); 
>>if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); c(coef(l1), 
>>pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared) } else 
>>rep(NA,9)},by.column=FALSE,align="right")))
>>row.names(res1) <- rep(paste("r",1:334,sep="."),each=123)
>> dim(res1)
>>#[1] 41082     9
>>
>>#vif
>> library(car)
>>
>>res2 <- do.call(rbind,lapply(lst2,function(x) 
>>rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); 
>>if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); vif(l1) } 
>>else rep(NA,3)},by.column=FALSE,align="right")))
>>row.names(res2) <- rep(paste("r",1:334,sep="."),each=123)
>>dim(res2)
>>#[1] 41082     3
>>
>>#DW statistic:
>> lst3 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- 
>>as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 
>><-lm(r~F.1+F.2+F.3,data=z1); durbinWatsonTest(l1) } else 
>>rep(NA,4)},by.column=FALSE,align="right"))
>> res3 <- do.call(rbind,lapply(lst3,function(x) x[,-4]))
>>row.names(res3) <- rep(paste("r",1:334,sep="."),each=123)
>> dim(res3)
>>#[1] 41082     3
>>##ncvTest()
>>f4 <- function(meanmod, dta, varmod) {
>>assign(".dta", dta, envir=.GlobalEnv)
>>assign(".meanmod", meanmod, envir=.GlobalEnv)
>>m1 <- lm(.meanmod, .dta)
>>ans <- ncvTest(m1, varmod)
>>remove(".dta", envir=.GlobalEnv)
>>remove(".meanmod", envir=.GlobalEnv)
>>ans
>>}
>>
>> lst4 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- 
>>as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-f4(r~.,z1) } else 
>>NA},by.column=FALSE,align="right"))
>>names(lst4) <- paste("r",1:334,sep=".")
>>length(lst4)
>>#[1] 334
>>
>>
>>###jarque.bera.test
>>library(tseries)
>>res5 <- do.call(rbind,lapply(lst2,function(x) 
>>rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); 
>>if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); resid <- 
>>residuals(l1); unlist(jarque.bera.test(resid)[1:3]) } else 
>>rep(NA,3)},by.column=FALSE,align="right")))
>> dim(res5)
>>#[1] 41082     3
>>
>>A.K.
>>
>>
>>
>>
>>
>>
>>
>>
>>On Saturday, November 30, 2013 1:44 PM, nooldor <[email protected]> wrote:
>>
>>Here it is in .xlsx. It should be easy to open; if needed, find & replace the 
>>commas according to your Excel settings (or maybe it will do it automatically)
>>
>>
>>
>>
>>
>>
>>On 30 November 2013 19:15, arun <[email protected]> wrote:
>>
>>I tried that, but:
>>>
>>>
>>>
>>>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>>>> str(dat1)
>>>'data.frame':    154 obs. of  1 variable:
>>>
>>>Then I changed to:
>>>dat1<-read.table("Book2.csv", head=T, sep="\t", dec=",")
>>>> str(dat1)
>>>'data.frame':    154 obs. of  661 variables:
>>>Both of them are wrong as the number of variables should be 337.
>>>A.K.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On Saturday, November 30, 2013 12:53 PM, nooldor <[email protected]> wrote:
>>>
>>>Thank you,
>>>
>>>I got your reply. I am just testing your script. I will let you know how 
>>>it goes soon.
>>>
>>>.csv could be problematic as commas are used as dec separator (Eastern 
>>>Europe excel settings) ... I read it in R with this:
>>>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
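
For reference, a minimal sketch of reading this kind of file: ';' between fields and ',' as the decimal mark. The two data rows are invented so the example runs without Book2.csv. If str() reports a single variable, that usually means the separator did not match the file.

```r
# Made-up data in the "semicolon + decimal comma" layout.
csv_text <- "F.1;F.2;r.1\n1,5;2,25;0,1\n3,0;4,75;-0,2"

d1 <- read.table(text = csv_text, header = TRUE, sep = ";", dec = ",")
d2 <- read.csv2(text = csv_text)   # read.csv2 has sep=";" and dec="," as defaults

str(d1)                  # three numeric columns
stopifnot(ncol(d1) > 1)  # a single column would signal a wrong separator
```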
>>>
>>>Thank you very much !!!
>>>
>>>T.S.
>>>
>>>
>>>
>>>
>>>On 30 November 2013 18:39, arun <[email protected]> wrote:
>>>
>>>I couldn't read the "Book.csv" as the format is completely messed up.  
>>>Anyway, I hope the solution works on your dataset.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On Saturday, November 30, 2013 10:34 AM, nooldor <[email protected]> wrote:
>>>>
>>>>
>>>>ok.
>>>>
>>>>
>>>>> dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>>>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
>>>>> lst1 <- lapply(paste("r",1:2,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
>>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>>>> sum(!!rowSums(is.na(lst2[[1]])))
>>>>[1] 57
>>>>> #[1] 40
>>>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>>>>[1] 57  0
>>>>> #[1] 40 46
>>>>The data file is attached.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On 30 November 2013 16:22, arun <[email protected]> wrote:
>>>>
>>>>Hi,
>>>>>The first point is not that clear.
>>>>>
>>>>>Could you show the expected results in this case?
>>>>>
>>>>>set.seed(432)
>>>>>dat1 <- as.data.frame(matrix(sample(c(1:10,NA),154*5,replace=TRUE),ncol=5))
>>>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
>>>>>lst1 <- lapply(paste("r",1:2,sep="."),function(x) 
>>>>>cbind(dat1[,c(1:3)],dat1[x]))
>>>>>
>>>>>
>>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>>>> sum(!!rowSums(is.na(lst2[[1]])))
>>>>>#[1] 40
>>>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>>>>>#[1] 40 46
>>>>>
>>>>>
>>>>>A.K.
>>>>>
>>>>>
>>>>>
>>>>>On Saturday, November 30, 2013 10:09 AM, nooldor <[email protected]> wrote:
>>>>>
>>>>>Hi,
>>>>>
>>>>>Thanks for reply!
>>>>>
>>>>>
>>>>>Three things:
>>>>>1.
>>>>>I did not mention that some of the data have more than 31 NAs in the 
>>>>>column, and then it is not possible to run lm():
>>>>>
>>>>>Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :  
>>>>>0 (non-NA) cases
>>>>>
>>>>>In this case the program should return "NA" and continue. When the number 
>>>>>of observations is shorter than 31, the program should also return "NA" 
>>>>>and continue.
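
One way to get the "return NA and continue" behaviour is to test each window for incomplete rows before fitting. The sketch below uses base R only (no zoo), a made-up data frame `toy`, and a hypothetical helper `window_coef`; it is meant as an illustration of the guard, not as the code used elsewhere in this thread.

```r
# Hypothetical helper: coefficients for one window, or NAs if any row is incomplete.
window_coef <- function(z) {
  if (!all(complete.cases(z))) return(rep(NA_real_, 4))
  unname(coef(lm(r ~ F.1 + F.2 + F.3, data = z)))
}

set.seed(42)
toy <- data.frame(F.1 = rnorm(40), F.2 = rnorm(40), F.3 = rnorm(40), r = rnorm(40))
toy$r[1:3] <- NA                       # leading gap, like a series that starts late

width  <- 31
starts <- seq_len(nrow(toy) - width + 1)          # 10 windows
res <- t(sapply(starts, function(s) window_coef(toy[s:(s + width - 1), ])))
colnames(res) <- c("(Intercept)", "F.1", "F.2", "F.3")
res[1:4, ]   # first three windows are NA, the fourth is fitted
```

Windows 1 to 3 overlap the missing values and come back as NA rows; from window 4 onwards the regression runs normally.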
>>>>>
>>>>>
>>>>>
>>>>>2. in your result matrix there are only 4 columns (for estimates of the 
>>>>>coefficients), is it possible to put there 4 more columns with p-values 
>>>>>and one column with R squared
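
The extra columns can be pulled from summary() of the fitted model. A minimal sketch for a single window, with simulated data and a hypothetical helper name `fit_stats`:

```r
# Hypothetical helper: 4 coefficients, 4 p-values and R squared in one vector.
fit_stats <- function(z) {
  fit <- lm(r ~ F.1 + F.2 + F.3, data = z)
  s <- summary(fit)
  c(coef(fit), pval = s$coefficients[, 4], rsquare = s$r.squared)
}

set.seed(7)
z <- data.frame(F.1 = rnorm(31), F.2 = rnorm(31), F.3 = rnorm(31), r = rnorm(31))
out <- fit_stats(z)
length(out)   # 9 values per window
names(out)
```

Returning a plain numeric vector of fixed length is what lets the per-window results be stacked into one matrix with rbind.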
>>>>>
>>>>>
>>>>>3. basic statistical test for the regressions:
>>>>>
>>>>>inflation factors can be captured by:
>>>>>res2 <- do.call(rbind,lapply(lst2,function(x) 
>>>>>rollapply(x,width=32,FUN=function(z)
>>>>>  vif(lm(r~ 
>>>>>F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>>
>>>>>and DW statistic:
>>>>>res3 <- do.call(rbind,lapply(lst2,function(x) 
>>>>>rollapply(x,width=32,FUN=function(z)
>>>>>  durbinWatsonTest(lm(r~ 
>>>>>F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>>
>>>>>
>>>>>3a)is that right?
>>>>>
>>>>>3b) how can I run durbinWatsonTest for more than 1 lag and get the 
>>>>>results in a user-friendly form?
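
If only the Durbin-Watson statistics are needed (not the p-values), they can be computed directly from the residuals with the textbook formula, sum((e[t] - e[t-k])^2) / sum(e^2) for lag k. The sketch below is base R with a hypothetical helper `dw_stat` and simulated data; it does not call car and makes no claim to reproduce everything durbinWatsonTest reports.

```r
# Durbin-Watson statistic at lags 1..max_lag from a vector of residuals.
dw_stat <- function(e, max_lag = 3) {
  n <- length(e)
  den <- sum(e^2)
  vapply(seq_len(max_lag), function(k) {
    sum((e[(k + 1):n] - e[1:(n - k)])^2) / den
  }, numeric(1))
}

set.seed(3)
z <- data.frame(F.1 = rnorm(31), F.2 = rnorm(31), F.3 = rnorm(31), r = rnorm(31))
dw <- dw_stat(residuals(lm(r ~ F.1 + F.2 + F.3, data = z)))
names(dw) <- paste0("dw", 1:3)
dw
```

Because the result is a plain numeric vector of length max_lag, it stacks into a table with one row per window and one column per lag.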
>>>>>
>>>>>3c) how to apply: jarque.bera.test from library(tseries) and ncvTest from 
>>>>>library(car) ???
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>Pozdrowienia,
>>>>>
>>>>>Tomasz Schabek
>>>>>
>>>>>
>>>>>On 30 November 2013 07:42, arun <[email protected]> wrote:
>>>>>
>>>>>Hi,
>>>>>>The link does not seem to be working.  From the description, it looks like:
>>>>>>set.seed(432)
>>>>>>dat1 <- as.data.frame(matrix(sample(200,154*337,replace=TRUE),ncol=337))
>>>>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:334,sep="."))
>>>>>>lst1 <- lapply(paste("r",1:334,sep="."),function(x) 
>>>>>>cbind(dat1[,c(1:3)],dat1[x]))
>>>>>>
>>>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>>>>>library(zoo)
>>>>>>
>>>>>>res <- do.call(rbind,lapply(lst2,function(x) 
>>>>>>rollapply(x,width=32,FUN=function(z) coef(lm(r~ 
>>>>>>F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>>>
>>>>>>row.names(res) <- rep(paste("r",1:334,sep="."),each=123)
>>>>>> dim(res)
>>>>>>#[1] 41082     4
>>>>>>
>>>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[1:32,]) )
>>>>>>#(Intercept)         F.1         F.2         F.3
>>>>>>#109.9168150  -0.1705361  -0.1028231   0.2027911
>>>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[2:33,]) )
>>>>>>#(Intercept)         F.1         F.2         F.3
>>>>>>#119.3718949  -0.1660709  -0.2059830   0.1338608
>>>>>>res[1:2,]
>>>>>>#    (Intercept)        F.1        F.2       F.3
>>>>>>#r.1    109.9168 -0.1705361 -0.1028231 0.2027911
>>>>>>#r.1    119.3719 -0.1660709 -0.2059830 0.1338608
>>>>>>
>>>>>>A.K.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>On Friday, November 29, 2013 6:43 PM, nooldor <[email protected]> wrote:
>>>>>>Hi all!
>>>>>>
>>>>>>
>>>>>>I am just starting my adventure with R, so excuse my naive questions.
>>>>>>
>>>>>>My data look like that:
>>>>>>
>>>>>><http://r.789695.n4.nabble.com/file/n4681391/data_descr_img.jpg>
>>>>>>
>>>>>>I have 3 independent variables (F.1, F.2 and F.3) and 334 other variables
>>>>>>(r.1, r.2, ... r.334) - each one of these will be dependent variable in my
>>>>>>regression.
>>>>>>
>>>>>>Total span of the time is 154 observations. But I would like to have 
>>>>>>rolling
>>>>>>window regression with length of 31 observations.
>>>>>>
>>>>>>I would like to run script like that:
>>>>>>
>>>>>>summary(lm(r.1~F.1+F.2+F.3, data=data))
>>>>>>vif(lm(r.1~F.1+F.2+F.3, data=data))
>>>>>>
>>>>>>But for each of the 334 (r.1 to r.334) dependent variables separately, and
>>>>>>with a rolling window of length 31 obs.
>>>>>>
>>>>>>Id est:
>>>>>>summary(lm(r.1~F.1+F.2+F.3, data=data)) would be run 123 (154 total obs -
>>>>>>31. for the first regression) times for rolling-fixed period of 31 obs.
>>>>>>
>>>>>>The next regression would be:
>>>>>>summary(lm(r.2~F.1+F.2+F.3, data=data)) also 123 times ... and so on till
>>>>>>summary(lm(r.334~F.1+F.2+F.3, data=data))
>>>>>>
>>>>>>It means it would be 123 x 334 regressions (=41082 regressions)
>>>>>>
>>>>>>I would like to save the results (summary + vif test) of all those 41082
>>>>>>regressions in one user-friendly, readable file, like the one produced by
>>>>>>e.g. the command capture.output()
>>>>>>
>>>>>>Could you help with it?
>>>>>>
>>>>>>Regards,
>>>>>>
>>>>>>T.S.
>>>>>>
>>>>>>    [[alternative HTML version deleted]]
>>>>>>
>>>>>>______________________________________________
>>>>>>[email protected] mailing list
>>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>PLEASE do read the posting guide 
>>>>>>http://www.R-project.org/posting-guide.html
>>>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

