Re: [R] Multiple regressions with changing dependent variable and time span

arun Sat, 30 Nov 2013 13:06:37 -0800

Hi,

I was able to read the file after saving it as .csv.  It seems to work without 
any errors.
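
A quick check on the import can catch a wrong separator early. The sketch below is only an illustration: it writes a small stand-in file (not Book2.csv) with ";" as separator and "," as decimal mark, then uses count.fields() to see which separator actually splits the header before calling read.table(). The object names (tmp, chk, sep_guess) are made up for the example.

```r
# Stand-in file (hypothetical data, not Book2.csv): ';' separator, ',' decimal mark
tmp <- tempfile(fileext = ".csv")
writeLines(c("F.1;F.2;F.3;r.1",
             "1,5;2,0;0,3;4,1",
             "0,7;1,2;2,9;3,3"), tmp)

# Number of fields per line under each candidate separator
n_semicolon <- count.fields(tmp, sep = ";")
n_tab       <- count.fields(tmp, sep = "\t")

# Use the separator that splits the header into more than one field
sep_guess <- if (n_semicolon[1] > 1) ";" else "\t"
chk <- read.table(tmp, header = TRUE, sep = sep_guess, dec = ",")
str(chk)   # for the real file, the thread expects 154 obs. of 337 variables
unlink(tmp)
```

If the field counts differ from line to line, or the total is not the expected 337, the separator or the decimal mark is probably wrong.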


dat1<-read.csv("Book2.csv", header=T)
###same as previous

# One data frame per dependent variable: F.1, F.2, F.3 plus one r.* column, renamed "r"
lst1 <- lapply(paste("r",1:334,sep="."), function(x) cbind(dat1[,c(1:3)], dat1[x]))
lst2 <- lapply(lst1, function(x) {colnames(x)[4] <- "r"; x})
# Number of rows with at least one NA, per dependent variable
sapply(lst2, function(x) sum(!!rowSums(is.na(x))))
library(zoo)
# Coefficients, p-values and R squared per window;
# a window containing any NA gives a row of NA
res1 <- do.call(rbind, lapply(lst2, function(x)
  rollapply(x, width=32, FUN=function(z) {
    z1 <- as.data.frame(z)
    if(!sum(!!rowSums(is.na(z1)))) {
      l1 <- lm(r~F.1+F.2+F.3, data=z1)
      c(coef(l1), pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared)
    } else rep(NA,9)
  }, by.column=FALSE, align="right")))
row.names(res1) <- rep(paste("r",1:334,sep="."), each=123)
dim(res1)
#[1] 41082     9
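
To make the windowing explicit, here is a rough base-R sketch of the same idea for a single dependent variable, without zoo. It uses simulated data, and roll_coef is a made-up helper name, so treat it as an illustration rather than a replacement for the rollapply() call. It also shows that all(complete.cases(z1)) is the same test as !sum(!!rowSums(is.na(z1))).

```r
# Base-R sketch of one rolling regression (illustrative; simulated data)
set.seed(1)
n <- 154; width <- 32
d <- data.frame(F.1 = rnorm(n), F.2 = rnorm(n), F.3 = rnorm(n), r = rnorm(n))
d$r[5] <- NA                       # one missing value, to exercise the NA branch

roll_coef <- function(d, width) {
  starts <- seq_len(nrow(d) - width + 1)        # 154 - 32 + 1 = 123 windows
  t(sapply(starts, function(s) {
    z1 <- d[s:(s + width - 1), ]
    if (all(complete.cases(z1))) {              # same test as !sum(!!rowSums(is.na(z1)))
      coef(lm(r ~ F.1 + F.2 + F.3, data = z1))
    } else {
      rep(NA_real_, 4)                          # window contains an NA: skip the fit
    }
  }))
}

out <- roll_coef(d, width)
dim(out)   # 123 rows (windows), 4 columns (coefficients)
```

The first five windows all contain row 5, so their rows are NA. The sixth row holds the coefficients from the fit on rows 6 to 37.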

#vif
library(car)
# Variance inflation factors per window; NA row for a window containing any NA
res2 <- do.call(rbind, lapply(lst2, function(x)
  rollapply(x, width=32, FUN=function(z) {
    z1 <- as.data.frame(z)
    if(!sum(!!rowSums(is.na(z1)))) {
      l1 <- lm(r~F.1+F.2+F.3, data=z1)
      vif(l1)
    } else rep(NA,3)
  }, by.column=FALSE, align="right")))
row.names(res2) <- rep(paste("r",1:334,sep="."), each=123)
dim(res2)
#[1] 41082     3

#DW statistic:
lst3 <- lapply(lst2, function(x)
  rollapply(x, width=32, FUN=function(z) {
    z1 <- as.data.frame(z)
    if(!sum(!!rowSums(is.na(z1)))) {
      l1 <- lm(r~F.1+F.2+F.3, data=z1)
      durbinWatsonTest(l1)
    } else rep(NA,4)
  }, by.column=FALSE, align="right"))
# Drop the 4th column of each result before stacking
res3 <- do.call(rbind, lapply(lst3, function(x) x[,-4]))
row.names(res3) <- rep(paste("r",1:334,sep="."), each=123)
dim(res3)
#[1] 41082     3
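
On the question about more than one lag: as far as I know, durbinWatsonTest() in car accepts a max.lag argument for this. Independently of car, the statistic itself is easy to compute from the residuals. The sketch below uses the usual definition, the sum of squared lag-k differences of the residuals divided by their sum of squares. It gives no p-values, dw_stat is a made-up helper name, and the data are simulated.

```r
# Durbin-Watson statistic at lags 1..max_lag from model residuals (no p-values)
dw_stat <- function(model, max_lag = 3) {
  e <- residuals(model)
  n <- length(e)
  sapply(seq_len(max_lag), function(k)
    sum((e[(k + 1):n] - e[1:(n - k)])^2) / sum(e^2))
}

set.seed(2)                                   # simulated stand-in for one window
z1 <- data.frame(F.1 = rnorm(32), F.2 = rnorm(32), F.3 = rnorm(32), r = rnorm(32))
l1 <- lm(r ~ F.1 + F.2 + F.3, data = z1)
dw <- dw_stat(l1, max_lag = 3)
dw                                            # one value per lag
```

Returning a plain numeric vector of fixed length keeps the result easy to stack with rbind, in the same way as the other results in this message.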
##ncvTest()
# Workaround: f4 puts the data and formula in the global environment while
# ncvTest() runs, then removes them again
f4 <- function(meanmod, dta, varmod) {
  assign(".dta", dta, envir=.GlobalEnv)
  assign(".meanmod", meanmod, envir=.GlobalEnv)
  m1 <- lm(.meanmod, .dta)
  ans <- ncvTest(m1, varmod)
  remove(".dta", envir=.GlobalEnv)
  remove(".meanmod", envir=.GlobalEnv)
  ans
}

# One ncvTest result per window; NA for a window containing any NA
lst4 <- lapply(lst2, function(x)
  rollapply(x, width=32, FUN=function(z) {
    z1 <- as.data.frame(z)
    if(!sum(!!rowSums(is.na(z1)))) f4(r~., z1) else NA
  }, by.column=FALSE, align="right"))
names(lst4) <- paste("r",1:334,sep=".")
length(lst4)
#[1] 334


###jarque.bera.test
library(tseries)
# First three elements of the Jarque-Bera test result on the residuals, per window
res5 <- do.call(rbind, lapply(lst2, function(x)
  rollapply(x, width=32, FUN=function(z) {
    z1 <- as.data.frame(z)
    if(!sum(!!rowSums(is.na(z1)))) {
      l1 <- lm(r~F.1+F.2+F.3, data=z1)
      resid <- residuals(l1)
      unlist(jarque.bera.test(resid)[1:3])
    } else rep(NA,3)
  }, by.column=FALSE, align="right")))
dim(res5)
#[1] 41082     3
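
On the original request to keep all results in one readable file: one possible approach is to turn the repeated row names into proper columns and write a single CSV. The sketch below uses a tiny stand-in matrix (res_demo, hypothetical values) in place of res1, and a temporary file as the output path.

```r
# Stand-in for res1: 2 series x 3 windows, 2 columns (hypothetical values)
res_demo <- matrix(1:12 / 10, ncol = 2,
                   dimnames = list(rep(c("r.1", "r.2"), each = 3),
                                   c("(Intercept)", "F.1")))

series <- rownames(res_demo)
rownames(res_demo) <- NULL          # row names repeat, so keep them as a column instead
out <- data.frame(series = series,
                  window = ave(seq_along(series), series, FUN = seq_along),
                  res_demo, check.names = FALSE)

f <- tempfile(fileext = ".csv")     # hypothetical output path
write.csv(out, f, row.names = FALSE)
back <- read.csv(f, check.names = FALSE)
head(back)
```

The window column counts the rolling windows within each series, so every row of the file can be traced back to one regression.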

A.K.







On Saturday, November 30, 2013 1:44 PM, nooldor <nool...@gmail.com> wrote:

Here it is as .xlsx. It should be easy to open; you may need to find and 
replace the commas according to your Excel settings (or maybe Excel will do it 
automatically).






On 30 November 2013 19:15, arun <smartpink...@yahoo.com> wrote:

I tried that, but:
>
>
>
>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>> str(dat1)
>'data.frame':    154 obs. of  1 variable:
>
>Then I changed to:
>dat1<-read.table("Book2.csv", head=T, sep="\t", dec=",")
>> str(dat1)
>'data.frame':    154 obs. of  661 variables:
>Both of them are wrong as the number of variables should be 337.
>A.K.
>
>
>
>
>
>
>
>On Saturday, November 30, 2013 12:53 PM, nooldor <nool...@gmail.com> wrote:
>
>Thank you,
>
>I got your reply. I am just testing your script. I will let you know how it 
>goes soon.
>
>The .csv could be problematic, as commas are used as the decimal separator 
>(Eastern European Excel settings) ... I read it into R with this:
>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>
>Thank you very much !!!
>
>T.S.
>
>
>
>
>On 30 November 2013 18:39, arun <smartpink...@yahoo.com> wrote:
>
>I couldn't read the "Book.csv" as the format is completely messed up.  Anyway, 
>I hope the solution works on your dataset.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>On Saturday, November 30, 2013 10:34 AM, nooldor <nool...@gmail.com> wrote:
>>
>>
>>ok.
>>
>>
>>> dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
>>> lst1 <- lapply(paste("r",1:2,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>> sum(!!rowSums(is.na(lst2[[1]])))
>>[1] 57
>>> #[1] 40
>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>>[1] 57  0
>>> #[1] 40 46
>>The data file is attached.
>>
>>
>>
>>
>>
>>
>>On 30 November 2013 16:22, arun <smartpink...@yahoo.com> wrote:
>>
>>Hi,
>>>The first point is not that clear.
>>>
>>>Could you show the expected results in this case?
>>>
>>>set.seed(432)
>>>dat1 <- as.data.frame(matrix(sample(c(1:10,NA),154*5,replace=TRUE),ncol=5))
>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
>>>lst1 <- lapply(paste("r",1:2,sep="."),function(x) 
>>>cbind(dat1[,c(1:3)],dat1[x]))
>>>
>>>
>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>> sum(!!rowSums(is.na(lst2[[1]])))
>>>#[1] 40
>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>>>#[1] 40 46
>>>
>>>
>>>A.K.
>>>
>>>
>>>
>>>On Saturday, November 30, 2013 10:09 AM, nooldor <nool...@gmail.com> wrote:
>>>
>>>Hi,
>>>
>>>Thanks for reply!
>>>
>>>
>>>Three things:
>>>1.
>>>I did not mention that some of the columns have more than 31 NAs, in which 
>>>case it is not possible to run lm():
>>>
>>>Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :  0 
>>>(non-NA) cases
>>>
>>>In this case the program should return "NA" and continue. Whenever the 
>>>number of observations is shorter than 31, the program should likewise 
>>>return "NA" and continue.
>>>
>>>
>>>
>>>2. In your result matrix there are only 4 columns (for the estimates of the 
>>>coefficients). Is it possible to add 4 more columns with p-values and 
>>>one column with R squared?
>>>
>>>
>>>3. basic statistical test for the regressions:
>>>
>>>inflation factors can be captured by:
>>>res2 <- do.call(rbind,lapply(lst2,function(x) 
>>>rollapply(x,width=32,FUN=function(z)
>>>  vif(lm(r~ 
>>>F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>
>>>and DW statistic:
>>>res3 <- do.call(rbind,lapply(lst2,function(x) 
>>>rollapply(x,width=32,FUN=function(z)
>>>  durbinWatsonTest(lm(r~ 
>>>F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>
>>>
>>>3a) Is that right?
>>>
>>>3b) How can I run durbinWatsonTest for more than 1 lag and get the result 
>>>in a user-friendly form?
>>>
>>>3c) How do I apply jarque.bera.test from library(tseries) and ncvTest from 
>>>library(car)?
>>>
>>>
>>>
>>>
>>>
>>>
>>>Pozdrowienia,
>>>
>>>Tomasz Schabek
>>>
>>>
>>>On 30 November 2013 07:42, arun <smartpink...@yahoo.com> wrote:
>>>
>>>Hi,
>>>>The link seems to be not working.  From the description, it looks like:
>>>>set.seed(432)
>>>>dat1 <- as.data.frame(matrix(sample(200,154*337,replace=TRUE),ncol=337))
>>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:334,sep="."))
>>>>lst1 <- lapply(paste("r",1:334,sep="."),function(x) 
>>>>cbind(dat1[,c(1:3)],dat1[x]))
>>>>
>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>>>library(zoo)
>>>>
>>>>res <- do.call(rbind,lapply(lst2,function(x) 
>>>>rollapply(x,width=32,FUN=function(z) coef(lm(r~ 
>>>>F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>
>>>>row.names(res) <- rep(paste("r",1:334,sep="."),each=123)
>>>> dim(res)
>>>>#[1] 41082     4
>>>>
>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[1:32,]) )
>>>>#(Intercept)         F.1         F.2         F.3
>>>>#109.9168150  -0.1705361  -0.1028231   0.2027911
>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[2:33,]) )
>>>>#(Intercept)         F.1         F.2         F.3
>>>>#119.3718949  -0.1660709  -0.2059830   0.1338608
>>>>res[1:2,]
>>>>#    (Intercept)        F.1        F.2       F.3
>>>>#r.1    109.9168 -0.1705361 -0.1028231 0.2027911
>>>>#r.1    119.3719 -0.1660709 -0.2059830 0.1338608
>>>>
>>>>A.K.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On Friday, November 29, 2013 6:43 PM, nooldor <nool...@gmail.com> wrote:
>>>>Hi all!
>>>>
>>>>
>>>>I am just starting my adventure with R, so excuse my naive questions.
>>>>
>>>>My data look like that:
>>>>
>>>><http://r.789695.n4.nabble.com/file/n4681391/data_descr_img.jpg>
>>>>
>>>>I have 3 independent variables (F.1, F.2 and F.3) and 334 other variables
>>>>(r.1, r.2, ... r.334) - each one of these will be dependent variable in my
>>>>regression.
>>>>
>>>>Total span of the time is 154 observations. But I would like to have rolling
>>>>window regression with length of 31 observations.
>>>>
>>>>I would like to run script like that:
>>>>
>>>>summary(lm(r.1~F.1+F.2+F.3, data=data))
>>>>vif(lm(r.1~F.1+F.2+F.3, data=data))
>>>>
>>>>But for each of 334 (r.1 to r.334) dependent variables separately and with
>>>>rolling-window of the length 31obs.
>>>>
>>>>Id est:
>>>>summary(lm(r.1~F.1+F.2+F.3, data=data)) would be run 123 times (154 total 
>>>>obs minus the 31 used for the first regression), each over a fixed rolling 
>>>>period of 31 obs.
>>>>
>>>>The next regression would be:
>>>>summary(lm(r.2~F.1+F.2+F.3, data=data)) also 123 times ... and so on till
>>>>summary(lm(r.334~F.1+F.2+F.3, data=data))
>>>>
>>>>It means it would be 123 x 334 regressions (=41082 regressions)
>>>>
>>>>I would like to save the results (summary + vif test) of all those 41082
>>>>regressions in one user-friendly, readable file, like the one produced by 
>>>>e.g. the command capture.output()
>>>>
>>>>Could you help with it?
>>>>
>>>>Regards,
>>>>
>>>>T.S.
>>>>
>>>>    [[alternative HTML version deleted]]
>>>>
>>>>______________________________________________
>>>>R-help@r-project.org mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>
>

