Re: [R] Multiple regressions with changing dependent variable and time span

nooldor Sat, 30 Nov 2013 14:04:54 -0800

Hey!

Yes,
only the D-W test takes a long time to run; I have not checked its output yet.
I compared the estimates with regressions run manually in Excel, and
they are correct.


I only changed "width" to 31 and "each=123" to "each=124", because the result
should be a matrix with ((154-31)+1) x 334 = 41416 rows.
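
The row count can be checked with a few lines of base R. This is only a sketch of the arithmetic; the names n_obs, width and n_series are made up for illustration and do not appear in the script quoted below.

```r
# Number of rolling windows per series and total rows of the stacked result.
n_obs    <- 154   # observations per series (from the thread)
width    <- 31    # rolling-window length
n_series <- 334   # dependent variables r.1 ... r.334

windows_per_series <- n_obs - width + 1        # value to pass as each= in rep()
total_rows <- windows_per_series * n_series    # rows after do.call(rbind, ...)

windows_per_series
total_rows
```

With width set to 32 the same arithmetic gives 123 windows per series and 41082 rows, which matches the dim() output in the quoted script.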

As for lags in the D-W test, I was wondering how to get a table when I use
durbinWatsonTest(l1,3), i.e. with three lags instead of the default 1.
But I can manage it - I just need to learn about the functions you used.
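
One possible way to get a fixed-width row per window for several lags is to compute the Durbin-Watson statistic directly from the residuals in base R. The sketch below does only that: it returns the statistic for lags 1 to 3 and no p-values. The helper name dw_table and the simulated data are assumptions for illustration, not part of car or of the script quoted below. As far as I know, durbinWatsonTest() in car takes the number of lags as max.lag and estimates p-values by simulation by default, which may be why it is slow here.

```r
# Durbin-Watson statistic for lags 1..max_lag, computed from lm residuals.
# dw_table is a hypothetical helper, not a function from car.
dw_table <- function(model, max_lag = 3) {
  e   <- residuals(model)
  n   <- length(e)
  den <- sum(e^2)
  dw  <- sapply(seq_len(max_lag), function(k) {
    # squared differences between residuals k steps apart
    sum((e[(k + 1):n] - e[1:(n - k)])^2) / den
  })
  names(dw) <- paste0("dw.lag", seq_len(max_lag))
  dw
}

# Made-up data standing in for one 31-observation window.
set.seed(1)
z1 <- data.frame(F.1 = rnorm(31), F.2 = rnorm(31), F.3 = rnorm(31))
z1$r <- 0.5 * z1$F.1 - 0.2 * z1$F.2 + rnorm(31)
l1 <- lm(r ~ F.1 + F.2 + F.3, data = z1)

dw_table(l1, max_lag = 3)
```

Because the helper always returns a numeric vector of length max_lag, it could in principle replace durbinWatsonTest(l1) inside the rollapply() call, with rep(NA, 3) as the fallback for windows containing NAs.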

Anyway: a BIG THANK YOU!

Best wishes,
T.S.



On 30 November 2013 21:12, arun <smartpink...@yahoo.com> wrote:

> Hi,
>
> I was able to read the file after saving it as .csv.  It seems to work
> without any errors.
>
> dat1<-read.csv("Book2.csv", header=T)
> ###same as previous
>
> lst1 <- lapply(paste("r",1:334,sep="."),function(x)
> cbind(dat1[,c(1:3)],dat1[x]))
> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>  sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
> library(zoo)
> res1 <- do.call(rbind,lapply(lst2,function(x)
> rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z);
> if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1);
> c(coef(l1), pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared) }
> else rep(NA,9)},by.column=FALSE,align="right")))
> row.names(res1) <- rep(paste("r",1:334,sep="."),each=123)
>  dim(res1)
> #[1] 41082     9
>
> #vif
>  library(car)
> res2 <- do.call(rbind,lapply(lst2,function(x)
> rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z);
> if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); vif(l1) }
> else rep(NA,3)},by.column=FALSE,align="right")))
> row.names(res2) <- rep(paste("r",1:334,sep="."),each=123)
> dim(res2)
> #[1] 41082     3
>
> #DW statistic:
>  lst3 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {
> z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {
> l1 <-lm(r~F.1+F.2+F.3,data=z1); durbinWatsonTest(l1) } else
> rep(NA,4)},by.column=FALSE,align="right"))
>  res3 <- do.call(rbind,lapply(lst3,function(x) x[,-4]))
> row.names(res3) <- rep(paste("r",1:334,sep="."),each=123)
>  dim(res3)
> #[1] 41082     3
> ##ncvTest()
> f4 <- function(meanmod, dta, varmod) {
> assign(".dta", dta, envir=.GlobalEnv)
> assign(".meanmod", meanmod, envir=.GlobalEnv)
> m1 <- lm(.meanmod, .dta)
> ans <- ncvTest(m1, varmod)
> remove(".dta", envir=.GlobalEnv)
> remove(".meanmod", envir=.GlobalEnv)
> ans
> }
>
>  lst4 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {
> z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-f4(r~.,z1) }
> else NA},by.column=FALSE,align="right"))
> names(lst4) <- paste("r",1:334,sep=".")
> length(lst4)
> #[1] 334
>
>
> ###jarque.bera.test
> library(tseries)
> res5 <- do.call(rbind,lapply(lst2,function(x)
> rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z);
> if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); resid <-
> residuals(l1); unlist(jarque.bera.test(resid)[1:3]) } else
> rep(NA,3)},by.column=FALSE,align="right")))
>  dim(res5)
> #[1] 41082     3
>
> A.K.
>
>
>
>
>
>
>
> On Saturday, November 30, 2013 1:44 PM, nooldor <nool...@gmail.com> wrote:
>
> Here it is as .xlsx. It should be easy to open, and if needed you can
> find & replace the decimal commas according to your Excel settings (or maybe
> Excel will do it automatically).
>
>
>
>
>
>
> On 30 November 2013 19:15, arun <smartpink...@yahoo.com> wrote:
>
> I tried that, but:
> >
> >
> >
> >dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
> >> str(dat1)
> >'data.frame':    154 obs. of  1 variable:
> >
> >Then I changed to:
> >dat1<-read.table("Book2.csv", head=T, sep="\t", dec=",")
> >> str(dat1)
> >'data.frame':    154 obs. of  661 variables:
> >Both of them are wrong as the number of variables should be 337.
> >A.K.
> >
> >
> >
> >
> >
> >
> >
> >On Saturday, November 30, 2013 12:53 PM, nooldor <nool...@gmail.com>
> wrote:
> >
> >Thank you,
> >
> >I got your reply. I am just testing your script. I will let you know how
> it goes soon.
> >
> >The .csv could be problematic because commas are used as the decimal
> separator (Eastern Europe Excel settings) ... I read it into R with this:
> >dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
> >
> >Thank you very much !!!
> >
> >T.S.
> >
> >
> >
> >
> >On 30 November 2013 18:39, arun <smartpink...@yahoo.com> wrote:
> >
> >I couldn't read the "Book.csv" as the format is completely messed up.
> Anyway, I hope the solution works on your dataset.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>On Saturday, November 30, 2013 10:34 AM, nooldor <nool...@gmail.com>
> wrote:
> >>
> >>
> >>ok.
> >>
> >>
> >>> dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
> >>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
> >>> lst1 <- lapply(paste("r",1:2,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
> >>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> >>> sum(!!rowSums(is.na(lst2[[1]])))
> >>[1] 57
> >>> #[1] 40
> >>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
> >>[1] 57  0
> >>> #[1] 40 46
> >>
> >>The data file is attached.
> >>
> >>
> >>
> >>
> >>
> >>
> >>On 30 November 2013 16:22, arun <smartpink...@yahoo.com> wrote:
> >>
> >>Hi,
> >>>The first point is not that clear.
> >>>
> >>>Could you show the expected results in this case?
> >>>
> >>>set.seed(432)
> >>>dat1 <-
> as.data.frame(matrix(sample(c(1:10,NA),154*5,replace=TRUE),ncol=5))
> >>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
> >>>lst1 <- lapply(paste("r",1:2,sep="."),function(x)
> cbind(dat1[,c(1:3)],dat1[x]))
> >>>
> >>>
> >>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> >>> sum(!!rowSums(is.na(lst2[[1]])))
> >>>#[1] 40
> >>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
> >>>#[1] 40 46
> >>>
> >>>
> >>>A.K.
> >>>
> >>>
> >>>
> >>>On Saturday, November 30, 2013 10:09 AM, nooldor <nool...@gmail.com>
> wrote:
> >>>
> >>>Hi,
> >>>
> >>>Thanks for reply!
> >>>
> >>>
> >>>Three things:
> >>>1.
> >>>I did not mention that some of the series have more than 31 NAs in the
> column, and then it is not possible to run lm():
> >>>
> >>>Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...)
> :  0 (non-NA) cases
> >>>
> >>>In this case the program should return NA and carry on. Whenever the
> number of observations in a window is shorter than 31, the program should
> always return NA but carry on.
> >>>
> >>>
> >>>
> >>>2. In your result matrix there are only 4 columns (the coefficient
> estimates). Is it possible to add 4 more columns with p-values and
> one column with R squared?
> >>>
> >>>
> >>>3. basic statistical test for the regressions:
> >>>
> >>>Variance inflation factors can be captured by:
> >>>res2 <- do.call(rbind,lapply(lst2,function(x)
> rollapply(x,width=32,FUN=function(z)
> >>>  vif(lm(r~
> F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
> >>>
> >>>and DW statistic:
> >>>res3 <- do.call(rbind,lapply(lst2,function(x)
> rollapply(x,width=32,FUN=function(z)
> >>>  durbinWatsonTest(lm(r~
> F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
> >>>
> >>>
> >>>3a) Is that right?
> >>>
> >>>3b) How can I run durbinWatsonTest for more than 1 lag and get the output
> in a user-friendly form?
> >>>
> >>>3c) How do I apply jarque.bera.test from library(tseries) and ncvTest
> from library(car)?
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>Regards,
> >>>
> >>>Tomasz Schabek
> >>>
> >>>
> >>>On 30 November 2013 07:42, arun <smartpink...@yahoo.com> wrote:
> >>>
> >>>Hi,
> >>>>The link seems to be not working.  From the description, it looks like:
> >>>>set.seed(432)
> >>>>dat1 <-
> as.data.frame(matrix(sample(200,154*337,replace=TRUE),ncol=337))
> >>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:334,sep="."))
> >>>>lst1 <- lapply(paste("r",1:334,sep="."),function(x)
> cbind(dat1[,c(1:3)],dat1[x]))
> >>>>
> >>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> >>>>library(zoo)
> >>>>
> >>>>res <- do.call(rbind,lapply(lst2,function(x)
> rollapply(x,width=32,FUN=function(z) coef(lm(r~
> F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
> >>>>
> >>>>row.names(res) <- rep(paste("r",1:334,sep="."),each=123)
> >>>> dim(res)
> >>>>#[1] 41082     4
> >>>>
> >>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[1:32,]) )
> >>>>#(Intercept)         F.1         F.2         F.3
> >>>>#109.9168150  -0.1705361  -0.1028231   0.2027911
> >>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[2:33,]) )
> >>>>#(Intercept)         F.1         F.2         F.3
> >>>>#119.3718949  -0.1660709  -0.2059830   0.1338608
> >>>>res[1:2,]
> >>>>#    (Intercept)        F.1        F.2       F.3
> >>>>#r.1    109.9168 -0.1705361 -0.1028231 0.2027911
> >>>>#r.1    119.3719 -0.1660709 -0.2059830 0.1338608
> >>>>
> >>>>A.K.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>On Friday, November 29, 2013 6:43 PM, nooldor <nool...@gmail.com>
> wrote:
> >>>>Hi all!
> >>>>
> >>>>
> >>>>I am just starting my adventure with R, so excuse my naive questions.
> >>>>
> >>>>My data look like that:
> >>>>
> >>>><http://r.789695.n4.nabble.com/file/n4681391/data_descr_img.jpg>
> >>>>
> >>>>I have 3 independent variables (F.1, F.2 and F.3) and 334 other
> variables
> >>>>(r.1, r.2, ... r.334) - each one of these will be dependent variable
> in my
> >>>>regression.
> >>>>
> >>>>The total time span is 154 observations, but I would like to run a
> rolling-window regression with a window length of 31 observations.
> >>>>
> >>>>I would like to run script like that:
> >>>>
> >>>>summary(lm(r.1~F.1+F.2+F.3, data=data))
> >>>>vif(lm(r.1~F.1+F.2+F.3, data=data))
> >>>>
> >>>>But for each of the 334 dependent variables (r.1 to r.334) separately and
> with a rolling window of length 31 obs.
> >>>>
> >>>>That is:
> >>>>summary(lm(r.1~F.1+F.2+F.3, data=data)) would be run 123 times (154 total
> obs - 31 for the first regression) over a fixed-length rolling period of 31 obs.
> >>>>
> >>>>The next regression would be:
> >>>>summary(lm(r.2~F.1+F.2+F.3, data=data)) also 123 times ... and so on
> till
> >>>>summary(lm(r.334~F.1+F.2+F.3, data=data))
> >>>>
> >>>>It means it would be 123 x 334 regressions (=41082 regressions)
> >>>>
> >>>>I would like to save the results (summary + vif test) of all those 41082
> >>>>regressions in one reader-friendly file, like the one produced by e.g.
> >>>>capture.output()
> >>>>
> >>>>Could you help with it?
> >>>>
> >>>>Regards,
> >>>>
> >>>>T.S.
> >>>>
> >>>>    [[alternative HTML version deleted]]
> >>>>
> >>>>______________________________________________
> >>>>R-help@r-project.org mailing list
> >>>>https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >>>>and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>>
> >>>
> >>
> >
>

