[R] How to run regressions over increasing time series

Philippe Hensel Fri, 27 Jul 2012 07:33:26 -0700

Hello,

I would like to run a series of regressions on my data (response variable
over time):


1) regression from T1 to T2
2) regression from T1 through T3
3) regression from T1 through T4, etc.

I have been struggling to find a way to do this through commands, as
opposed to cutting up the data manually (my dataset has over 6000
rows/observations).

An illustrative dataset can be created thusly:

dat <- structure(list(Years= c(0, 0, 0, 0.36, 0.36, 0.36, 0.67, 0.67,
0.67, 0.74, 0.74, 0.74),
Obs = c(0, 0, 0, 2.3, 1.9, 2.1, 4.5, 4.5, 4.6, 5.3, 5.5, 5.6)),
.Names = c("Years","Obs"), row.names = c(NA, -12L), class = "data.frame")

I was trying to use a loop to create subsets of the data corresponding to
the sets of time intervals required (e.g. T1 to T2, T1 through T3, etc.),
but I am having trouble generating a new variable to index time (instead of
the decimal values).  I was figuring that indexing time would allow me to
use a loop to generate the required subsets of data.

I can figure out how many time periods I have and assign a sequential
number to them:

Years <- unique(dat$Years)
Yrs_count <- seq_along(Years)

And then I can combine these into a data frame:

Yrs_combo <- data.frame(Years, Yrs_count)

However, how do I combine this data frame with my larger dataset, which has
different numbers of rows?
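
One possible way to attach a sequential time index to every row, sketched here against the illustrative dataset only: either look up each row's position among the sorted unique time values with match(), or build a small lookup table and join it with merge(). The names lookup and dat_indexed are made up for this example, and the sketch assumes the time values in the full dataset are exact repeats of one another (no floating-point noise between rows of the same period).

```r
# Illustrative dataset from the post
dat <- data.frame(
  Years = c(0, 0, 0, 0.36, 0.36, 0.36, 0.67, 0.67, 0.67, 0.74, 0.74, 0.74),
  Obs   = c(0, 0, 0, 2.3, 1.9, 2.1, 4.5, 4.5, 4.6, 5.3, 5.5, 5.6)
)

# Option 1: match() gives, for each row, the position of its time value
# among the sorted unique time values (1 = first period, 2 = second, ...)
dat$Yrs_count <- match(dat$Years, sort(unique(dat$Years)))

# Option 2: build a lookup table (hypothetical name) and join it by Years;
# merge() repeats each lookup row for every matching row of the larger data
lookup <- data.frame(Years = sort(unique(dat$Years)))
lookup$Yrs_count <- seq_along(lookup$Years)
dat_indexed <- merge(dat[, c("Years", "Obs")], lookup, by = "Years")
```

Both options leave the original number of rows unchanged; merge() returns the rows sorted by Years, whereas match() keeps the original row order.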



But this is just an intermediary step in the process.... Some of you might
suggest an entirely different route.



For now, I can manually create this new time index:

dat2 <- structure(list(Years= c(0, 0, 0, 0.36, 0.36, 0.36, 0.67, 0.67,
0.67, 0.74, 0.74, 0.74),
Obs = c(0, 0, 0, 2.3, 1.9, 2.1, 4.5, 4.5, 4.6, 5.3, 5.5, 5.6),
Yrs_count = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
.Names = c("Years","Obs","Yrs_count"), row.names = c(NA, -12L), class =
"data.frame")


The next question is how can I index temporary files in a loop that I use
for extracting the needed data?  I thought I might need two loops: one to
identify the length of the time series, the other to accumulate the data
from T1 through the identified end point - maybe something like:

for (i in 1:max(dat2$Yrs_count)) {
  for (j in 1:i) {
    keyj <- dat2[, 3] == j   # rows belonging to time period j
    dat2j <- dat2[keyj, ]
    # here is where I want to create a temporary file to accumulate the
    # different dat2j's I create in this inside loop
  }
  # here is where I want to save the file for future use in my regressions
}
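
A different route that may avoid both the integer index and the inner loop, offered as a sketch rather than a definitive answer: because each window starts at T1, the subset for a given end point is simply every row with Years at or below that end point. The cumulative subsets can be kept in a named list instead of temporary files. The names periods, subsets, fits and slopes are invented for illustration, and a simple linear model of Obs on Years via lm() is an assumption, since the post does not state the model.

```r
# Illustrative dataset from the post
dat <- data.frame(
  Years = c(0, 0, 0, 0.36, 0.36, 0.36, 0.67, 0.67, 0.67, 0.74, 0.74, 0.74),
  Obs   = c(0, 0, 0, 2.3, 1.9, 2.1, 4.5, 4.5, 4.6, 5.3, 5.5, 5.6)
)

# Distinct time points in increasing order: T1, T2, T3, ...
periods <- sort(unique(dat$Years))

# One cumulative subset per end point T2, T3, ...: all rows from T1 up to it
subsets <- lapply(periods[-1], function(t_end) dat[dat$Years <= t_end, ])
names(subsets) <- paste0("T1_to_T", seq_along(periods)[-1])

# Fit the assumed model Obs ~ Years on each cumulative subset
fits <- lapply(subsets, function(d) lm(Obs ~ Years, data = d))

# Collect the estimated slope from each fit into a named vector
slopes <- sapply(fits, function(f) coef(f)[["Years"]])
```

Each element of fits is an ordinary lm object, so summary(fits[["T1_to_T3"]]) and similar calls work as usual. If the subsets do need to be kept for later, saving the whole list with saveRDS() is one option.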


I hope this example is clear enough.  My apologies if it isn't - and I
thank the R community  for any ideas, tips, or directions to information
that might be helpful.
Best,

-Philippe

-- 
Philippe Hensel, PhD
NOAA National Geodetic Survey
NGS ECO <http://www.ngs.noaa.gov/web/science_edu/ecosystems_climate/>
N/NGS2 SSMC3 #8859
1315 East-West Hwy
Silver Spring MD 20910
(301) 713 3198 x 137


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
