On Mon, 19 Sep 2011, Marc Schwartz wrote:
Let me start by acknowledging that I have little practical experience in time series analyses, much less proficiency with the zoo package. I just don't come across them much in clinical trials/studies, at least the ones that I have been involved with over the past 25+ years.
Marc,

A lot of folks on this mailing list seem to be on the medical side of biology. I'm a stream ecologist/fluvial geomorphologist with 30 or so years of professional experience, and each project seems to require new software and new knowledge on my part. I now have two water quality projects that will involve integrating time series analyses, regression analyses, and spatial modeling of terrain and hydrology. I've used the last component frequently over the past decade or so; the advanced data analyses and statistical modeling have come up only now.
I do know from prior posts on the matter, that the zoo package seems to have some of its own approaches to dealing with dates, as compared to base R. So you may need to be clear on the differentiation in code/functions required to use some of the package functionality.
Yes, I can specify the start and end dates using as.Date.
So from an analytic perspective, I would encourage others to chime in with guidance. Missing data generally has an impact at some level, the extent of which is going to be specific to the context of the particular analysis being performed and any assumptions one may be willing to make.
Missing data between the first sample and the most recent one means, in my contexts, that access to the site was prevented by high water, deep snow (some sites are at > 7,000 feet amsl), or a dry channel in the late summer. It's ignoring the NAs prior to the first collected samples that I'm hoping can easily be specified.
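For the leading-NA case described above, zoo's na.trim() may be a reasonable fit. What follows is only a minimal sketch: the dates and concentrations are hypothetical, made up for illustration, and the variable names are not from the thread.

```r
library(zoo)

# Hypothetical sampling dates and concentrations (illustration only).
# The two leading NAs stand for dates before the first sample was collected;
# the interior NA stands for a visit when the site could not be reached.
dates <- as.Date(c("2010-01-15", "2010-02-15", "2010-03-20",
                   "2010-04-18", "2010-06-02", "2010-07-10"))
conc  <- c(NA, NA, 4.2, NA, 3.9, 4.6)

z <- zoo(conc, order.by = dates)

# Drop only the leading NAs; the interior NA is kept.
z_trim <- na.trim(z, sides = "left")

# Alternative: use window() starting at the first non-missing observation.
first_obs <- index(z)[which(!is.na(coredata(z)))[1]]
z_win <- window(z, start = first_obs)

print(z_trim)
```

Both approaches leave interior gaps untouched, so how those are handled (left as NA, interpolated, or dropped) remains a separate decision.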
There is also the r-sig-finance list: https://stat.ethz.ch/mailman/listinfo/r-sig-finance to which this query may be better suited in terms of gaining a focused audience in a domain where time series analyses are prevalent.
I've read some finance/economics-focused time series documents and I haven't seen the relevance. For example, in the natural environment can we assume that water samples collected 1 month apart and analyzed for specific chemical concentrations are autocorrelated? If events such as rain-on-snow or wildland fires cause a large increase in discharge, or clear riparian vegetation and add soot and charred debris to the stream channel, are chemical concentrations associated with prior ones or with the external influences at the collection site? Perhaps these data are independent and identically distributed (iid). One of the more interesting (to me, at least) aspects of one of these projects is to explore the value of the time domain approach for predicting future values versus the frequency domain approach for exploring periodic and/or systematic variations in values over time. Regulators tend to focus on the first and be unaware of the second. At this very early exploratory stage I'm not sure which approach is more beneficial to my client and the regulators.
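Both questions can be explored with base R before committing to either approach. The sketch below uses a simulated, regularly spaced monthly series with an annual cycle; the data are entirely hypothetical. Note the assumption that acf(), Box.test(), and spec.pgram() all treat observations as equally spaced, so irregular samples would first need to be aggregated or otherwise regularized.

```r
set.seed(42)

# Hypothetical regular monthly series: 10 years, annual cycle plus noise.
n <- 120
month <- seq_len(n)
x <- 5 + 2 * sin(2 * pi * month / 12) + rnorm(n, sd = 0.5)

# Time domain: sample autocorrelation and a Ljung-Box test of the
# null hypothesis that the first 12 autocorrelations are all zero.
r <- acf(x, lag.max = 24, plot = FALSE)
lag12 <- r$acf[13]   # element 1 is lag 0, so lag 12 is element 13
lb <- Box.test(x, lag = 12, type = "Ljung-Box")

# Frequency domain: raw periodogram. With a plain numeric vector the
# frequencies are in cycles per sampling interval (here, per month).
pg <- spec.pgram(x, taper = 0, detrend = TRUE, plot = FALSE)
peak_freq <- pg$freq[which.max(pg$spec)]

cat("lag-12 autocorrelation:", round(lag12, 2), "\n")
cat("Ljung-Box p-value:", format.pval(lb$p.value), "\n")
cat("period at periodogram peak (months):", round(1 / peak_freq, 1), "\n")
```

A small Ljung-Box p-value argues against treating the values as iid, while the periodogram peak points to the dominant cycle length; neither says anything about what drives the pattern.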
There are also some books on using R for time series analyses, some of which are listed on the "Books" link from the R homepage. It would seem logical that one or more of them might cover the use of the zoo package, but that is a guess on my part.
I am plowing my way through Shumway and Stoffer's "Time Series Analysis and Its Applications: With R Examples" and have read Cowpertwait and Metcalfe's "Introductory Time Series with R." I need to look again at Zuur et al.'s "Analysing Ecological Data" and at "Numerical Ecology with R" (Borcard et al.), specifically for discussions of time series. The books and other documents I've read (with the exception of an article on sandbars in the Colorado River) deal with situations where data are associated with fixed and regular periods. In the messy real world, not only do weather and other conditions mean irregular data collection dates, but sometimes the regulators decide that monthly or quarterly samples are no longer required, so semi-annual samples are the norm thereafter. Biotic data are even worse. :-) Zoo seems to be ideal for the irregular, messy data with which I work. Since I'm quite new to R it will take me time to get up to full speed with it and zoo. I greatly appreciate the patience and understanding of all of you who've helped.
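As a rough illustration of how zoo copes with irregular collection dates, here is a small sketch. The dates and values are hypothetical, chosen to mimic a schedule that shifts from roughly monthly to semi-annual sampling.

```r
library(zoo)

# Hypothetical irregular sampling dates and concentrations (illustration only).
dates <- as.Date(c("2009-03-04", "2009-04-21", "2009-05-19",
                   "2009-05-28", "2009-11-03", "2010-05-12"))
conc  <- c(3.1, 2.8, 3.4, 3.6, 2.9, 3.3)

z <- zoo(conc, order.by = dates)

# Days between consecutive samples, to see how uneven the spacing is.
gaps <- diff(index(z))

# Monthly means indexed by year-month; months with no samples are
# simply absent from the result rather than filled with NA.
z_month <- aggregate(z, as.yearmon, mean)

print(gaps)
print(z_month)
```

Aggregating to a coarser, common time step is only one option; whether it is appropriate depends on the analysis that follows.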
I hope that the above is helpful, Rich.
Yep. If you know of references to time series analyses of real-world, messy data, please share them with me.
I also presume that you got my "final" version of the two functions, with the corrected data frame based approach. Sorry for the confusion on that earlier.
Yes, I did, and there was no confusion as I read them all in the same session.

Again, many thanks,

Rich