"R. Michael Weylandt" <michael.weyla...@gmail.com> writes:

> On Wed, Jul 11, 2012 at 10:05 AM, Russell Bowdrey
> <russell.bowd...@justretirement.com> wrote:
>>
>> Dear all,
>>
>> This is what I'd like to do (I have an implementation using for
>> loops, which I designed before I realised just how slow R is at
>> executing them - this process currently takes days to run).
>>
>> I have a large dataframe containing corporate bond data, columns are:
>> BondID
>> Date (goes back 5 years)
>> Var1
>> Var2
>> Term2Maturity
>>
>> What I want to do is this:
>>
>> 1) For each bond, at each given date, look back over 1 year and append
>> some statistics to each row ( sd(Var1), cor(Var1,Var2) over that year etc)
>>
>
> Look at the TTR package and the various run** functions. Much faster.
>
>> a. It seems I might be able to use ddply for this, but I can't work
>> out how to code the stats function to only look back over one year,
>> rather than the full data range
>>
>> b. For example: dfBondsWithCorr<-ddply(dfBonds, .(BondID),
>> transform,corr=cor(Var1,Var2),.progress="text")
>> returns a dataframe where for each bond it has same corr for each date
>>
>> 2) On each date, subset dfBondsWithCorr by certain qualification
>> criteria, then to the qualifiers fit a regression through a Var1 and
>> Term2Maturity, output the regression as a df of curves (say for each
>> date, a curve represented by points every 0.5 years)
>>
>> a. I can do this pretty efficiently for a single date (and I
>> suppose I could wrap that in a function) , but can't quite see how
>> to do the filtering and spitting out of curves over multiple dates
>> without using for loops
>>
>
> This one's harder. For simple linear regressions, you can solve the
> regression analytically (e.g., slope = runCov / runVar and mean
> similarly) but doing it for more complicated regressions will pretty
> much require a for loop of one sort or another. Can you say what sort
> of model you are looking to use?
>
>> Would appreciate any thoughts, many thanks in advance
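
To make the one-year look-back in (1) concrete, here is a minimal sketch. It is not anyone's actual implementation, and it is written in Python with only the standard library purely for illustration. The tuple layout, the function name trailing_year_stats, the SAMPLE data and the 365-day window are all my assumptions. It also shows the analytic shortcut quoted above: for a simple linear regression of y on x, the slope is cov(x, y) / var(x).

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date, timedelta
from math import sqrt


def trailing_year_stats(rows, window_days=365):
    """Per-row statistics over the trailing window for the same bond.

    rows: iterable of (bond_id, day, var1, var2), with day a datetime.date.
    Assumes at most one row per bond per day (an assumption of this sketch).
    """
    by_bond = defaultdict(list)
    for bond_id, day, var1, var2 in rows:
        by_bond[bond_id].append((day, var1, var2))

    out = []
    for bond_id, obs in by_bond.items():
        obs.sort()  # chronological order within each bond
        days = [o[0] for o in obs]
        for i, (day, _, _) in enumerate(obs):
            # Index of the first observation dated on or after day - window_days.
            start = bisect_left(days, day - timedelta(days=window_days))
            window = obs[start:i + 1]
            n = len(window)
            sd = cor = slope = None  # undefined with fewer than two points
            if n >= 2:
                xs = [o[1] for o in window]
                ys = [o[2] for o in window]
                mx, my = sum(xs) / n, sum(ys) / n
                sxx = sum((x - mx) ** 2 for x in xs)
                syy = sum((y - my) ** 2 for y in ys)
                sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                sd = sqrt(sxx / (n - 1))  # sample sd, like R's sd()
                if sxx > 0 and syy > 0:
                    cor = sxy / sqrt(sxx * syy)
                if sxx > 0:
                    slope = sxy / sxx  # cov(x, y) / var(x)
            out.append({"bond_id": bond_id, "date": day, "n": n,
                        "sd_var1": sd, "cor": cor, "slope": slope})
    return out


# Hypothetical data: one bond, four observations, var2 = 2 * var1.
SAMPLE = [
    ("A", date(2011, 1, 1), 1.0, 2.0),
    ("A", date(2011, 7, 1), 2.0, 4.0),
    ("A", date(2012, 1, 1), 3.0, 6.0),
    ("A", date(2012, 7, 1), 5.0, 10.0),
]
```

This recomputes every window from scratch, so it is only meant to pin down what "over the last year" means per row. Running sums, which is roughly what the run** functions are suggested for, would avoid the repeated work.
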
I feel like PostgreSQL will do the work better. It has support for basic
statistics [1], and you can use window functions [2] to limit the scope to
the last year only. Then you fetch the results into R with RODBC or
something. I suspect you have your data in some sort of DB in the first
place. Perhaps it has similar features.

[1] http://www.postgresql.org/docs/9.1/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE
[2] http://www.postgresql.org/docs/9.1/interactive/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

--
Mikhail

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
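
As a rough illustration of the window-function idea, here is a minimal sketch that runs against an in-memory SQLite database through Python's sqlite3 module, so it can be tried without a PostgreSQL server. The table name bonds, its columns, the integer day column and the SAMPLE_DAYS data are hypothetical. SQLite has no built-in standard deviation or correlation aggregates, so the sketch only computes COUNT and AVG over the frame; in PostgreSQL the same OVER clause could, I believe, be applied to the stddev_samp and corr aggregates listed in [1]. It also assumes an engine that accepts a numeric offset in a RANGE frame. As far as I can tell the 9.1 documentation in [2] only allows an offset in ROWS mode, so check your version.

```python
import sqlite3


def trailing_window_means(rows, window_days=365):
    """rows: iterable of (bond_id, day_number, var1) tuples.

    Returns (bond_id, day, n, mean_var1) per row, where n and mean_var1 are
    computed over the same bond's rows with day in [day - window_days, day].
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bonds (bond_id TEXT, day INTEGER, var1 REAL)")
    conn.executemany("INSERT INTO bonds VALUES (?, ?, ?)", rows)
    # The frame offset is part of the SQL text here, so it is coerced to int
    # before being formatted in.
    query = f"""
        SELECT bond_id, day,
               COUNT(*)  OVER w AS n,
               AVG(var1) OVER w AS mean_var1
        FROM bonds
        WINDOW w AS (PARTITION BY bond_id ORDER BY day
                     RANGE BETWEEN {int(window_days)} PRECEDING AND CURRENT ROW)
        ORDER BY bond_id, day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical data: day is a plain day count, not a calendar date.
SAMPLE_DAYS = [
    ("A", 0, 1.0),
    ("A", 200, 2.0),
    ("A", 400, 3.0),
    ("A", 800, 5.0),
    ("B", 400, 10.0),
]
```

PARTITION BY keeps each bond separate and the RANGE frame restricts each row's statistics to the preceding 365 days, which is the part ddply with transform does not do on its own.
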