Hi all. I'm looking for robust ways of building lagged variables in a dataset with multiple individuals.
Consider a dataset with variables like the following:

##
set.seed(123)
d <- data.frame(id = rep(1:2, each=3), time=rep(1:3, 2), value=rnorm(6))
##

> d
  id time       value
1  1    1 -0.56047565
2  1    2 -0.23017749
3  1    3  1.55870831
4  2    1  0.07050839
5  2    2  0.12928774
6  2    3  1.71506499

I want to compute the lagged variable 'value(t-1)', taking subject id into account.
My current effort produced the following:

##
# Add a lagged copy of 'varname' to a data frame holding a single subject.
my_lag <- function(dt, varname, timevarname='time', lag=1) {
  # Name of the new column, e.g. 'value.1' for lag=1
  vname <- paste(varname, if(lag>0) '.' else '', lag, sep='')
  timevar <- dt[[timevarname]]
  # For each time t, pick the value observed at time t - lag (NA if absent)
  dt[[vname]] <- dt[[varname]][match(timevar, timevar + lag)]
  dt
}

# Apply my_lag separately within each subject, then stack the results.
lag_by <- function(dt, idvarname='id', ...)
  do.call(rbind, by(dt, dt[[idvarname]], my_lag, ...))
##

With the previous data I get:

> lag_by(d, varname='value')
    id time       value     value.1
1.1  1    1 -0.56047565          NA
1.2  1    2 -0.23017749 -0.56047565
1.3  1    3  1.55870831 -0.23017749
2.4  2    1  0.07050839          NA
2.5  2    2  0.12928774  0.07050839
2.6  2    3  1.71506499  0.12928774

So that seems to work. However, I was wondering whether there is a smarter, cleaner, or more robust way to do the job. For instance, with the above function the data frame rows get re-ordered (grouped by id) as a side effect, although this is of no concern in my current analysis.

Any suggestions?

All the best,
Fabio.

--
Antonio, Fabio Di Narzo
Ph.D. student at
Department of Statistical Sciences
University of Bologna, Italy

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
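
One possible order-preserving variant of the approach above, given as a minimal sketch rather than a definitive answer: build a composite (id, time) key and use a single match() over the whole data frame, so no split/rbind step is needed and the rows stay where they are. The function name lag_by_key is hypothetical, and the sketch assumes that each (id, time) pair occurs at most once and that time values are integer-like, so that pasting them into a string key is exact.

```r
set.seed(123)
d <- data.frame(id = rep(1:2, each = 3), time = rep(1:3, 2), value = rnorm(6))

# Hypothetical helper (not part of the original post): lag 'varname' within
# each subject by matching on a composite (id, time) key.
# Assumes unique (id, time) pairs and integer-like time values.
lag_by_key <- function(dt, varname, idvarname = 'id', timevarname = 'time', lag = 1) {
  # Same column-naming rule as my_lag(), e.g. 'value.1' for lag = 1
  vname <- paste(varname, if (lag > 0) '.' else '', lag, sep = '')
  id <- dt[[idvarname]]
  timevar <- dt[[timevarname]]
  key      <- paste(id, timevar, sep = ':')        # key of each row
  key_prev <- paste(id, timevar - lag, sep = ':')  # key of the row 'lag' steps earlier
  # match() returns NA where no earlier observation exists for that subject
  dt[[vname]] <- dt[[varname]][match(key_prev, key)]
  dt
}

res <- lag_by_key(d, varname = 'value')
```

Because the lookup is done by key rather than by position, the result keeps the original row order and row names, and it does not require the data to be sorted by id or time.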