Hi gavinr, Perhaps this will do what you want. add_HH_MM<-function(x) { t1bits<-strsplit(as.character(x$tst),":") t2bits<-strsplit(as.character(x$tint),":") hours<-as.numeric(lapply(t1bits,"[",1))+cumsum(as.numeric(lapply(t2bits,"[",1))) minutes<-as.numeric(lapply(t1bits,"[",2))+cumsum(as.numeric(lapply(t2bits,"[",2))) next_hour<-minutes > 59 # adjust for running into the next hour minutes[next_hour]<-minutes[next_hour]-60 hours[next_hour]<-hours[next_hour]+1 # adjust for running into the next day hours[hours > 23]<-hours[hours > 23]-24 return(paste(formatC(hours,width=2,flag=0),formatC(minutes,width=2,flag=0),sep=":")) }
df$tarr<-unlist(by(df,df$rtid,add_HH_MM)) Jim On Tue, May 26, 2015 at 5:28 AM, gavinr <g.ru...@bham.ac.uk> wrote: > I’ve got some transit data relating to bus stops for a GIS data set. Each > row represents one stop on a route. For each record I have the start time > of the route, a sequence in which a bus stops, the time the bus arrives at > the first stop and the time taken to get to each of the stops from the last > one in the sequence. Not all sequences of stops starts with the number 1, > some may start with a higher number. > I need to make a new variable which has the time the bus arrives at each > stop by using the start time from the stop with the lowest sequence number, > to populate all of the arrival times for each stop in each route. > > I have a very simple example below with just three routes and a few stops in > each. My actual data set has a few million rows. I've also created a > version of the data set I'm aiming to get. > > There are two problems here. Firstly getting the data into the correct > format to do the calculations with > durations, and secondly running a function over the data set to obtain the > times. > It is the durations that are critical not the date, so using the POSIX > methods doesn’t really seem appropriate here. Ultimately the times are > going to be used in a route solver in an ArcSDE geodatabase. I tried to use > strptime to format my times, but could not get them into a data.frame as > presumably they are a list. In this example I’ve left them as strings. > > Any help is much appreciated. > > #create four columns with route id, stop sequence interval time and route > start time > ssq<-c(3,4,5,6,7,8,9,1,2,3,4,2,3,4,5,6,7,8) > tint<-c("00:00","00:12","00:03","00:06","00:09","00:02","00:04","00:00","00:08","00:10","00:10","00:00","00:02","00:04","00:08","00:02","00:01","00:01") > tst<-c(rep("18:20",7),rep("10:50",4),rep("16:15",7)) > rtid<-c(rep("a",7),rep("b",4),rep("c",7)) > df<-data.frame(cbind(ssq,tint,tst,rtid)) > df > > #correct data set should look like this > tarr<-c("18:20","18:32","18:35","18:41","18:50","18:52","18:56","10:50","10:58","11:08","11:18","16:15","16:17","16:21","16:29","16:31","16:32","16:33") > df2<-cbind(df,tarr) > df2 > > > > > > -- > View this message in context: > http://r.789695.n4.nabble.com/run-a-calculation-function-over-time-fields-ordered-and-grouped-by-variables-tp4707655.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.