Re: [R] run a calculation function over time fields, ordered and grouped by variables

jdnewmil Mon, 25 May 2015 17:06:47 -0700

Another way:

# create four columns with route id, stop sequence, interval time and route start time

ssq <- c( 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 2, 3, 4, 5, 6, 7, 8 )
tint <- c( "00:00", "00:12", "00:03", "00:06", "00:09", "00:02", "00:04"
         , "00:00", "00:08", "00:10", "00:10"
         , "00:00", "00:02", "00:04", "00:08", "00:02", "00:01", "00:01" )

tst <- c( rep( "18:20", 7 )
        , rep( "10:50", 4 )
        , rep( "16:15", 7 ) )
rtid <- c( rep( "a", 7 )
         , rep( "b", 4 )
         , rep( "c", 7 ) )
# Don't use cbind to make data frames... it usually ends up
# forcing all columns to be character or factors
# Also, avoid using "df" as a variable name... it is the name of
# a function in base R, so that gets confusing fast
DF <- data.frame( ssq, tint, tst, rtid, stringsAsFactors=FALSE )
DF


#correct data set should look like this
tarr <- c( "18:20", "18:32", "18:35", "18:41", "18:50", "18:52", "18:56"
         , "10:50", "10:58", "11:08", "11:18"
         , "16:15", "16:17", "16:21", "16:29", "16:31", "16:32", "16:33" )

DF2  <- data.frame( DF, tarr, stringsAsFactors=FALSE )
DF2

library(dplyr)
DFs <- (   DF
       %>% group_by( rtid )
       %>% mutate( tarr = as.character( as.POSIXct( tst, format="%H:%M", tz="GMT" )
                                      + as.difftime(
                                          cumsum(
                                            as.numeric(
                                              as.difftime( tint, format="%H:%M" )
                                              , units="mins"
                                              )
                                            )
                                          , units="mins"
                                          )
                                      , format="%H:%M" )
                 )
       %>% as.data.frame # removes grouping behavior from result
       )
identical( DFs, DF2 )
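
For comparison, here is a minimal base R sketch of the same idea without dplyr or POSIXct: convert the "HH:MM" strings to minutes, take a cumulative sum of the intervals within each route with ave(), add the route start time, and format back. The helper names to_minutes and to_hhmm are hypothetical (not from the thread), the small data set is made up for illustration, and the rows are assumed to be already sorted by stop sequence within each route.

```r
# Hypothetical helpers (not from the thread): "HH:MM" <-> minutes since midnight
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) 60 * as.numeric(p[1]) + as.numeric(p[2]), numeric(1))
}
to_hhmm <- function(m) {
  m <- m %% (24 * 60)  # wrap around past midnight
  sprintf("%02d:%02d", as.integer(m %/% 60), as.integer(m %% 60))
}

# Made-up example: two routes, rows already ordered by stop sequence
rtid <- rep(c("a", "b"), times = c(3, 2))
tst  <- rep(c("18:20", "23:50"), times = c(3, 2))
tint <- c("00:00", "00:12", "00:03", "00:00", "00:15")

# ave() applies cumsum within each route and returns values in row order
elapsed <- ave(to_minutes(tint), rtid, FUN = cumsum)
tarr <- to_hhmm(to_minutes(tst) + elapsed)
tarr
```

Working in plain minutes keeps the arithmetic independent of any date, which may suit the case where only durations matter.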

On 2015-05-25 15:43, Jim Lemon wrote:

Hi gavinr,
Perhaps this will do what you want.

add_HH_MM<-function(x) {
 t1bits<-strsplit(as.character(x$tst),":")
 t2bits<-strsplit(as.character(x$tint),":")
 hours<-as.numeric(lapply(t1bits,"[",1))+cumsum(as.numeric(lapply(t2bits,"[",1)))
 minutes<-as.numeric(lapply(t1bits,"[",2))+cumsum(as.numeric(lapply(t2bits,"[",2)))
 next_hour<-minutes > 59
 # adjust for running into the next hour
 minutes[next_hour]<-minutes[next_hour]-60
 hours[next_hour]<-hours[next_hour]+1
 # adjust for running into the next day
 hours[hours > 23]<-hours[hours > 23]-24
 return(paste(formatC(hours,width=2,flag=0),formatC(minutes,width=2,flag=0),sep=":"))
}

df$tarr<-unlist(by(df,df$rtid,add_HH_MM))
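
One thing to watch with the function above: it carries into the next hour only once, so a route whose accumulated minutes reach 120 or more would still show minutes above 59. A possible sketch of the carry step using integer division is below; carry_HH_MM is a hypothetical helper name, not part of the original function, and the input values are made up.

```r
# Hypothetical helper (not from the thread): normalise hour and minute totals
# so that any number of hour rollovers is carried, wrapping at midnight.
carry_HH_MM <- function(hours, minutes) {
  hours   <- (hours + minutes %/% 60) %% 24  # carry whole hours, wrap at 24
  minutes <- minutes %% 60                   # keep the remaining minutes
  sprintf("%02d:%02d", as.integer(hours), as.integer(minutes))
}

# Made-up totals: no carry, a two-hour carry, and a carry past midnight
out <- carry_HH_MM(c(18, 18, 23), c(56, 135, 70))
out
```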

Jim


On Tue, May 26, 2015 at 5:28 AM, gavinr <g.ru...@bham.ac.uk> wrote:
I’ve got some transit data relating to bus stops for a GIS data set. Each row represents one stop on a route. For each record I have the start time of the route, a sequence in which a bus stops, the time the bus arrives at the first stop and the time taken to get to each of the stops from the last one in the sequence. Not all sequences of stops start with the number 1; some may start with a higher number.
I need to make a new variable which has the time the bus arrives at each stop by using the start time from the stop with the lowest sequence number, to populate all of the arrival times for each stop in each route.
I have a very simple example below with just three routes and a few stops in each. My actual data set has a few million rows. I've also created a version of the data set I'm aiming to get.
There are two problems here. Firstly, getting the data into the correct format to do the calculations with durations, and secondly, running a function over the data set to obtain the times.
It is the durations that are critical, not the date, so using the POSIX methods doesn’t really seem appropriate here. Ultimately the times are going to be used in a route solver in an ArcSDE geodatabase. I tried to use strptime to format my times, but could not get them into a data.frame, as presumably they are a list. In this example I’ve left them as strings.
Any help is much appreciated.
#create four columns with route id, stop sequence, interval time and route start time
ssq<-c(3,4,5,6,7,8,9,1,2,3,4,2,3,4,5,6,7,8)
tint<-c("00:00","00:12","00:03","00:06","00:09","00:02","00:04","00:00","00:08","00:10","00:10","00:00","00:02","00:04","00:08","00:02","00:01","00:01")
tst<-c(rep("18:20",7),rep("10:50",4),rep("16:15",7))
rtid<-c(rep("a",7),rep("b",4),rep("c",7))
df<-data.frame(cbind(ssq,tint,tst,rtid))
df

#correct data set should look like this
tarr<-c("18:20","18:32","18:35","18:41","18:50","18:52","18:56","10:50","10:58","11:08","11:18","16:15","16:17","16:21","16:29","16:31","16:32","16:33")
df2<-cbind(df,tarr)
df2





--
View this message in context: http://r.789695.n4.nabble.com/run-a-calculation-function-over-time-fields-ordered-and-grouped-by-variables-tp4707655.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.