Homework? The list has a no-homework policy, but perhaps I'll be forgiven for posting hints.

In general terms, this is how I approached the problem:

* Loop through the rows of stop_onoff - for (idx in ...something...) {...
* For each row, find the first occurrence of "ref" in a suitably filtered subset of stop_sequence, and keep track of these row numbers
* Update columns "on" and "off"
* Use cumsum to calculate the number of passengers on the bus
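
The hints above could fit together roughly as in the sketch below. It is only one possible reading, not a definitive solution: it assumes each stop_onoff row must match a stop strictly after the previously matched one, and that stops with no entry count as on = 0 and off = 0. The names result, last_row and candidates are made up for illustration.

```r
# Example data from the original question.
# stringsAsFactors = FALSE keeps "ref" as character rather than factor.
stop_sequence <- data.frame(seq = c(10, 20, 30, 40, 50, 60),
                            ref = c("A", "B", "C", "D", "B", "A"),
                            stringsAsFactors = FALSE)
stop_onoff <- data.frame(ref = c("A", "D", "B", "A"),
                         on  = c(5, 0, 10, 0),
                         off = c(0, 2, 2, 6),
                         stringsAsFactors = FALSE)

# Start from the full route with nobody getting on or off (assumption).
result <- stop_sequence
result$on  <- 0
result$off <- 0

last_row <- 0  # row of stop_sequence matched by the previous stop_onoff row
for (idx in seq_len(nrow(stop_onoff))) {
  # Filter to rows after the previous match that carry the same "ref".
  candidates <- which(seq_len(nrow(result)) > last_row &
                        result$ref == stop_onoff$ref[idx])
  if (length(candidates) == 0) {
    stop("no matching stop found for stop_onoff row ", idx)
  }
  last_row <- candidates[1]  # first such stop along the route
  result$on[last_row]  <- stop_onoff$on[idx]
  result$off[last_row] <- stop_onoff$off[idx]
}

# Passengers on board after each stop.
result$load <- cumsum(result$on - result$off)
print(result)
```

With the example data, the second "B" entry lands on seq 50 rather than seq 20, because the filter only considers rows after the matched "D".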

Note the loop. Someone cleverer than I might be able to vectorise that step, but I couldn't see how.

By the way, if this is homework... Are you sure your desired_output is correct? The load should carry over at stops where nobody gets on or off, so I would expect something like

  seq ref on off load
1  10   A  5   0    5
2  20   B  0   0    5
3  30   C  0   0    5
4  40   D  0   2    3
5  50   B 10   2   11
6  60   A  0   6    5

Are you aware that your "ref" columns are factors and not characters? If you use "stringsAsFactors = FALSE" or

  stop_onoff <- data.frame(ref=factor(c('A','D','B','A'), levels = levels(stop_sequence$ref)),on=c(5,0,10,0),off=c(0,2,2,6))

it will simplify your analysis (or at least reduce some typing). Type the following in an R console

  ?data.frame
  ?factor

and have a read.

Now, if this ain't homework, or you just want someone to do it for you, e-mail me offline and I'll send you my approach. If it is homework, let me know - I'm happy to help anyway, but I will be trying to help you solve this for yourself.

Cheers,
DMcP

On Sat, 30 Aug 2014 12:46:17 +1200 Adam Lawrence <alaw...@gmail.com> wrote
> I am hoping someone can help me with a bus stop sequencing problem in R,
> where I need to match counts of people getting on and off a bus to the
> correct stop in the bus route stop sequence. I have tried looking
> online/forums for sequence matching but it seems to refer to numeric sequences
> or DNA matching and is over my head. I am after a simple example if anyone can
> please help.
>
> I have two data series as per below (from database), that I want to
> combine. In this example “stop_sequence” includes the sequence (seq) of bus
> stops and “stop_onoff” is a count of people getting on and off at certain
> stops (there is no entry if no one gets on or off).
>
> stop_sequence <- data.frame(seq=c(10,20,30,40,50,60),
> ref=c('A','B','C','D','B','A'))
> ##   seq ref
> ## 1  10   A
> ## 2  20   B
> ## 3  30   C
> ## 4  40   D
> ## 5  50   B
> ## 6  60   A
> stop_onoff <-
> data.frame(ref=c('A','D','B','A'),on=c(5,0,10,0),off=c(0,2,2,6))
> ##   ref on off
> ## 1   A  5   0
> ## 2   D  0   2
> ## 3   B 10   2
> ## 4   A  0   6
>
> I need to match the stop_onoff numbers in the right stop sequence, with the
> correctly matched output as follows (load is a cumulative count of on and
> off)
>
> desired_output <- data.frame(seq=c(10,20,30,40,50,60),
> ref=c('A','B','C','D','B','A'),
> on=c(5,'-','-',0,10,0),off=c(0,'-','-',2,2,6), load=c(5,0,0,3,11,5))
> ##   seq ref on off load
> ## 1  10   A  5   0    5
> ## 2  20   B  -   -    0
> ## 3  30   C  -   -    0
> ## 4  40   D  0   2    3
> ## 5  50   B 10   2   11
> ## 6  60   A  0   6    5
>
> In this example the stop “B” is matched to the second stop “B” in the stop
> sequence and not the first because the onoff data is after stop “D”.
>
> Any guidance much appreciated.
>
> Regards
> Adam
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.