I have a dataframe with many hundreds of survey records containing: tripid_nu, lineon, MuchOtherData, X1, X2, X3, X4

tripid_nu = identifier for each "Trip" {e.g. for EB MTA-901 leaving the first stop at 07:24}
lineon    = identifier for each of the stop locations

X1 is the ratio (for each trip/stop combination) of some target number to the number of records. This would sum to the proper trip total if there were observations for every stop, but there are MANY missing stops (no returned survey) for any trip.

X2 is then the ratio of the trip target (sum of X1 targets for each trip) and the sum of X1 in the dataframe for that tripid_nu.

I will soon receive corrected lineon values and new X1 (and hence trip) targets. I'm trying to figure out how to program the subsequent recalculations in R. I'm thinking that {schematically} it may look something like this:

    TargetX1Sums <- ##{read an external file of totals for tripid_nu/lineon. How should I format the data?? Dataframe or matrix??}##
    TargetX2Sums <- xtabs(~tripid_nu, data=TargetX1Sums)
    InitX1Sums <- xtabs(~tripid_nu+lineon, data=SurveyData)
    SurveyData$NewX1 = ##{some formulation of TargetX1Sums/InitX1Sums, selecting by tripid_nu/lineon}##
    InitX2Sums <- xtabs(NewX1~tripid_nu, data=SurveyData)
    SurveyData$NewX2 = ##{some formulation of TargetX2Sums/InitX2Sums, selecting by tripid_nu}##

Am I thinking of this correctly? What data structures/classes should I be using? What is the syntax for the type of calculation I'm trying to do? How should I specify my new X1 target data file so I can use it efficiently? Does this make any sense?
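
One possible way to set up the recalculation described above, offered only as a sketch: it assumes the targets arrive in "long" format (one row per tripid_nu/lineon pair, e.g. a CSV read with read.csv), that the target column is called X1Target (a hypothetical name), and it uses made-up toy values rather than real survey data. It relies on base R's merge(), ave() and tapply() instead of xtabs(), so everything stays in data frames.

```r
# Toy survey records (hypothetical values): one row per returned survey
SurveyData <- data.frame(
  tripid_nu = c(9011880, 9011880, 9011880, 9011890, 9011890),
  lineon    = c("Reseda", "Reseda", "Balboa", "Woodley", "Woodley"),
  stringsAsFactors = FALSE
)

# Targets in long format; X1Target is an assumed column name.
# Note that Tampa has a target but no returned surveys.
TargetX1 <- data.frame(
  tripid_nu = c(9011880, 9011880, 9011880, 9011890),
  lineon    = c("Reseda", "Balboa", "Tampa", "Woodley"),
  X1Target  = c(10, 6, 4, 8),
  stringsAsFactors = FALSE
)

# Attach the stop target to every record; merge() reorders rows,
# so keep a row id and restore the original order afterwards
SurveyData$rowid <- seq_len(nrow(SurveyData))
SurveyData <- merge(SurveyData, TargetX1,
                    by = c("tripid_nu", "lineon"), all.x = TRUE)
SurveyData <- SurveyData[order(SurveyData$rowid), ]

# NewX1 = stop target / number of records for that trip/stop
n_stop <- ave(rep(1, nrow(SurveyData)),
              SurveyData$tripid_nu, SurveyData$lineon, FUN = sum)
SurveyData$NewX1 <- SurveyData$X1Target / n_stop

# Trip target = sum of ALL stop targets for the trip,
# including stops with no surveys
TripTarget <- tapply(TargetX1$X1Target, TargetX1$tripid_nu, sum)

# NewX2 = trip target / sum of NewX1 over the trip's records
InitX2 <- tapply(SurveyData$NewX1, SurveyData$tripid_nu, sum)
key <- as.character(SurveyData$tripid_nu)
SurveyData$NewX2 <- as.numeric(TripTarget[key] / InitX2[key])

print(SurveyData)
```

With this layout, the product NewX1 * NewX2 summed over a trip's records should reproduce the trip target even when some stops have no surveys. Records whose tripid_nu/lineon pair is missing from the target table get NA, which may be a useful check once the corrected lineon values arrive.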

Thanks in advance,

********************************
PS My preliminary dataframe subtotals look something like this:

> InitX1Sums <- xtabs(~tripid_nu+lineon, data=SurveyData)
> InitX1Sums
          lineon
tripid_nu Warner Center De Soto Pierce College Tampa Reseda Balboa Woodley Sepulveda Van Nuys Woodman Valley College Laurel Canyon North Hollywood
9011880               1       0              2     1      0      2       1         0        0       0              1             0               0
9011890               0       0              0     0      0      0       1         0        0       0              0             1               0
9011960               1       1              2     0      1      1       0         1        3       2              1             0               0
9011970               0       0              0     0      1      0       0         1        6       1              1             1              14
9012040               1       1              1     3      2      7       1         1        1       0              0             0               0
9012050               0       0              0     0      0      0       0         1        1       1              0             0               5
9012130               0       0              0     0      2      0       1         0        1       0              0             2               7
9012280               0       2             10     2      6      2       1         3        1       0              0             1               0
9012290               0       0              1     0      1      1       0         1        2       1              3             3               7
9012720               1       2              3     1      0      0       0         0        2       0              0             0               0
9012730               0       0              1     0      0      0       0         0        0       1              0             1               2
9012760               2       2              5     0      6      2       2         1        2       0              0             0               0
9012770               0       0              0     0      0      0       0         0        0       0              0             3              11
9012840               4       1              2     2      3      3       0         3        1       0              0             0               0
9012850               0       1              0     0      0      0       0         1        1       0              0             0               5
9012880               2       3              0     2      5      0       0         1        0       0              0             0               0
9012890               0       0              0     0      1      0       0         0        1       1              0             1               7
9013000               0       1              2     0      0      2       0         1        0       3              0             1               0
9013010               0       0              0     0      0      1       0         0        1       1              0             0               2
9013240               0       0              1     0      3      0       0         2        0       0              0             0               0
... and so on

********************************
Robert Farley
Metro
1 Gateway Plaza
Mail Stop 99-23-7
Los Angeles, CA 90012-2952
Voice: (213)922-2532
Fax: (213)922-2868
www.Metro.net

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.