I'm comparing a number of datasets, each with over 4M rows. I've solved this problem five different ways using for/while loops, but the processing time is murderous (over 8 hours per dataset going row by row). I'm therefore trying to find out whether this can be done without a loop, or at least in a way that is much faster.
Each dataset is a time series, as such:

DF1:
    X.DATE X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0700    45     35
6 01052007   0800    42     32
7 01052007   0900    45     32
.
.
.
n

DF2:
    X.DATE X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     35
6 01052007   0700    42     32
7 01052007   0800    45     32
.
.
n+4000

In other words, DF2 has 4000 more rows than DF1, so the datasets are of unequal length. I'm trying to ensure that all data frames have the same set of X.DATE and X.TIME entries; where an entry is missing, I'd like to insert a new row. In the example above, when comparing DF2 to DF1, the 01052007 0600 entry is missing from DF1. The solution would add a row to DF1 at the appropriate index, so the new data frame would be:

    X.DATE X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     27
6 01052007   0700    45     35
7 01052007   0800    42     32
8 01052007   0900    45     32

VALUE and VALUE2 in the inserted row would be the same as in row 4 (the previous row). Of course this is simple to accomplish with a row-by-row analysis, but with 4M rows the time spent tearing apart and re-binding the data frames is enormous, and I suspect it is highly un-R'ish. What am I missing?
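For reference, here is a small self-contained version of the example above, along with a rough sketch of the merge-then-fill direction I've been considering instead of the loop. The locf() helper is just something I cobbled together for this post, so please treat the whole thing as illustrative rather than as something I've proven out at the 4M-row scale:

## Small self-contained version of DF1 / DF2 from the tables above
DF1 <- data.frame(
  X.DATE = rep("01052007", 7),
  X.TIME = c("0200", "0300", "0400", "0500", "0700", "0800", "0900"),
  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
  VALUE2 = c(29, 24, 28, 27, 35, 32, 32),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  X.DATE = rep("01052007", 7),
  X.TIME = c("0200", "0300", "0400", "0500", "0600", "0700", "0800"),
  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
  VALUE2 = c(29, 24, 28, 27, 35, 32, 32),
  stringsAsFactors = FALSE
)

## Reference set of timestamps: every X.DATE/X.TIME pair seen in either frame
key <- unique(rbind(DF1[, c("X.DATE", "X.TIME")],
                    DF2[, c("X.DATE", "X.TIME")]))

## Left-join DF1 onto the full key; timestamps missing from DF1 come back as NA
out <- merge(key, DF1, by = c("X.DATE", "X.TIME"), all.x = TRUE)
out <- out[order(out$X.DATE, out$X.TIME), ]  # fine for this toy example; real data would need a proper date-time sort key

## Fill each NA with the last preceding non-NA value,
## i.e. carry row 4 forward into the inserted 0600 row
locf <- function(x) {
  idx <- cumsum(!is.na(x))                   # index of the last non-NA seen so far
  x[idx > 0] <- x[!is.na(x)][idx[idx > 0]]
  x
}
out$VALUE  <- locf(out$VALUE)
out$VALUE2 <- locf(out$VALUE2)
out

If there is a more idiomatic or faster way to do the fill step (I gather zoo::na.locf does something along these lines), pointers would be very welcome.

Thanks!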