I have two datasets:
A with columns Open and Name (and many others, irrelevant to the merge)
B with columns Time and Name (and many others, irrelevant to the merge)

I want the dataset AB with all these columns
Open from A - a difftime (time of day)
Time from B - a difftime (time of day)
Name (same in A & B) - a factor, does NOT index rows, i.e., there are
_many_ rows in both A & B with the same Name.
all the other columns from A & B.

Each row in AB must come from exactly one row in A.
(i.e., dim(AB)[1] == dim(A)[1]).

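For each row in AB, Time <= Open, and the gap Open - Time is as small as
possible (i.e., Time is the largest value in B for that Name that does
not exceed Open).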

The above conditions uniquely define AB.

The "obvious algorithm" is: for each row in A search B for a row
with the same Name and the largest Time <= Open.
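
Spelled out literally, that search might look like the sketch below. The
toy A and B, the helper find_match, and the index vector idx are made up
for illustration, and the code assumes Name is the only column the two
data frames share. It scans B once per row of A, so it is only meant to
pin down the intended result, not to be fast.

```r
# Toy data (made up for illustration); times of day as difftime in hours.
A <- data.frame(Name = factor(c("x", "x", "y")),
                Open = as.difftime(c(10, 12, 9), units = "hours"))
B <- data.frame(Name = factor(c("x", "x", "y", "y")),
                Time = as.difftime(c(9, 11, 8, 10), units = "hours"))

# For row i of A, return the index of the B row with the same Name and
# the largest Time <= Open, or NA if there is no such row.
find_match <- function(i) {
  cand <- which(as.character(B$Name) == as.character(A$Name[i]) &
                B$Time <= A$Open[i])
  if (length(cand) == 0L) return(NA_integer_)
  cand[which.max(as.numeric(B$Time[cand]))]
}
idx <- vapply(seq_len(nrow(A)), find_match, integer(1))

# One row per row of A: A's columns plus the matched B columns.
AB <- cbind(A, B[idx, setdiff(names(B), "Name"), drop = FALSE])
rownames(AB) <- NULL
```

Rows of A with no candidate in B get NA in the B columns, so
dim(AB)[1] == dim(A)[1] still holds.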

However, I don't see an easy way to do it in R.
The obvious intermediary step is

AB1 <- merge(A, B, all.x = TRUE, all.y = FALSE, by = 'Name')

Now, AB1 has many rows with the same Name and Open.
I need to drop all of them except for the one with the largest Time <= Open.
I can do

AB2 <- AB1[which(AB1$Time <= AB1$Open),]

Now I need to keep just _one_ row for each combination of Name & Open:
the one with the largest Time.

How do I do that?

unique() seems to have the right name, but I don't see how it can help me...
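
One direction that might work, as a sketch only: sort the filtered rows
so that the largest Time comes first within each row of A, then keep the
first of each group with duplicated(), the sibling of unique(). The toy
data and the helper column id are made up for illustration; id is there
because two rows of A could share the same Name & Open. The sketch also
assumes Name is the only column A and B have in common.

```r
# Toy data (made up for illustration); times of day as difftime in hours.
A <- data.frame(Name = factor(c("x", "x", "y", "z")),
                Open = as.difftime(c(10, 12, 9, 5), units = "hours"))
B <- data.frame(Name = factor(c("x", "x", "y", "y", "z")),
                Time = as.difftime(c(9, 11, 8, 10, 7), units = "hours"))

A$id <- seq_len(nrow(A))   # hypothetical helper: identifies each row of A

AB1 <- merge(A, B, all.x = TRUE, all.y = FALSE, by = "Name")
AB2 <- AB1[which(AB1$Time <= AB1$Open), ]

# Sort by id, then by decreasing Time, and keep the first row per id.
AB2 <- AB2[order(AB2$id, -as.numeric(AB2$Time)), ]
AB3 <- AB2[!duplicated(AB2$id), ]

# The Time <= Open filter drops rows of A that have no match at all;
# look them up again so they come back with NA in the B columns.
AB <- cbind(A, AB3[match(A$id, AB3$id), setdiff(names(B), "Name"),
                   drop = FALSE])
rownames(AB) <- NULL
```

The last step matters because the which() filter silently loses any row
of A that has no Time <= Open in B, which would break
dim(AB)[1] == dim(A)[1].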

tia.

-- 
Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031
http://jihadwatch.org http://honestreporting.com
http://ffii.org http://camera.org http://thereligionofpeace.com
UNIX is a way of thinking.  Windows is a way of not thinking.
