Hi all,

I've searched everywhere for a way to do this and have had no luck. I am
trying to construct identifiers for couples in a dataset, using more than
one column to identify them. Take for instance:

obs     unit    home    z       sex     age
1       015029  18      1       1       053
2       015029  18      1       2       049
3       015029  01      1       1       038
4       015029  01      1       2       033
5       015029  02      1       1       036
6       015029  02      1       2       033
7       015029  03      1       1       023
8       015029  03      1       2       019
9       015029  04      1       2       045
10      015029  05      1       2       047

Here unit is the housing unit and home is the household. Of course, there are
more values for unit, although these first ten observations all share the
same unit (which could be an apartment complex). I want to construct an
identifier for each couple, defined as observations whose unit and home both
match, but only if both a male and a female are present in the same
household. For the example data above, I want to see this:

obs     unit    home    z       sex     age     couple
1       015029  18      1       1       053     1
2       015029  18      1       2       049     1
3       015029  01      1       1       038     2
4       015029  01      1       2       033     2
5       015029  02      1       1       036     3
6       015029  02      1       2       033     3
7       015029  03      1       1       023     4
8       015029  03      1       2       019     4
9       015029  04      1       2       045     0
10      015029  05      1       2       047     0

As you can see, the last two observations have no male in the same
household, so they would not get a couple identifier but rather some other
shared value (0 here) so that I can detect and remove them later. I've tried
using the duplicated() function, but it was not very useful.
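
One possible way to express this rule in base R is sketched below. It is only
a sketch under stated assumptions: sex is taken to be coded 1 = male and
2 = female (consistent with the last two rows above), the data frame name dat
and the helper names key, has_male, has_female and is_couple are made up for
illustration, and unit, home and age are stored as character so that the
leading zeros survive.

```r
# Example data from the post, rebuilt as a data frame.
dat <- data.frame(
  unit = rep("015029", 10),
  home = c("18", "18", "01", "01", "02", "02", "03", "03", "04", "05"),
  z    = 1L,
  sex  = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L),
  age  = c("053", "049", "038", "033", "036", "033", "023", "019",
           "045", "047"),
  stringsAsFactors = FALSE
)

# One key per household: unit and home pasted together.
key <- paste(dat$unit, dat$home, sep = "_")

# Assumption: sex == 1 is male, sex == 2 is female.
# ave() applies FUN within each household and recycles the result
# back onto every row of that household.
has_male   <- ave(dat$sex == 1, key, FUN = any)
has_female <- ave(dat$sex == 2, key, FUN = any)
is_couple  <- has_male & has_female

# Number the qualifying households in order of first appearance;
# every other row keeps the shared value 0.
dat$couple <- 0L
dat$couple[is_couple] <- match(key[is_couple], unique(key[is_couple]))

print(dat)
```

Note that this flags any household containing at least one male and at least
one female, so a household with more than two members would also receive a
couple identifier; a stricter rule (for example, exactly two members) would
need an extra condition on the household size.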

Any help would be greatly appreciated!!! 

Thanks,

Carlos

--
View this message in context: 
http://r.789695.n4.nabble.com/identify-duplicate-from-more-than-one-column-tp4035888p4035888.html
Sent from the R help mailing list archive at Nabble.com.

