Hi Carlos,

Here is one option:

## read in your data
dat <- read.table(text = "
obs unit home z sex age
1 015029 18 1 1 053
2 015029 18 1 2 049
3 015029 01 1 1 038
4 015029 01 1 2 033
5 015029 02 1 1 036
6 015029 02 1 2 033
7 015029 03 1 1 023
8 015029 03 1 2 019
9 015029 04 1 2 045
10 015029 05 1 2 047", header = TRUE, stringsAsFactors = FALSE)

## create a unique ID for matching unit and home
## (the "." separator keeps e.g. unit 1502 + home 918 distinct
## from unit 15029 + home 18)
dat$mID <- with(dat, paste(unit, home, sep = "."))

## somewhat messy way of creating a couple number:
## for each mID, if there is more than 1 row and more than 1 sex,
## it assigns the next couple id, otherwise 0.
## The factor levels follow the order of appearance, and unsplit()
## puts the results back in the original row order.
grp <- factor(dat$mID, levels = unique(dat$mID))
i <- 0L
dat$couple <- unsplit(lapply(split(dat$sex, grp), function(x) {
  if (length(x) > 1 && length(unique(x)) > 1) {
    i <<- i + 1L
    rep(i, length(x))
  } else rep(0L, length(x))
}), grp)

## view results
dat
   obs  unit home z sex age      mID couple
1    1 15029   18 1   1  53 15029.18      1
2    2 15029   18 1   2  49 15029.18      1
3    3 15029    1 1   1  38  15029.1      2
4    4 15029    1 1   2  33  15029.1      2
5    5 15029    2 1   1  36  15029.2      3
6    6 15029    2 1   2  33  15029.2      3
7    7 15029    3 1   1  23  15029.3      4
8    8 15029    3 1   2  19  15029.3      4
9    9 15029    4 1   2  45  15029.4      0
10  10 15029    5 1   2  47  15029.5      0

See these functions for more details:

?ave # where I got my idea
?split
?unsplit
?lapply
?`<<-`

Cheers,

Josh

On Sat, Nov 12, 2011 at 8:16 PM, jour4life <jour4l...@gmail.com> wrote:
> Hi all,
>
> I've searched everywhere to try to find out how to do this and have had no
> luck. I am trying to construct identifiers for couples in a dataset.
> Essentially, I want to identify couples using more than one column as
> identifiers. Take for instance:
>
> obs unit   home z sex age
> 1   015029 18   1 1   053
> 2   015029 18   1 2   049
> 3   015029 01   1 1   038
> 4   015029 01   1 2   033
> 5   015029 02   1 1   036
> 6   015029 02   1 2   033
> 7   015029 03   1 1   023
> 8   015029 03   1 2   019
> 9   015029 04   1 2   045
> 10  015029 05   1 2   047
>
> Where unit is the housing unit, home is household.
> Of course, there are more
> values for unit, although these first ten observations consist of the same
> unit (which could possibly be an apartment complex). Nonetheless, I want to
> construct an identifier for couples if unit, home match, but only if both
> male and female are within the same household. Taking the example data
> above, I want to see this:
>
>    unit   home z sex age couple
> 1  015029 18   1 1   053 1
> 2  015029 18   1 2   049 1
> 3  015029 01   1 1   038 2
> 4  015029 01   1 2   033 2
> 5  015029 02   1 1   036 3
> 6  015029 02   1 2   033 3
> 7  015029 03   1 1   023 4
> 8  015029 03   1 2   019 4
> 9  015029 04   1 2   045 0
> 10 015029 05   1 2   047 0
>
> As you can see in the last two observations, there were no males identified
> within the same household, thus the last two observations would not contain
> couple identifiers, rather some other identifier (but the same one) so I can
> detect them and remove them later. I've tried using the duplicated function
> but was not very useful.
>
> Any help would be greatly appreciated!!!
>
> Thanks,
>
> Carlos
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/identify-duplicate-from-more-than-one-column-tp4035888p4035888.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
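
P.S. Since ?ave is listed as the source of the idea, here is a minimal sketch of how the same couple numbering might be done with ave() directly, without a counter updated through `<<-`. It is only one possible variant, not something from the original thread. The names `key` and `is_couple` are hypothetical helpers introduced for illustration, and "couple" is assumed to mean any unit/home group that contains more than one distinct value of sex.

```r
## same example data as in the question, read into a data frame
dat <- read.table(text = "
obs unit home z sex age
1 015029 18 1 1 053
2 015029 18 1 2 049
3 015029 01 1 1 038
4 015029 01 1 2 033
5 015029 02 1 1 036
6 015029 02 1 2 033
7 015029 03 1 1 023
8 015029 03 1 2 019
9 015029 04 1 2 045
10 015029 05 1 2 047", header = TRUE, stringsAsFactors = FALSE)

## household key (hypothetical name); "." keeps unit and home separable
key <- paste(dat$unit, dat$home, sep = ".")

## ave() returns one value per row, in the original row order:
## here, the number of distinct sexes within that row's household
is_couple <- ave(dat$sex, key, FUN = function(x) length(unique(x))) > 1

## number the couple households in order of first appearance; others get 0
dat$couple <- 0L
dat$couple[is_couple] <- match(key[is_couple], unique(key[is_couple]))

dat$couple
## [1] 1 1 2 2 3 3 4 4 0 0
```

Because ave() keeps the row order and match() numbers only the qualifying households, the ids stay consecutive even when single-person households are mixed in between couples.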