Thank you very much for all the help. I have one more thing to resolve.
Suppose 3 additional records are bound to the previous arbitrary data set, i.e.

> a <- data.frame(id=c(c("A1","A2","A3","A4","A5"),c("A3","A2","A3","A4","A5")),loc=c("B1","B2","B3","B4","B5"),clm=c(rep(("General"),6),rep("Life",4)))
> b <- data.frame(id=c("A3","A3","A4"),loc=c("B3","B3","B4"),clm=rep("General",3))
> dat <- rbind(a,b)
> dat
   id loc     clm
1  A1  B1 General
2  A2  B2 General
3  A3  B3 General
4  A4  B4 General
5  A5  B5 General
6  A3  B1 General
7  A2  B2    Life
8  A3  B3    Life
9  A4  B4    Life
10 A5  B5    Life
11 A3  B3 General
12 A3  B3 General
13 A4  B4 General

The records in rows 3, 11 & 12 are identical, as are the records in rows 4 & 13:

   id loc     clm          id loc     clm
3  A3  B3 General       4  A4  B4 General
11 A3  B3 General       13 A4  B4 General
12 A3  B3 General

The solutions provided so far do not perform 1 to 1 matching (i.e. all the matching duplicated records are removed).

The desired output is:

   id loc     clm
1  A1  B1 General
6  A3  B1 General
11 A3  B3 General
12 A3  B3 General
13 A4  B4 General

Is there a solution to this problem using a 'merging' function or some other alternative method?

Thanks,
Steven

On Thu, Oct 29, 2009 at 10:30 PM, Adaikalavan Ramasamy <a.ramas...@imperial.ac.uk> wrote:

> Here is another way based on pasting ids as hinted below:
>
> > a <- data.frame(id=c(c("A1","A2","A3","A4","A5"),
>                        c("A3","A2","A3","A4","A5")),
>                   loc=c("B1","B2","B3","B4","B5"),
>                   clm=c(rep(("General"),6),rep("Life",4)))
>
> > a$uid <- paste(a$id, ".", a$loc, sep="")
>
> > out <- tapply( a$clm, a$uid, paste ) # can also add collapse=","
> $A1.B1
> [1] "General"
>
> $A2.B2
> [1] "General" "Life"
>
> $A3.B1
> [1] "General"
>
> $A3.B3
> [1] "General" "Life"
>
> $A4.B4
> [1] "General" "Life"
>
> $A5.B5
> [1] "General" "Life"
>
>
> Then here are those with single policies.
>
> > out[ which( sapply(out, length) == 1 ) ]
> $A1.B1
> [1] "General"
>
> $A3.B1
> [1] "General"
>
>
> David Winsemius wrote:
>
>> On Oct 28, 2009, at 9:30 PM, Steven Kang wrote:
>>
>>> Dear R users,
>>>
>>> Basically, from the following arbitrary data set:
>>>
>>> a <- data.frame(id=c(c("A1","A2","A3","A4","A5"),c("A3","A2","A3","A4","A5")),loc=c("B1","B2","B3","B4","B5"),clm=c(rep(("General"),6),rep("Life",4)))
>>>
>>> > a
>>>    id loc     clm
>>> 1  A1  B1 General
>>> 2  A2  B2 General
>>> 3  A3  B3 General
>>> 4  A4  B4 General
>>> 5  A5  B5 General
>>> 6  A3  B1 General
>>> 7  A2  B2    Life
>>> 8  A3  B3    Life
>>> 9  A4  B4    Life
>>> 10 A5  B5    Life
>>>
>>> I desire removing records (highlighted records above) with identical values
>>> in each fields ("id" & "loc") but with different value of "clm" (i.e
>>> according to category)
>>
>> Take a look at this merge operation on separate rows of "a".
>>
>> > merge( a[a$clm=="Life", ], a[a$clm=="General", ] , by=c("id", "loc"), all=T)
>>   id loc clm.x   clm.y
>> 1 A1  B1  <NA> General
>> 2 A2  B2  Life General
>> 3 A3  B1  <NA> General
>> 4 A3  B3  Life General
>> 5 A4  B4  Life General
>> 6 A5  B5  Life General
>>
>> Assignment of that object and selection with is.na should complete the
>> process.
>>
>> > a2m <- merge( a[a$clm=="Life", ], a[a$clm=="General", ] , by=c("id", "loc"), all=T)
>>
>> > a2m[ is.na(a2m$clm.x) | is.na(a2m$clm.y), ]
>>   id loc clm.x   clm.y
>> 1 A1  B1  <NA> General
>> 3 A3  B1  <NA> General
>>
>> Alternate methods might include paste-ing id to loc and removing
>> duplicates.
>>
>>> i.e.
>>>
>>> > categ <- table(a$id,a$clm)
>>> > categ
>>>      General Life
>>>   A1       1    0
>>>   A2       1    1
>>>   A3       2    1
>>>   A4       1    1
>>>   A5       1    1
>>>
>>> The desired output is
>>>
>>>   id loc     clm
>>> 1 A1  B1 General
>>> 6 A3  B1 General
>>>
>>> Because the data set I am working on is quite big (~ 800,000 x 20),
>>> with the majority of the field values being long strings, looping turned
>>> out to be very inefficient in comparing individual rows.
>>>
>>> Are there any alternative efficient methods for implementing this?
>>>
>>> Steven
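
For the 1 to 1 matching asked about at the top of this message, one possible approach is sketched below. It is only a sketch under assumptions, not a tested solution for the full data: it assumes exactly two categories in "clm" ("General" and "Life"), and that "id" and "loc" contain no "." characters, since the pasted key would otherwise be ambiguous. The helper column "occ" and the vectors "key" and "keep" are names made up for illustration. The idea is to number the repeats within each id/loc/clm combination, so that the k-th "General" record can only cancel against the k-th "Life" record with the same id and loc.

```r
# Rebuild the example data from the message above
a <- data.frame(id  = c("A1","A2","A3","A4","A5","A3","A2","A3","A4","A5"),
                loc = c("B1","B2","B3","B4","B5"),
                clm = c(rep("General", 6), rep("Life", 4)))
b <- data.frame(id  = c("A3","A3","A4"),
                loc = c("B3","B3","B4"),
                clm = rep("General", 3))
dat <- rbind(a, b)

# Occurrence counter within each id/loc/clm combination (hypothetical helper column)
dat$occ <- ave(seq_len(nrow(dat)), dat$id, dat$loc, dat$clm, FUN = seq_along)

# Key for "the k-th record of this id/loc pair", ignoring clm.
# With two categories, a key can repeat only across categories.
key <- paste(dat$id, dat$loc, dat$occ, sep = ".")

# Keep the records whose key is unique, i.e. those with no partner
# in the other category left to cancel against
keep <- !(key %in% key[duplicated(key)])
res  <- dat[keep, c("id", "loc", "clm")]
res
```

On the 13-row example this keeps rows 1, 6, 11, 12 and 13, which is the desired output listed above. Because ave(), paste() and duplicated() are all vectorised, no explicit loop over rows is needed, although the behaviour on the ~800,000-row data set has not been checked here.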