Dear someone,

Jorge's solution is excellent, assuming it is what you had in mind. Note that the help page for unique() lists duplicated() in its "See Also" section, so when you studied ?unique it would have made sense to read about duplicated() as well. Or perhaps you did look into it but had not yet seen how to use logical indexes, and as a result did not think duplicated() was relevant to what you wanted.

I gather that unique() does not work for you because you want the rest of the information in each row, not just the levels of a particular column? Keep in mind that you got responses that did not make sense because it was difficult to make sense of what you were asking. This is a common theme on the list, not just you (I get called on it from time to time, too), and it is a normal part of learning the art of asking for R-help. Part of the solution, not only for the "R noob" paradox but also for the cross-cultural/language ones, is to include clear examples showing what is needed, in addition to the narrative. For example:

-----------------------------------------
# Data to test
> (df <- data.frame(ID=c("userA", "userB", "userA", "userC"),
+                   OS=c("Win","OSX","Win", "Win64"),
+                   time=c("12:22","23:22","04:44","12:28")))

# Output of df
     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44
4 userC Win64 12:28

# Desired outcome of manipulation
# (ID as unique; unique(df) and unique(df$ID) do NOT work for this)
1 userA   Win 12:22
2 userB   OSX 23:22
4 userC Win64 12:28
--------------------------------------------

would have helped readers orient to your question better. If you tried other things, showing them helps readers provide more helpful replies:

# Cannot use the results of unique() as an index either
> df[unique(df$ID),]
     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44   # <- not unique, expected "userC"
# (presumably ID is a factor here, so its integer codes 1, 2, 3
#  are used as row numbers)

# duplicated() on the entire data.frame shows no duplicates
> duplicated(df)
[1] FALSE FALSE FALSE FALSE
> df[!duplicated(df),]   # same as unique(df)

The above would have oriented readers to the specific problem far better than the approximate meaning conveyed in narrative. It also tends to shape how readers respond: your effort in showing clear examples, details, and working code may make a longer reply feel more equitable for someone responding.

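Since logical indexes came up earlier, here is a minimal sketch of how they select rows. It is only an illustration: the names keep and is_win are made up for this example, and the data frame is the same toy data used above.

```r
# Same toy data as in the example above
df <- data.frame(ID   = c("userA", "userB", "userA", "userC"),
                 OS   = c("Win", "OSX", "Win", "Win64"),
                 time = c("12:22", "23:22", "04:44", "12:28"))

# A logical index holds one TRUE/FALSE per row; rows marked TRUE are kept.
# 'keep' is written out by hand here purely for illustration.
keep <- c(TRUE, TRUE, FALSE, TRUE)
df[keep, ]

# Any comparison on a column returns such a vector ...
is_win <- df$OS == "Win"
df[is_win, ]

# ... and ! flips it, selecting the rows that did NOT match
df[!is_win, ]
```

Any function that returns one TRUE/FALSE per row can be used inside the brackets in the same way.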
> duplicated(df$ID)   # makes sense, so Jorge suggested using this as an index
[1] FALSE FALSE  TRUE FALSE

# using the raw results, only the duplicated row is returned,
# which is why Jorge inverted the values
> df[duplicated(df$ID),]
     ID  OS  time
3 userA Win 04:44

> df[!duplicated(df$ID),]   # selects just the not-duplicated rows
     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
4 userC Win64 12:28

This is what you needed, right?

# note that the original row labels are intact
> rownames(df[!duplicated(df$ID),])   # not sure why I'd do this, though it occurs to me that I could, given the above

By the way, if you keep your indexes, then all you have to store in the R environment is the original dataset and your indexes. It helps me stay organized.

Welcome, someone, aka "R noob" (as you put it). Keep after it, you won't be a "noob" for long.

Sincerely,
KeithC.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.