Thanks. Yes, I did that on a toy data set and with my real data. It *seems* to have worked. I just work with grep so rarely that I didn't want to miss something.
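
For anyone following along, the toy check can be sketched roughly as below. This is only a minimal sketch: the data frame `toy` and its `id` column are made up for illustration, and it uses a handful of the levels from the quoted post instead of the real data. It compares the grep() result against an independent check that splits each value on ";" and looks for an exact token match.

```r
# Hypothetical stand-in for all$Classical.Statistic, built from a few
# of the levels listed in the quoted post (not the real data).
lev <- c("", "AB;ABD", "CollapsedSteps", "CR_P", "NMK;P;ABD",
         "CR_P;CollapsedSteps", "CRT;CollapsedSteps",
         "CR_P;CRT;CollapsedSteps", "CR_Prop;CR_P")
toy <- data.frame(id = seq_along(lev),
                  Classical.Statistic = factor(lev))

# Substring match as in the original post; fixed = TRUE treats the
# pattern as a literal string, not a regular expression.
hits_grep <- grep("CollapsedSteps", toy$Classical.Statistic, fixed = TRUE)

# Independent check: split each value on ";" and test whether one of
# the resulting tokens is exactly "CollapsedSteps".
tokens <- strsplit(as.character(toy$Classical.Statistic), ";", fixed = TRUE)
hits_exact <- which(vapply(tokens,
                           function(x) "CollapsedSteps" %in% x,
                           logical(1)))

# TRUE means the substring match and the exact-token match agree.
identical(hits_grep, hits_exact)

# The rows that would be kept by the subsetting line from the post.
toy[hits_grep, ]
```

One caveat: grep() matches substrings, so the two approaches only agree when no other term contains the search string. With these levels, a search for "CR_P" would also pick up "CR_Prop", which is where the exact-token check would differ.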
-----Original Message-----
From: Erik Iverson [mailto:er...@ccbr.umn.edu]
Sent: Thursday, July 15, 2010 5:36 PM
To: Doran, Harold
Cc: r-help@r-project.org
Subject: Re: [R] Proper use of grep

Doran, Harold wrote:
> I just need to confirm something with pattern matching folks. I have
> a factor with the following levels in a very large data set:
>
>> levels(all$Classical.Statistic)
>  [1] ""                        "AB;ABD"
>      "CollapsedSteps"          "CR_P"
>      "CR_Prop;CR_P;AB"
>  [6] "NMK"                     "NMK;P"
>      "NMK;P;ABD"               "P"
>      "ABD"
> [11] "CR_P;CollapsedSteps"     "NMK;AB;ABD"
>      "NMK;ABD"                 "NMK;P;AB"
>      "NMK;P;AB;ABD"
> [16] "AB"                      "CRT;CollapsedSteps"
>      "NMK;AB"                  "CR_P;CRT;CollapsedSteps"
>      "CR_Prop;CR_P"
>
> I need to subset the rows in which the term "CollapsedSteps" appears.
> So, it may appear as "CollapsedSteps" or may appear as
> "CR_P;CRT;CollapsedSteps" as you can see above. I'm using grep as
> follows:
>
> all[grep('CollapsedSteps', all$Classical.Statistic),]
>
> to find any row in which the term "CollapsedSteps" appears. Is this
> certain to catch all cases, or is there an intricacy that I may have
> missed?

Well, just try it for yourself on a data.frame that's small enough to
verify 'manually'. For instance, the data.frame that contains each level
exactly once sounds like a good candidate.

test <- subset(all, !duplicated(Classical.Statistic))

and then try your line of code ...

And do you really want "" as a level, or should those be NA?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.