Yep, it definitely sounds like a job for Perl, but unfortunately I don't know Perl. I'm still stuck on this, so I'm giving more details in case it helps:
I have file A with 382 columns and 300000 rows. There are rows where only the entry in the first column is duplicated in other rows. In these cases, I need to delete the entire row. I also have a file B (one column and around 280000 rows) with a list of the entries that are repeated. So I was trying to look for the ones that match and get rid of the entire row.

Thank you!

Laura

2009/2/6 Wacek Kusnierczyk <waclaw.marcin.kusnierc...@idi.ntnu.no>:
> Laura Rodriguez Murillo wrote:
>> Thank you. I think grep would do it, but the list of expressions I
>> need to match is too long so they are stored in a file.
>
> what does 'too long' mean?
>
>> So the
>> question would be how I can tell R to look into that file to look for
>> the expressions that I want to match.
>>
>
> i guess you may still successfully use r for this, but to me it sounds
> like a perfect job for perl. let me know if you need more help.
>
> note, in the below, you'd use 'data[,2]' instead of 'd[,2]' (or 'd'
> instead of 'data'). sorry for the typo. mark, thanks for pointing this
> out -- the more obvious the mistake, the less visible ;)
>
> vQ
>
>
>> Thank you again for your help
>>
>> Laura
>>
>> 2009/2/6 Wacek Kusnierczyk <waclaw.marcin.kusnierc...@idi.ntnu.no>:
>>
>>> Laura Rodriguez Murillo wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm new in the mailing list but I would appreciate if you could help
>>>> me with this:
>>>> I have a big matrix from where I need to delete specific rows. The
>>>> second entry on these rows to delete should match any string within a
>>>> list (other file with just one column).
>>>> Thank you so much!
>>>>
>>>>
>>>>
>>> here's one way to do it, illustrated with dummy data:
>>>
>>> # dummy character matrix
>>> data = matrix(replicate(20, paste(sample(letters, 20), collapse="")),
>>> ncol=2)
>>>
>>> # filter out rows where second column does not match 'a'
>>> data[-grep('a', d[,2]),]
>>>
>>> this will work also if your data is actually a data frame:
>>>
>>> data = as.data.frame(data)
>>> data[-grep('a', d[,2]),]
>>>
>>> note, due to a known issue with grep, this won't work correctly if there
>>> are *no* rows that do *not* match the pattern:
>>>
>>> data[-grep('1', d[,2]),]
>>> # should return all of data, but returns an empty matrix
>>>
>>> with the upcoming version of r, grep will have an additional argument
>>> which will make this problem easy to fix:
>>>
>>> data[grep('a', d[,2], invert=TRUE),]
>>>
>>>
>>> vQ
>
> ______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
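
For the task described at the top of this message (dropping every row of file A whose first-column entry appears in file B), exact matching with `%in%` may be simpler than building a long grep pattern. The following is only a minimal sketch under assumptions: the toy objects `a` and `b`, the helper name `drop_listed`, and the file names in the commented-out lines are all hypothetical stand-ins, and it assumes the entries in B should match the first column of A exactly rather than as regular expressions.

```r
# Hypothetical stand-ins for file A (a wide table) and file B (one column of entries).
a <- data.frame(
  id = c("rs1", "rs2", "rs3", "rs2", "rs4"),
  x  = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)
b <- c("rs2", "rs9")

# With real files, something like the following might load them
# (file names, header and separator settings are assumptions):
# a <- read.table("fileA.txt", header = FALSE, stringsAsFactors = FALSE)
# b <- readLines("fileB.txt")

# Hypothetical helper: keep only the rows whose entry in column `col`
# is NOT among `ids`. Works for a matrix or a data frame.
drop_listed <- function(tab, ids, col = 1) {
  keep <- !(tab[, col] %in% ids)   # logical vector, one element per row
  tab[keep, , drop = FALSE]        # drop = FALSE keeps the table shape
}

filtered <- drop_listed(a, b)
print(filtered)
```

Because the rows are selected with a logical vector rather than with negative indices, the case where nothing matches returns the whole table instead of an empty one, which sidesteps the `-grep(...)` problem mentioned in the quoted reply.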