Is this what you want?

> ds1 <- read.table(text = "1 A
+ 2 B
+ 3 X
+ 4 AA
+ 5 A
+ 6 D
+ 7 XA
+ 8 C", as.is = TRUE)
>
> ds2 <- read.table(text = "1 A
+ 2 X
+ 3 A", as.is = TRUE)
>
> # keep the rows of ds1 whose V2 value has no exact match in ds2
> ds3 <- ds1[!(ds1$V2 %in% ds2$V2), ]
> ds3
  V1 V2
2  2  B
4  4 AA
6  6  D
7  7 XA
8  8  C
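Since the real data are described as a single column of paths, the same idea can be sketched on plain character vectors. This is only an illustration under assumptions: the vector names below are hypothetical stand-ins, and how the log extracts are actually read in (for example with readLines()) is not stated in the thread.

```r
# Hypothetical stand-ins for the two single-column datasets.
dataset1 <- c("A", "B", "X", "AA", "A", "D", "XA", "C")
dataset2 <- c("A", "X", "A")

# Keep every element of dataset1 that has no exact match in dataset2.
# %in% compares whole strings, so "AA" and "XA" are kept even though
# they contain "A" and "X" as substrings.
dataset3 <- dataset1[!(dataset1 %in% dataset2)]
print(dataset3)

# setdiff() returns the same values for this example, but it also removes
# duplicates from the result, which may not be wanted for log data.
unique_left <- setdiff(dataset1, dataset2)
print(unique_left)
```

Note that %in% removes every occurrence of a matched value, so both "A" entries in dataset1 are dropped, which agrees with the dataset3 given in the question.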
On Mon, Feb 27, 2012 at 6:17 AM, Jonas Fransson <j...@iva.dk> wrote:
> Dear all,
>
> I want to delete the exact matches in a large dataset based on a smaller
> dataset. In other words, I want to subtract the smaller dataset from the
> larger one. The smaller dataset is a part of the larger one. The datasets
> contain hundreds of thousands of lines (1 column), and the content of each
> line differs in length. The data are paths extracted from web logs.
>
> On an abstract level I want to subtract dataset2 from dataset1 to get
> dataset3:
>
> dataset1:
> 1 A
> 2 B
> 3 X
> 4 AA
> 5 A
> 6 D
> 7 XA
> 8 C
>
> dataset2:
> 1 A
> 2 X
> 3 A
>
> dataset3:
> 1 B
> 2 AA
> 3 D
> 4 XA
> 5 C
>
> The final order in dataset3 is not important.
>
> Thanks,
>
> Jonas Fransson
> Ph.D.stud.
>
> IVA / Det Informationsvidenskabelige Akademi
> Royal School of Library and Information Science
> Birketinget 6
> DK-2300 Copenhagen S
> T +45 32 58 60 66
> D +45 32 34 15 10
> www.iva.dk/jf
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.