Easiest way to do it is to try it out and time it. Here is a case where I generated two sets of data with 120,000 character strings each (just random numbers converted to character strings) and then asked for their intersection. It came up with 3 matches in about 0.2 seconds. That would seem fast enough, unless you plan to do this operation tens of thousands of times:

> x <- as.character(runif(120000))
> y <- as.character(runif(120000))
> system.time(z <- intersect(x,y))
   user  system elapsed
   0.22    0.00    0.22
> str(z)
 chr [1:3] "0.289942682255059" "0.75132836541161" "0.638638160191476"
>

Here is the timing if you get 50000 matches and it is about the same:

> x <- as.character(round(runif(120000),5))
> y <- as.character(round(runif(120000),5))
> system.time(z <- intersect(x,y))
   user  system elapsed
    0.2     0.0     0.2
> str(z)
 chr [1:48908] "0.08385" "0.62639" "0.47603" "0.18578" "0.89447" "0.58435" "0.15297" ...
>

On Tue, Mar 25, 2008 at 10:28 PM, Suhaila Zainudin <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Thanks for the feedback. I have tried it on the small size sample and ref
> and it works. Now I want to use a larger dataset for myref (the reference
> file). The reference file contains 112189 rows. Can I use the same approach
> that works for the small example? Or are there other alternatives when
> dealing with data of that magnitude?
>
>
> --
> Suhaila Zainudin
> PhD Candidate
> Universiti Teknologi Malaysia

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
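
If the goal is to pull the matching rows out of the reference file rather than just list the common keys, the same try-it-and-time-it test can be run with %in% at the size mentioned in the question. This is only a sketch: the data frame layout, the column names id and value, and the names myref and mysample as used here are made up for illustration, since I have not seen the actual files. As far as I know, %in% is built on match(), so it should scale about as well as intersect().

```r
# Hypothetical data: the layout and column names below are illustrative
# assumptions, not the poster's actual reference file.
set.seed(42)
n_ref <- 112189                               # row count mentioned in the question
myref <- data.frame(
  id    = sprintf("G%06d", seq_len(n_ref)),   # made-up unique keys
  value = runif(n_ref),
  stringsAsFactors = FALSE
)

# 5000 distinct keys that exist in the reference, plus one that does not
mysample <- c(sample(myref$id, 5000), "not_in_ref")

# Keep only the reference rows whose key appears in the sample, and time it
timing <- system.time(
  hits <- myref[myref$id %in% mysample, ]
)
print(timing)
print(nrow(hits))
```

The elapsed time will depend on the machine, so time it on your own data before deciding whether anything faster is needed.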