Re: [Rd] Performing Merge and Duplicated on very large files

2007-04-18 Thread Sean Davis

On Tuesday 17 April 2007 23:44, Eitan Rubin wrote:
> Hi,
>
> I am working with very large matrices (>1 million records), and need to
> 1. Join the files (can be achieved with Merge)
> 2. Find lines that have the same value in some field (after the join) and
> randomly sample 1 row.
>
> I am conce

[Rd] Performing Merge and Duplicated on very large files

2007-04-17 Thread Eitan Rubin

Hi,

I am working with very large matrices (>1 million records), and need to

1. Join the files (can be achieved with Merge)
2. Find lines that have the same value in some field (after the join) and randomly sample 1 row.

I am concerned with the complexity of merge - how (in)efficient is it? I do
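
For readers following the thread, the two steps in the question can be sketched in base R roughly as below. This is only an illustration under stated assumptions, not what either poster proposed: the data frames `a` and `b`, the key column `id`, and the column `field` are hypothetical stand-ins for the real files. The idea is to shuffle the joined rows and then keep the first row seen for each value, which yields one randomly chosen row per group.

```r
set.seed(1)  # only so that this sketch is reproducible

# Hypothetical toy tables standing in for the two large files
a <- data.frame(id = c(1, 2, 3, 4), field = c("x", "x", "y", "z"))
b <- data.frame(id = c(1, 2, 3, 4), score = c(10, 20, 30, 40))

# 1. Join the two tables on the shared key column (assumed to be `id`)
joined <- merge(a, b, by = "id")

# 2. Shuffle the rows, then keep the first row seen for each value of
#    `field`. After a uniform shuffle, the first row of each group is a
#    uniformly random member of that group.
shuffled <- joined[sample(nrow(joined)), ]
one_per_value <- shuffled[!duplicated(shuffled$field), ]
```

Here `one_per_value` holds exactly one row for each distinct value of `field`. Whether this approach is fast enough at more than a million records depends on the data and the machine, so it would need to be timed on the real files.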