Hi,

On Thu, May 13, 2010 at 10:49 AM, Amit Patel <amitrh...@yahoo.co.uk> wrote:
> Hi
>
> I have made many attempts but can't get the loop right, as I am not a strong
> programmer. What I am basically trying to do is compare 2 spreadsheets. The
> problem is that one of them contains only a portion of the overall data
> (TESTSAMP), whereas the other has the full dataset (FULLSAMP). From the
> complete set I would like to remove the rows of data which are not in TESTSAMP.
> Column 1 contains the sample numbers, which can be used to identify samples.
> Does anyone have any suggestions?
>
> I have tried various things like double loops and so on, but I am sure there 
> is an easier way or function to do this.
>
> I tried this method, but I'm not sure how to keep looping only until a match
> is found. I don't understand how repeat loops work in R.
>
> for (i in 1:length(FULLSAMP[,1])) {
>
> if (FULLSAMP[i,1] != TESTSAMP[i,1]) {
> FULLSAMP <- FULLSAMP[-i,]
> }

You want to avoid for loops as much as possible.
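
To see why the loop above goes wrong, here is a small sketch on made-up
data (the names "full" and "test" and their IDs are hypothetical, not
your spreadsheets). It compares row i of one table with row i of the
other, which is what the loop does, and that is not the same as asking
whether an ID appears anywhere in the other table:

```r
## Hypothetical toy data, for illustration only
full <- data.frame(id = c("A", "B", "C", "D"), stringsAsFactors = FALSE)
test <- data.frame(id = c("B", "D"), stringsAsFactors = FALSE)

## "B" is present in test, but a position-by-position comparison
## checks it against test[2, 1], which is "D", so it looks like a mismatch.
mismatch <- full[2, 1] != test[2, 1]

## Past the last row of test the lookup gives NA, so the comparison
## is NA as well, and if (NA) stops with an error.
beyond <- full[3, 1] != test[3, 1]
```

On top of that, dropping rows from FULLSAMP inside the loop shifts the
remaining rows up while i keeps counting over the original length.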

Imagine your samples are identified by letters, so FULLSAMP[,1] will
be the letters A..Z, and TESTSAMP[,1] will be some random 15 letters. Now
the job is to match the rows in TESTSAMP to the rows in FULLSAMP, and
remove any "extra" rows in FULLSAMP that don't appear in TESTSAMP.

## Making some data
R> fullsamp <- data.frame(id=LETTERS,
                          something=sample(1:100, length(LETTERS)),
                          stringsAsFactors=FALSE)
R> testsamp <- data.frame(id=sample(LETTERS, 15),
                          something=sample(1:100, 15),
                          stringsAsFactors=FALSE)

## Let's find where the "testsamp" rows appear in "fullsamp"
R> xref <- match(testsamp[,1], fullsamp[,1])

## Now reduce fullsamp to have only the data corresponding to testsamp
## (and in the same order)
R> fullsamp.sub <- fullsamp[xref,]

Notice that fullsamp.sub now has only rows with IDs appearing in
testsamp and they are also in the same order as testsamp.
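
One thing to watch for: match() returns NA for any ID in the first
vector that is missing from the second, and indexing a data frame with
NA gives you a row of NAs. If you would rather keep the rows in
fullsamp's original order, %in% is an alternative. A minimal sketch on
made-up data (the names "full" and "test" and the ID "Z" are just for
illustration):

```r
## Hypothetical toy data; "Z" is deliberately absent from full
full <- data.frame(id = c("A", "B", "C", "D"), value = 1:4,
                   stringsAsFactors = FALSE)
test <- data.frame(id = c("D", "B", "Z"), stringsAsFactors = FALSE)

## Positions of test's IDs in full; NA where there is no match
xref <- match(test$id, full$id)

## Drop the NAs before indexing; rows come out in test's order
by.match <- full[xref[!is.na(xref)], ]

## Logical filter instead; rows stay in full's order
by.in <- full[full$id %in% test$id, ]
```

Both give the same set of rows here. Which one you want depends on
whether the order of testsamp or of fullsamp matters to you.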

Now go ahead and read the help you'll find in ?match

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
