I have two data frames. One has chromosome arm information, and the other has SNP position information. I am trying to assign each SNP an arm identity. I'd like to create this new column by comparing each SNP's chromosome and position against the arm reference file.
*1) Mapfile (has millions of rows)*

    Name  Chr  Position
    S1    1    3000
    S2    1    6000
    S3    1    1000

*2) Chr.Arms file (has 39 rows)*

    Chr  Arm  Start  End
    1    p    0      5000
    1    q    5001   10000

*R Script that works, but slow:*

    Arms <- c()
    for (line in 1:nrow(Mapfile)) {
      # Pick the arm on the same chromosome whose interval contains the position
      Arms[line] <- Chr.Arms$Arm[Mapfile$Chr[line] == Chr.Arms$Chr &
                                 Mapfile$Position[line] > Chr.Arms$Start &
                                 Mapfile$Position[line] < Chr.Arms$End]
    }
    Mapfile$Arm <- Arms

*Output Table:*

    Name  Chr  Position  Arm
    S1    1    3000      p
    S2    1    6000      q
    S3    1    1000      p

In words: for each line of Mapfile, I want to look up its location in Chr.Arms by 1) finding the right Chr and 2) finding the row where START < POSITION < END, then take the ARM value from that row and place it in a new column.

This R script works, but surely there is a more time- and processing-efficient way to do it.

Thanks in advance for any help,
Gaius
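One possible way to vectorize the loop above, offered as a sketch rather than a definitive answer: iterate over the chromosomes instead of over the SNPs, and let base R's findInterval() locate the candidate arm for every position on a chromosome in one call. The function name assign_arms is hypothetical (it does not appear in the post), and the sketch assumes that the arms within a chromosome do not overlap. It keeps the strict START < POSITION < END rule from the post, so a SNP that matches no arm, or that sits exactly on a boundary, gets NA.

```r
# Toy data copied from the post
Mapfile <- data.frame(Name = c("S1", "S2", "S3"),
                      Chr = c(1, 1, 1),
                      Position = c(3000, 6000, 1000),
                      stringsAsFactors = FALSE)
Chr.Arms <- data.frame(Chr = c(1, 1),
                       Arm = c("p", "q"),
                       Start = c(0, 5001),
                       End = c(5000, 10000),
                       stringsAsFactors = FALSE)

# Hypothetical helper: returns `map` with an added Arm column.
# Assumes arms on the same chromosome do not overlap.
assign_arms <- function(map, arms) {
  map$Arm <- NA_character_
  for (chr in unique(arms$Chr)) {
    a <- arms[arms$Chr == chr, ]
    a <- a[order(a$Start), ]            # findInterval() needs sorted breaks
    rows <- which(map$Chr == chr)       # all SNPs on this chromosome
    pos <- map$Position[rows]
    idx <- findInterval(pos, a$Start)   # last arm with Start <= pos; 0 if none
    idx[idx == 0L] <- NA
    # Keep the strict START < POSITION < END rule from the original loop
    inside <- !is.na(idx) & pos > a$Start[idx] & pos < a$End[idx]
    map$Arm[rows[inside]] <- as.character(a$Arm[idx[inside]])
  }
  map
}

result <- assign_arms(Mapfile, Chr.Arms)
print(result)
```

With only 39 arm rows, the outer loop runs a few dozen times at most, while the per-SNP work is done by vectorized calls, so the cost should grow much more gently with the number of SNPs than the row-by-row loop does.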