Hi Gaius,

Could you use data.table and loop over the small Chr.Arms table?

library(data.table)
mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1,
                      Position = c(3000, 6000, 1000), key = "Chr")
Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001),
                       End = c(5000, 10000), key = "Chr")

Arms <- data.table()
for (i in seq_len(nrow(Chr.Arms))) {
  cur.row <- Chr.Arms[i, ]
  # SNPs on the same chromosome whose position falls inside the current arm
  Arm <- mapfile[Chr == cur.row$Chr &
                 Position >= cur.row$Start & Position <= cur.row$End]
  Arm <- Arm[, Arm := cur.row$Arm][]
  Arms <- rbind(Arms, Arm)
}

# Or use plyr to loop over each chromosome/arm combination
library(plyr)
Arms <- ddply(Chr.Arms, .variables = c("Chr", "Arm"),
              function(cur.row, mapfile) {
  # cur.row is a single row of Chr.Arms: one arm of one chromosome
  mapfile <- mapfile[Chr == cur.row$Chr &
                     Position >= cur.row$Start & Position <= cur.row$End]
  mapfile <- mapfile[, Arm := cur.row$Arm][]
  return(mapfile)
}, mapfile = mapfile)

I have only just started using data.table, and I have the feeling the code
above can be greatly improved. Maybe the loop can be dropped entirely?

Hope this helps
Ulrik

On Sat, 30 Jan 2016 at 03:29 Gaius Augustus <gaiusjaugus...@gmail.com> wrote:

> I have two dataframes. One has chromosome arm information, and the other
> has SNP position information. I am trying to assign each SNP an arm
> identity. I'd like to create this new column based on comparing it to the
> reference file.
>
> *1) Mapfile (has millions of rows)*
>
> Name  Chr  Position
> S1    1    3000
> S2    1    6000
> S3    1    1000
>
> *2) Chr.Arms file (has 39 rows)*
>
> Chr  Arm  Start  End
> 1    p    0      5000
> 1    q    5001   10000
>
>
> *R Script that works, but slow:*
> Arms <- c()
> for (line in 1:nrow(Mapfile)){
>   Arms[line] <- Chr.Arms$Arm[ Mapfile$Chr[line] == Chr.Arms$Chr &
>     Mapfile$Position[line] > Chr.Arms$Start & Mapfile$Position[line] <
>     Chr.Arms$End]
> }
> Mapfile$Arm <- Arms
>
>
> *Output Table:*
>
> Name  Chr  Position  Arm
> S1    1    3000      p
> S2    1    6000      q
> S3    1    1000      p
>
>
> In words: I want each line to look up the location (1) find the right Chr,
> 2) find the line where the START < POSITION < END), then get the ARM
> information and place it in a new column.
>
> This R script works, but surely there is a more time/processing efficient
> way to do it.
>
> Thanks in advance for any help,
> Gaius
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
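
P.S. On the question of whether the loop can be dropped entirely: one option might be a non-equi update join. The sketch below is only that, a sketch. It assumes a data.table version that supports non-equi conditions in on= (1.9.8 or later, as far as I know), it reuses the toy tables from this thread, and it treats Start and End as inclusive bounds, which is an assumption on my part (the original script uses strict inequalities).

```r
library(data.table)

# Toy data from this thread
mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1,
                      Position = c(3000, 6000, 1000))
Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"),
                       Start = c(0, 5001), End = c(5000, 10000))

# Non-equi update join: for every row of Chr.Arms, find the SNPs on the same
# chromosome with Start <= Position <= End and write that arm into a new
# Arm column of mapfile (added by reference, no explicit loop).
mapfile[Chr.Arms, on = .(Chr, Position >= Start, Position <= End),
        Arm := i.Arm]

print(mapfile)
```

SNPs that fall inside no arm would simply keep NA in the Arm column. For the join to match on real data, Chr needs to have the same type in both tables.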