Re: [R] Efficient way to create new column based on comparison with another dataframe

2016-02-01 Thread Hervé Pagès
Hi Gaius, On 01/29/2016 10:52 AM, Gaius Augustus wrote: I have two dataframes. One has chromosome arm information, and the other has SNP position information. I am trying to assign each SNP an arm identity. I'd like to create this new column based on comparing it to the reference file. *1) Map

Re: [R] Efficient way to create new column based on comparison with another dataframe

2016-01-31 Thread Gaius Augustus
Thanks Denes, I should have thought of foverlaps as an option. I wonder how fast it is compared to my solution! My particular solution does not need data.table in order to work. It just loops through the ChrArms (Chromosome Arms, which always has 39 rows) and assigns the proper arm to all rows w
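
A minimal base R sketch of the loop-over-arms idea described above. This is not the code from the post (which is truncated here). The data frames, column names, and values below are made-up assumptions for illustration, and arm boundaries are assumed to be inclusive and non-overlapping within a chromosome.

```r
# Hypothetical example data: names and values are assumptions, not the poster's data.
mapfile <- data.frame(
  Name = c("S1", "S2", "S3"),
  Chr = c(1, 1, 2),
  Position = c(3000, 6000, 1000),
  stringsAsFactors = FALSE
)
chr_arms <- data.frame(
  Chr = c(1, 1, 2, 2),
  Arm = c("p", "q", "p", "q"),
  Start = c(0, 5001, 0, 4001),
  End = c(5000, 10000, 4000, 8000),
  stringsAsFactors = FALSE
)

# Start with no arm assigned, then loop over the (small) arm table.
mapfile$Arm <- NA_character_
for (i in seq_len(nrow(chr_arms))) {
  # Vectorised test over all SNP rows: same chromosome and position inside the arm.
  hit <- mapfile$Chr == chr_arms$Chr[i] &
    mapfile$Position >= chr_arms$Start[i] &
    mapfile$Position <= chr_arms$End[i]
  mapfile$Arm[hit] <- chr_arms$Arm[i]
}

print(mapfile)
```

Because the loop runs once per arm rather than once per SNP, each iteration is a single vectorised comparison over the large table, and the row order of mapfile is left untouched.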

Re: [R] Efficient way to create new column based on comparison with another dataframe

2016-01-31 Thread Dénes Tóth
Hi, I have not followed this thread from the beginning, but have you tried the foverlaps() function from the data.table package? Something along the lines of: --- # create the tables (use as.data.table() or setDT() if you # start with a data.frame) mapfile <- data.table(Name = c("S1", "S2", "
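
For readers who want to try the foverlaps() route, here is a hedged sketch. It is not the code from the post above (which is truncated here). The table contents and the helper column Position2 are assumptions made for illustration. foverlaps() expects intervals on both sides, so each SNP is treated as a zero-width interval from Position to Position2.

```r
library(data.table)

# Hypothetical example data: names and values are assumptions.
mapfile <- data.table(
  Name = c("S1", "S2", "S3"),
  Chr = c(1, 1, 2),
  Position = c(3000, 6000, 1000)
)
chr_arms <- data.table(
  Chr = c(1, 1, 2, 2),
  Arm = c("p", "q", "p", "q"),
  Start = c(0, 5001, 0, 4001),
  End = c(5000, 10000, 4000, 8000)
)

# Treat each SNP as an interval of width zero (assumed helper column).
mapfile[, Position2 := Position]

# The lookup table must be keyed, with the interval start/end as the last two key columns.
setkey(chr_arms, Chr, Start, End)

# Overlap join: match each SNP to the arm whose [Start, End] contains it.
res <- foverlaps(
  mapfile, chr_arms,
  by.x = c("Chr", "Position", "Position2"),
  type = "within"
)

print(res[, .(Name, Chr, Position, Arm)])
```

SNPs that fall outside every arm interval should come back with NA in Arm under the default nomatch setting.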

Re: [R] Efficient way to create new column based on comparison with another dataframe

2016-01-30 Thread Gaius Augustus
I'll look into the Intervals idea. The data.table code posted might not work (because I don't believe it would put the rows in the correct order if the chromosomes are interspersed), however, it did make me think about possibly assigning based on values... *SOLUTION* mapfile <- data.frame(Name =

Re: [R] Efficient way to create new column based on comparison with another dataframe

2016-01-30 Thread Gaius Augustus
I'll look into the Intervals idea. The data.table code posted might not work (because I don't believe it would put the rows in the correct order if the chromosomes are interspersed), however, it did make me think about possibly assigning based on values... Something like: mapfile <- data.table(Na

Re: [R] Efficient way to create new column based on comparison with another dataframe

2016-01-29 Thread Ulrik Stervbo
Hi Gaius, Could you use data.table and loop over the small Chr.arms? library(data.table) mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1, Position = c(3000, 6000, 1000), key = "Chr") Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001), End = c(5000, 1), key = "Chr"

[R] Efficient way to create new column based on comparison with another dataframe

2016-01-29 Thread Gaius Augustus
I have two dataframes. One has chromosome arm information, and the other has SNP position information. I am trying to assign each SNP an arm identity. I'd like to create this new column based on comparing it to the reference file. *1) Mapfile (has millions of rows)* Name Chr Position S1