Hello, I have a data frame (68,000 rows) of scores (V4) for a series of [genomic] coordinate ranges (V2 to V3).
I also have a data frame (1.2 million rows) of single [genomic] coordinates.

For each genomic coordinate (in coord), I would like to determine the average of all scores whose genomic ranges (in scores) encompass that coordinate. To accomplish this, I tried:

The function works, but is extremely slow. It would take about 4 days to finish for a single data set, and I have 64 data sets.

Why does the rate at which coordinate averages are calculated increase when coord is smaller, but not when scores is smaller? How can I accomplish the same thing more efficiently?

Thanks,
Dan

--
View this message in context: http://r.789695.n4.nabble.com/vectorization-with-subset-tp4635156.html
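For illustration, here is a minimal sketch of one possible vectorized approach to the problem described above. It is not the original function. It assumes the ranges are inclusive with V2 <= V3, that everything lies on a single chromosome (otherwise it would need to be applied per chromosome), and that the position column in coord is called pos; that column name, the function name mean_covering_score, and the toy data are all hypothetical. The idea is to sort the range starts and ends once, then use findInterval() and cumulative sums to get, for every coordinate at once, the count and the score total of the ranges covering it.

```r
# Hypothetical toy data mirroring the layout described in the post:
# scores holds range start (V2), range end (V3) and score (V4);
# coord holds single positions in a column assumed to be named pos.
scores <- data.frame(V2 = c(1, 5, 10), V3 = c(10, 15, 20), V4 = c(2, 4, 6))
coord  <- data.frame(pos = c(0, 1, 7, 10, 18, 25))

mean_covering_score <- function(pos, start, end, score) {
  # Order the ranges once by start and once by end.
  by_start <- order(start)
  by_end   <- order(end)
  start_sorted <- start[by_start]
  end_sorted   <- end[by_end]

  # Running score totals in each ordering; the leading 0 means "no ranges".
  cum_start <- c(0, cumsum(score[by_start]))
  cum_end   <- c(0, cumsum(score[by_end]))

  # Number of ranges whose start is <= each position.
  n_started <- findInterval(pos, start_sorted)
  # Number of ranges whose end is strictly < each position
  # (left.open = TRUE requires R >= 3.3.0).
  n_ended <- findInterval(pos, end_sorted, left.open = TRUE)

  # Ranges covering a position = started minus already ended.
  n_cover   <- n_started - n_ended
  sum_cover <- cum_start[n_started + 1] - cum_end[n_ended + 1]

  # Positions covered by no range get NA instead of 0/0.
  ifelse(n_cover > 0, sum_cover / n_cover, NA_real_)
}

coord$avg <- mean_covering_score(coord$pos, scores$V2, scores$V3, scores$V4)
print(coord)
# avg column: NA 2 3 4 6 NA
```

Because the work is dominated by two sorts and two binary-search passes rather than one subset per coordinate, the cost of a sketch like this should grow roughly with the sizes of the two tables rather than with their product. With non-integer scores, the subtraction of cumulative sums can introduce small floating-point differences relative to averaging each subset directly.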