On Jul 2, 2012, at 12:15 PM, dlv04c wrote:

Hello,

I have a data frame (68,000 rows) of scores (V4) for a series of [genomic]
coordinates ranges (V2 to V3).



I also have a data frame (1.2 million rows) of single [genomic] coordinates.



For each genomic coordinate (in coord), I would like to determine the
average of all scores whose genomic ranges (in scores) encompass the
coordinate (in coord). To accomplish this, I tried:



The function works, but is extremely slow.

It would take about 4 days for this to finish for a single data set, and I
have 64 data sets.

Why does the rate at which coordinate averages are calculated increase when
coord is smaller, but not when scores is smaller?

How can I accomplish the same thing more efficiently?

You probably need to start by reading the vignettes for the IRanges package. It's difficult to be sure since you did not show the code for what you were doing currently.

--

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to