Re: [R] vectorization with subset?

David Winsemius Mon, 02 Jul 2012 11:27:22 -0700


On Jul 2, 2012, at 12:15 PM, dlv04c wrote:

Hello,
I have a data frame (68,000 rows) of scores (V4) for a series of [genomic]
coordinate ranges (V2 to V3).
I also have a data frame (1.2 million rows) of single [genomic] coordinates.
For each genomic coordinate (in coord), I would like to determine the
average of all scores whose genomic ranges (in scores) encompass the
coordinate (in coord). To accomplish this, I tried:



The function works, but is extremely slow.
It would take about 4 days for this to finish for a single data set, and I
have 64 data sets.
Why does the rate at which coordinate averages are calculated increase when
coord is smaller, but not when scores is smaller?

How can I accomplish the same thing more efficiently?

You probably need to start by reading the vignettes for the IRanges package. It's difficult to be sure, since you did not show the code for what you are currently doing.
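
As a rough sketch of what an IRanges-based approach might look like (not a reconstruction of your code, which did not come through): the column names V2, V3 and V4 are taken from your description, while the toy data, the `pos` column in coord and the `mean_score` result column are assumptions made up for illustration. It assumes the IRanges package from Bioconductor is installed and that coordinates are integers.

```r
library(IRanges)  # Bioconductor package; assumed to be installed

# Toy stand-ins for the two data frames (values are invented)
scores <- data.frame(V2 = c(1L, 5L, 20L),   # range start
                     V3 = c(10L, 15L, 30L), # range end
                     V4 = c(2, 4, 6))       # score
coord <- data.frame(pos = c(3L, 7L, 17L, 25L))  # 'pos' is a hypothetical name

# Represent the score ranges and the single coordinates as IRanges objects
score_ranges <- IRanges(start = scores$V2, end = scores$V3)
coord_points <- IRanges(start = coord$pos, width = 1)

# Find every (coordinate, range) pair where the range covers the coordinate
hits <- findOverlaps(coord_points, score_ranges)

# Average the scores of the overlapping ranges, grouped by coordinate
means <- tapply(scores$V4[subjectHits(hits)], queryHits(hits), mean)

# Coordinates covered by no range stay NA
coord$mean_score <- NA_real_
coord$mean_score[as.integer(names(means))] <- as.numeric(means)
```

The point of the design is that the overlap search is done once for all coordinates, rather than subsetting the scores data frame once per coordinate.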


--

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
