Thanks to everybody for trying to help me with this. I think there are
a few workable options here. However, I think the most efficient
option that I've found was to avoid the join/aggregate in R
altogether. I've joined them at the database level to accomplish the
same thing. This may not be a h
Oh, you didn't say the intervals could overlap!
If Bill D's suggestions don't suffice, try the following:
(again assuming all dates are in a form that allows comparison
operations, e.g. via as.POSIX**)
Assume you have g intervals with start dates "starts" and end dates
"ends" and that you have d
You could try pulling some of the repeated subscripting operations,
especially the insertions, out of the loop. E.g.,
values <- observations[, "values"]
date <- observations[, "date"]
groups$average <- vapply(seq_len(NROW(groups)), function(i)
mean(values[date >= groups[i, "start"] &
Thanks David, Bert,
From what I'm reading on ?findInterval, it may not be workable because
of overlapping date ranges. findInterval seems to take a series of
bin breakpoints as its argument. I'm currently exploring data.table
documentation and will keep thinking about this.
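
To illustrate the concern, here is a minimal sketch with made-up dates (the
names breaks, dates, starts, ends and d are hypothetical, not from my real
data). findInterval() assigns each date to exactly one bin, whereas with
overlapping ranges a single date can fall inside several ranges at once:

```r
# Hypothetical data, invented purely for illustration.
breaks <- as.numeric(as.Date(c("2016-01-01", "2016-01-11", "2016-01-21")))
dates  <- as.numeric(as.Date(c("2016-01-05", "2016-01-15", "2016-01-25")))

# findInterval() takes sorted breakpoints and returns one bin index per date.
bin <- findInterval(dates, breaks)

# Overlapping ranges: the same date can sit inside more than one range,
# which a single bin index per date cannot represent.
starts <- as.Date(c("2016-01-01", "2016-01-10"))
ends   <- as.Date(c("2016-01-20", "2016-01-30"))
d      <- as.Date("2016-01-15")
n_ranges <- sum(d >= starts & d <= ends)  # number of ranges containing d
```

Here every date receives a single bin index, but d lies in both of the
overlapping ranges, so a direct comparison against starts and ends seems
to be needed in that case.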
Just on David's poin
A strategy:
1. Convert your dates and intervals to numerics that give the days
since a time origin. See as.POSIXlt (or ** ct for details and an
example that does this). Should be fast...
2. Use the findInterval() function to get the interval into which each
date falls. This **is** "vectorized" a
> On Feb 10, 2016, at 12:18 PM, Peter Lomas wrote:
>
> Hello, I have a dataframe with a date range, and another dataframe
> with observations by date. For each date range, I'd like to average
> the values within that range from the other dataframe. I've provided
> code below doing what I would