[R] Query about using timestamps returned by SQL as 'factor' for split

Ted Byers Thu, 08 Jul 2010 11:25:47 -0700

I have a simple query as follows:

"SELECT
m_id,sale_date,YEAR(sale_date),WEEK(sale_date),return_type,DATEDIFF(return_date,sale_date)
AS elapsed_time FROM risk_input"


I can get, and view, all the data that that query returns.  The question is,
sale_date is a timestamp, and I need to call split to group this data by
m_id and the week in which the sale occurred.  Obviously, I would normally
need both YEAR and WEEK so that data from April this year is not combined
with that from last year (the system is non-autonomous).  And then I need to
use lapply to apply fitdistr to each subsample.

Obviously, I can handle all this data in either a data.frame or in a
data.table.

There are two aspects of the question.

1) Is there a function (or package) that will let me group (or regroup) time
series data into the week in which the data apply, properly taking into
account the year that applies, in a single call passing sale_date as the
argument?  If I can, then I can reduce the amount of data I draw from my
MySQL server and the computational load it bears.

2) The example provided for split splits only according to a single variable
(*g <- airquality$Month;l <- split(airquality, g)*).  How would that example
be changed if there were two or more columns in the data.frame that are
needed to define the groups?  I.E. in my example, I'd need to group by m_id,
and the year and week values that can be computed from sale_date.

Thanks

Ted

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Query about using timestamps returned by SQL as 'factor' for split

Reply via email to