Re: [R] Can RMySQL be used for a paramterized query?

Ted Byers Thu, 10 Jun 2010 16:18:57 -0700

Thanks,

Actually, thanks to the info Henrique sent, I have made decent progress.


In actuality, I could have just submitted a SELECT * on the second table,
which would give me everything, just like Henrique's suggestion, and yours,
would give me.  The problem is that that table is HUGE (I don't want to load
ALL that data at once, especially when I'd be analyzing it in chunks defined
by ID and date), and at the same time, the analyses would be done not only
one ID at a time, but on records pertaining to a given day. (e.g., imagine a
dataset containing al sales and refund data, and assuming rates at which
sales end in refunds vary through time, something I know from previous
analyses of similar data, I would need to analyze all refunds for sales that
happened on a given day).

While I was aware I could use RMySQL to get my time series data (I will be
assessing a VAR on a 3D time series once my current task is done), I looked
at TSMySQL because, being relatively inexperienced with R, I need to be able
to do a variety of autoregressive analyses.  Someone has suggested I also
look at state space modelling, but being a mathematical ecologst by
training, I am struggling with that along with Kalman Filtering.  But that
is another post ...

Thanks again

Ted

On Thu, Jun 10, 2010 at 5:51 PM, Paul Gilbert <
pgilb...@bank-banque-canada.ca> wrote:

> Ted
>
> I'm not sure I fully understand the question, but you may want to consider
> creating a temporary table with a join, which you can do with a query from
> your R session, and then query that table to bring the data into R. Roughly,
> the logic is to leave the data in the db if you are not doing any fancy
> calculations. You might also find "order by" is useful. (This is what I use
> in TSdbi to make sure data comes back in the right order as a time series.)
> It may even be possible to get everything you want back in one step using
> this and "group by", rather than looping, but depending on the analysis you
> want to do in R, that may not be the most convenient way.
>
> BTW, I think you realize you do not have to use the TSMySQL commands to
> access the TSMySQL database. They are usually convenient, but you can query
> the tables directly with RMySQL functions.
>
> Paul
>
> >-----Original Message-----
> >From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> >On Behalf Of Henrique Dallazuanna
> >Sent: June 10, 2010 8:47 AM
> >To: Ted Byers
> >Cc: R-help Forum
> >Subject: Re: [R] Can RMySQL be used for a paramterized query?
> >
> >I think you can do this:
> >
> >ids <- dbGetQuery(conn, "SELECT id FROM my_table") other_table <-
> >dbGetQuery(conn, sprintf("SELECT * FROM my_other_table WHERE t1_id in
> >(%s)", paste(ids, collapse = ",")))
> >
> >On Wed, Jun 9, 2010 at 11:24 PM, Ted Byers <r.ted.by...@gmail.com>
> >wrote:
> >
> >> I have not found anything about this except the following from the DBI
> >> documentation :
> >>
> >> Bind variables: the interface is heavily biased towards queries, as
> >> opposed
> >> > to general
> >> > purpose database development. In particular we made no attempt to
> >> > define "bind variables"; this is a mechanism by which the contents
> >> > of R/S objects are implicitly moved to the database during SQL
> >> > execution. For instance, the following embedded SQL statement
> >> > /* SQL */
> >> > SELECT * from emp_table where emp_id = :sampleEmployee would take
> >> > the vector sampleEmployee and iterate over each of its
> >> elements
> >> > to get the result. Perhaps the DBI could at some point in the future
> >> > implement this feature.
> >> >
> >>
> >> I can connect, and execute a SQL query such as "SELECT id FROM
> >> my_table", and display a frame with all the IDs from my_table.  But I
> >> need also to do something like "SELECT * FROM my_other_table WHERE
> >> t1_id = x"  where 'x' is one of the IDs returned by the first select
> >> statement.  Actually, I have to do this in two contexts, one where the
> >> data are not ordered by time and one where it is (and thus where I'd
> >> have to use TSMySQL to execute something like "SELECT
> >record_datetime,value FROM my_ts_table WHERE t2_id = x").
> >>
> >> I'd like to embed this in a loop where I iterate over the IDs returned
> >> by the first select, get the appropriate data from the second for each
> >> ID, analyze that data and store results in another table in the DB,
> >> and then proceed to the next ID in the list.  I suppose an alternative
> >> would be to get all the data at once, but the resulting resultset
> >> would be huge, and I don't (yet) know how to take a subset of the data
> >> in a frame based on a given value in one ot the fields and analyze
> >> that.  Can you point me to an example of how this is done, or do I
> >> have to use a mix of perl (to get the
> >> data) and R (to do the analysis)?
> >>
> >> Any insights on how to proceed would be appreciated.  Thanks.
> >>
> >> Ted
> >>
> >>        [[alternative HTML version deleted]]
> >>
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
> >
> >--
> >Henrique Dallazuanna
> >Curitiba-Paraná-Brasil
> >25° 25' 40" S 49° 16' 22" O
> >
> >       [[alternative HTML version deleted]]
>
>
> ====================================================================================
>
> La version française suit le texte anglais.
>
>
> ------------------------------------------------------------------------------------
>
> This email may contain privileged and/or confidential information, and the
> Bank of
> Canada does not waive any related rights. Any distribution, use, or copying
> of this
> email or the information it contains by other than the intended recipient
> is
> unauthorized. If you received this email in error please delete it
> immediately from
> your system and notify the sender promptly by email that you have done so.
>
>
> ------------------------------------------------------------------------------------
>
> Le présent courriel peut contenir de l'information privilégiée ou
> confidentielle.
> La Banque du Canada ne renonce pas aux droits qui s'y rapportent. Toute
> diffusion,
> utilisation ou copie de ce courriel ou des renseignements qu'il contient
> par une
> personne autre que le ou les destinataires désignés est interdite. Si vous
> recevez
> ce courriel par erreur, veuillez le supprimer immédiatement et envoyer sans
> délai à
> l'expéditeur un message électronique pour l'aviser que vous avez éliminé de
> votre
> ordinateur toute copie du courriel reçu.
>
>


-- 
R.E.(Ted) Byers, Ph.D.,Ed.D.
t...@merchantservicecorp.com
CTO
Merchant Services Corp.
350 Harry Walker Parkway North, Suite 8
Newmarket, Ontario
L3Y 8L3

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Can RMySQL be used for a paramterized query?

Reply via email to