Re: [R] Agreggating data using external aggregation rules

Gabor Grothendieck Fri, 06 Jun 2008 06:11:42 -0700

Use aggregate() for aggregation and use indexing or subset() for selection.
Alternately try the sqldf package: http://sqldf.googlecode.com
which allows one to perform SQL operations on data frames.


On Fri, Jun 6, 2008 at 6:12 AM,  <[EMAIL PROTECTED]> wrote:
> Dear R experts,
>
> I am currently facing a tricky problem which I have read a lot about in
> the various R mailing lists without finding exactly what I need.
> I have a big data frame DF (about 2,000,000 rows) with 7 columns being
> variables and 1 being a measure (using reshape package nomeclature).
> There are no "duplicates" in it.
> Fot each of the variables I have some "rules" to apply, being COD_IN the
> value of the variable in the DF, COD_OUT the one to be transformed to;
> once obtained the "new codes" in the DF I have to aggregate the "new DF"
> (for example summing the measure).
> Usually the total transformation (merge+aggregate) really decreases the
> number of  lines in the data frame, but sometimes it can grows depending
> on the rule. Just to give an idea, the first "rule" in v1 maps 820
> different values into 7 ones.
> Using SQL and a database this can be done in a very straightforward way
> (for example on the variable v1):
>
> Select COD_OUT, v2, v3, v4, v5, v6, v7, sum(measure)
> >From DF, RULE_v1
> Where v1=COD_IN
> Group by v2, v3,v4, v5, v6, v7
>
> So the first choice would be using a database; the second one would be
> splitting the data frame and then joining the results.
> Is there any other possibility to merge+aggregate caused by the merge ?
>
> Thank you in advance
>
> Angelo Linardi
>
>
>
> ** Le e-mail provenienti dalla Banca d'Italia sono trasmesse in buona fede e 
> non
> comportano alcun vincolo ne' creano obblighi per la Banca stessa, salvo che 
> cio' non
> sia espressamente previsto da un accordo scritto.
> Questa e-mail e' confidenziale. Qualora l'avesse ricevuta per errore, La 
> preghiamo di
> comunicarne via e-mail la ricezione al mittente e di distruggerne il 
> contenuto. La
> informiamo inoltre che l'utilizzo non autorizzato del messaggio o dei suoi 
> allegati
> potrebbe costituire reato. Grazie per la collaborazione.
> -- E-mails from the Bank of Italy are sent in good faith but they are neither 
> binding on
> the Bank nor to be understood as creating any obligation on its part except 
> where
> provided for in a written agreement. This e-mail is confidential. If you have 
> received it
> by mistake, please inform the sender by reply e-mail and delete it from your 
> system.
> Please also note that the unauthorized disclosure or use of the message or any
> attachments could be an offence. Thank you for your cooperation. **
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Agreggating data using external aggregation rules

Reply via email to