Use aggregate() for aggregation and use indexing or subset() for selection. Alternately try the sqldf package: http://sqldf.googlecode.com which allows one to perform SQL operations on data frames.
On Fri, Jun 6, 2008 at 6:12 AM, <[EMAIL PROTECTED]> wrote: > Dear R experts, > > I am currently facing a tricky problem which I have read a lot about in > the various R mailing lists without finding exactly what I need. > I have a big data frame DF (about 2,000,000 rows) with 7 columns being > variables and 1 being a measure (using reshape package nomeclature). > There are no "duplicates" in it. > Fot each of the variables I have some "rules" to apply, being COD_IN the > value of the variable in the DF, COD_OUT the one to be transformed to; > once obtained the "new codes" in the DF I have to aggregate the "new DF" > (for example summing the measure). > Usually the total transformation (merge+aggregate) really decreases the > number of lines in the data frame, but sometimes it can grows depending > on the rule. Just to give an idea, the first "rule" in v1 maps 820 > different values into 7 ones. > Using SQL and a database this can be done in a very straightforward way > (for example on the variable v1): > > Select COD_OUT, v2, v3, v4, v5, v6, v7, sum(measure) > >From DF, RULE_v1 > Where v1=COD_IN > Group by v2, v3,v4, v5, v6, v7 > > So the first choice would be using a database; the second one would be > splitting the data frame and then joining the results. > Is there any other possibility to merge+aggregate caused by the merge ? > > Thank you in advance > > Angelo Linardi > > > > ** Le e-mail provenienti dalla Banca d'Italia sono trasmesse in buona fede e > non > comportano alcun vincolo ne' creano obblighi per la Banca stessa, salvo che > cio' non > sia espressamente previsto da un accordo scritto. > Questa e-mail e' confidenziale. Qualora l'avesse ricevuta per errore, La > preghiamo di > comunicarne via e-mail la ricezione al mittente e di distruggerne il > contenuto. La > informiamo inoltre che l'utilizzo non autorizzato del messaggio o dei suoi > allegati > potrebbe costituire reato. Grazie per la collaborazione. > -- E-mails from the Bank of Italy are sent in good faith but they are neither > binding on > the Bank nor to be understood as creating any obligation on its part except > where > provided for in a written agreement. This e-mail is confidential. If you have > received it > by mistake, please inform the sender by reply e-mail and delete it from your > system. > Please also note that the unauthorized disclosure or use of the message or any > attachments could be an offence. Thank you for your cooperation. ** > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.