Hi,

On Wed, Jan 11, 2012 at 3:55 PM, Christopher G Oakley
<coak...@bio.fsu.edu> wrote:
> I need some help summarizing complex data frames (small example below):
>
>     m1_1 m2_1 m3_1 m1_2 m2_2 m3_2
> i1     1    1    1    2    2    2
> i1     2    1    1    2    2    2
> i2     2    2    1    2    2    2
>
> For an arbitrary number of columns (say m1 …. m199) where the column names
> have variable patterns,
>
> and such that each set of columns is repeated (with potentially unique data)
> an arbitrary number of times (say _1 … _1000),
[snip]

Perhaps your job would be easier if you change the layout of your data
frame, for instance you can have "experiment.name" and "replicate"
columns, so your "clean" data.frame would look like:

experiment.name replicate region count
m1              1         i1     1
m2              1         i1     1
m3              1         i1     1
...

You can use the reshape (or reshape2) package to help you whip your
old table into a new one using a formula interface, if you like.

You can then use your favorite split-apply-combine[1] method (via
plyr, data.table, sqldf, or even base::tapply) to calculate summary
statistics over the values of interest in each group/subgroup,
whatever.

HTH,
-steve

[1] The Split-Apply-Combine Strategy for Data Analysis:
http://www.jstatsoft.org/v40/i01

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
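
As a rough illustration of the reshape-then-summarize idea above, here is a
minimal sketch in base R only (no reshape/plyr needed). It is not the only
way to do it. The toy data, the object names (wide, long, summary_means),
and the choice of mean() as the summary are all assumptions made for the
example, and it assumes the replicate number is whatever follows the last
underscore in each column name.

```r
# Hypothetical toy data mirroring the small example in the question;
# the row labels (i1, i1, i2) are stored in a "region" column.
wide <- data.frame(
  region = c("i1", "i1", "i2"),
  m1_1 = c(1, 2, 2), m2_1 = c(1, 1, 2), m3_1 = c(1, 1, 1),
  m1_2 = c(2, 2, 2), m2_2 = c(2, 2, 2), m3_2 = c(2, 2, 2)
)

value_cols <- setdiff(names(wide), "region")

# Stack every measurement column into one long "count" column.
# unlist() walks the columns in order, so "column" and "region"
# are repeated to line up with it.
long <- data.frame(
  region = rep(wide$region, times = length(value_cols)),
  column = rep(value_cols, each = nrow(wide)),
  count  = unlist(wide[value_cols], use.names = FALSE)
)

# Assumption: names look like <experiment>_<replicate>, split at the
# last underscore, e.g. "m1_2" -> experiment.name "m1", replicate 2.
long$experiment.name <- sub("_[^_]*$", "", long$column)
long$replicate <- as.integer(sub("^.*_", "", long$column))
long$column <- NULL

# Split-apply-combine with base::aggregate: mean count for each
# experiment within each region, pooled over replicates.
summary_means <- aggregate(count ~ experiment.name + region,
                           data = long, FUN = mean)
print(summary_means)
```

Once the data are in the long layout, swapping mean for another summary, or
grouping by replicate as well, only means changing the FUN argument or the
formula in the aggregate() call.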