Re: [R] summarizing a complex dataframe

Steve Lianoglou Wed, 11 Jan 2012 14:12:30 -0800

Hi,

On Wed, Jan 11, 2012 at 3:55 PM, Christopher G Oakley
<coak...@bio.fsu.edu> wrote:
> I need some help summarizing complex data frames (small example below):
>
>    m1_1 m2_1 m3_1 m1_2 m2_2 m3_2
> i1    1    1    1    2    2    2
> i1    2    1    1    2    2    2
> i2    2    2    1    2    2    2
>
>
> For an arbitrary number of columns (say m1 … m199) where the column names 
> have variable patterns,
>
> and such that each set of columns is repeated (with potentially unique data) 
> an arbitrary number of times (say _1 … _1000),


[snip]

Perhaps your job would be easier if you changed the layout of your data
frame. For instance, you could have "experiment.name" and "replicate"
columns, so your "clean" data.frame would look like:

experiment.name  replicate  region  count
m1               1          i1      1
m2               1          i1      1
m3               1          i1      1
...

You can use the reshape (or reshape2) package to help you whip your
old table into a new one using a formula interface, if you like.
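To illustrate the conversion without installing anything, here is a
base-R sketch of the same wide-to-long step. It is swapped in for
reshape/reshape2 and does not use their API. It assumes every
measurement column is named "<experiment>_<replicate>" with exactly one
underscore, and it moves the i1/i2 labels into a "region" column,
because a data.frame cannot have duplicate row names like the two "i1"
rows in the example. The object names (wide, measure.cols, parts, long)
are just for illustration.

```r
# Poster's example; the i1/i2 row labels become a 'region' column
# because data.frame row names must be unique (assumption)
wide <- data.frame(
  region = c("i1", "i1", "i2"),
  m1_1 = c(1, 2, 2), m2_1 = c(1, 1, 2), m3_1 = c(1, 1, 1),
  m1_2 = c(2, 2, 2), m2_2 = c(2, 2, 2), m3_2 = c(2, 2, 2),
  stringsAsFactors = FALSE
)

# Every column except 'region' is assumed to be "<experiment>_<replicate>"
measure.cols <- setdiff(names(wide), "region")
parts <- do.call(rbind, strsplit(measure.cols, "_", fixed = TRUE))

# unlist() stacks the measurement columns one after another,
# so each column's labels are repeated nrow(wide) times
long <- data.frame(
  experiment.name = rep(parts[, 1], each = nrow(wide)),
  replicate       = as.integer(rep(parts[, 2], each = nrow(wide))),
  region          = rep(wide$region, times = length(measure.cols)),
  count           = unlist(wide[measure.cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

head(long, 3)
```

Since the replicate number is now an ordinary column, going from _1 to
_1000 only adds rows; the code does not change.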

You can then use your favorite split-apply-combine[1] method (via
plyr, data.table, sqldf, or even base::tapply) to calculate summary
statistics over the values of interest in each group or subgroup.
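As a small base-R example of the split-apply-combine step (plyr,
data.table and sqldf would each do the same thing with their own
syntax), the sketch below rebuilds the long table from the poster's
numbers and summarises it with tapply() and aggregate(). The particular
summaries (mean per experiment, mean per experiment and replicate, sum
per experiment and region) are arbitrary choices for illustration.

```r
# Long-format table with the poster's values (18 rows), in the
# experiment.name / replicate / region / count layout suggested above
long <- data.frame(
  experiment.name = rep(rep(c("m1", "m2", "m3"), each = 3), times = 2),
  replicate       = rep(1:2, each = 9),
  region          = rep(c("i1", "i1", "i2"), times = 6),
  count           = c(1, 2, 2, 1, 1, 2, 1, 1, 1,   # replicate 1
                      2, 2, 2, 2, 2, 2, 2, 2, 2),  # replicate 2
  stringsAsFactors = FALSE
)

# Split by experiment, apply mean(), combine into a named vector
by.experiment <- tapply(long$count, long$experiment.name, mean)

# Two grouping factors give a matrix: experiments x replicates
by.exp.rep <- tapply(long$count, list(long$experiment.name, long$replicate), mean)

# aggregate() returns the combined result as a data.frame
by.exp.region <- aggregate(count ~ experiment.name + region, data = long, FUN = sum)

print(by.experiment)
print(by.exp.rep)
print(by.exp.region)
```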

HTH,
-steve

[1] The Split-Apply-Combine Strategy for Data Analysis:
http://www.jstatsoft.org/v40/i01

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
