Re: [R] average duplicated rows?

Rui Barradas Fri, 12 Oct 2012 10:12:46 -0700

Hello,

It could be a job for tapply, but I find it more suited for ?ave.



dat <- read.table(text = "
 gene_id sample_1 sample_2   FL_EARLY  FL_LATE
763938  Eucgr.A00054   fl_S1E   fl_S1L  13.170800  22.2605
763979  Eucgr.A00101   fl_S1E   fl_S1L   0.367960  14.1202
1273243 Eucgr.A00101    fl_S2   fl_S1L   0.356625  14.1202
764169  Eucgr.A00350   fl_S1E   fl_S1L   7.381070  43.9275
1273433 Eucgr.A00350    fl_S2   fl_S1L  10.674500  43.9275
1273669 Eucgr.A00650    fl_S2   fl_S1L  33.669100  50.0169
764480  Eucgr.A00744   fl_S1E   fl_S1L 132.429000 747.2770
1273744 Eucgr.A00744    fl_S2   fl_S1L 142.659000 747.2770
764595  Eucgr.A00890   fl_S1E   fl_S1L   2.937760  14.9647
764683  Eucgr.A00990   fl_S1E   fl_S1L   8.681250  48.5492
1273947 Eucgr.A00990    fl_S2   fl_S1L  10.553300  48.5492
764710  Eucgr.A01020   fl_S1E   fl_S1L   0.000000  57.9273
1273974 Eucgr.A01020    fl_S2   fl_S1L   0.000000  57.9273
764756  Eucgr.A01073   fl_S1E   fl_S1L   8.504710 101.1870
1274020 Eucgr.A01073    fl_S2   fl_S1L   5.400010 101.1870
764773  Eucgr.A01091   fl_S1E   fl_S1L   3.448910  15.7756
764826  Eucgr.A01152   fl_S1E   fl_S1L  69.565700 198.2320
764831  Eucgr.A01158   fl_S1E   fl_S1L   7.265640  30.9565
764845  Eucgr.A01172   fl_S1E   fl_S1L   3.248020  16.9127
764927  Eucgr.A01269   fl_S1E   fl_S1L  18.710200  76.6918
", header = TRUE)

av <- ave(dat$FL_EARLY, dat$gene_id)
dat$FLY_EARLY <- av


Hope this helps,

Rui Barradas
Em 12-10-2012 16:41, Vining, Kelly escreveu:

Dear useRs,

I have a slightly complicated data structure and am stuck trying to extract what I need. I'm pasting an example of this data below. In some cases, 
there are duplicates in the "gene_id" column because there are two different "sample 1" values for a given "sample 2" 
value. Where these duplicates exist, I need to average the corresponding "FL_EARLY" values and retain the "FL_LATE" value and 
replace those two rows with a row containing the "FL_EARLY" average so that I no longer have any "gene_id" duplicates.

Seems like this is a job for some version of the apply function, but searching 
and puzzling over this has not gotten me anywhere. Any help will be much 
appreciated!

Example data:


              gene_id sample_1 sample_2   FL_EARLY  FL_LATE
763938  Eucgr.A00054   fl_S1E   fl_S1L  13.170800  22.2605
763979  Eucgr.A00101   fl_S1E   fl_S1L   0.367960  14.1202
1273243 Eucgr.A00101    fl_S2   fl_S1L   0.356625  14.1202
764169  Eucgr.A00350   fl_S1E   fl_S1L   7.381070  43.9275
1273433 Eucgr.A00350    fl_S2   fl_S1L  10.674500  43.9275
1273669 Eucgr.A00650    fl_S2   fl_S1L  33.669100  50.0169
764480  Eucgr.A00744   fl_S1E   fl_S1L 132.429000 747.2770
1273744 Eucgr.A00744    fl_S2   fl_S1L 142.659000 747.2770
764595  Eucgr.A00890   fl_S1E   fl_S1L   2.937760  14.9647
764683  Eucgr.A00990   fl_S1E   fl_S1L   8.681250  48.5492
1273947 Eucgr.A00990    fl_S2   fl_S1L  10.553300  48.5492
764710  Eucgr.A01020   fl_S1E   fl_S1L   0.000000  57.9273
1273974 Eucgr.A01020    fl_S2   fl_S1L   0.000000  57.9273
764756  Eucgr.A01073   fl_S1E   fl_S1L   8.504710 101.1870
1274020 Eucgr.A01073    fl_S2   fl_S1L   5.400010 101.1870
764773  Eucgr.A01091   fl_S1E   fl_S1L   3.448910  15.7756
764826  Eucgr.A01152   fl_S1E   fl_S1L  69.565700 198.2320
764831  Eucgr.A01158   fl_S1E   fl_S1L   7.265640  30.9565
764845  Eucgr.A01172   fl_S1E   fl_S1L   3.248020  16.9127
764927  Eucgr.A01269   fl_S1E   fl_S1L  18.710200  76.6918



--Kelly V.


        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] average duplicated rows?

Reply via email to