Thanks much! This was very helpful. --Kelly
-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com]
Sent: Friday, October 12, 2012 7:23 PM
To: Vining, Kelly
Cc: Rui Barradas; R help
Subject: Re: [R] average duplicated rows?

Hi,

My earlier solutions averaged the FL_EARLY values for duplicated "gene_id"s,
so the resulting data frame had unique rows. If you instead want to keep the
duplicated rows, each carrying the average value, you can also try this:

# Note: unlist() returns the group means in sorted gene_id order, so this
# assumes dat is already sorted by gene_id (as the example data are)
dat$FL_EARLY <- unlist(lapply(lapply(split(dat, dat$gene_id), `[`, 4),
                              function(x) rep(colMeans(x), each = nrow(x))),
                       use.names = FALSE)
head(dat)
#             gene_id sample_1 sample_2   FL_EARLY FL_LATE
#763938  Eucgr.A00054   fl_S1E   fl_S1L 13.1708000 22.2605
#763979  Eucgr.A00101   fl_S1E   fl_S1L  0.3622925 14.1202
#1273243 Eucgr.A00101    fl_S2   fl_S1L  0.3622925 14.1202
#764169  Eucgr.A00350   fl_S1E   fl_S1L  9.0277850 43.9275
#1273433 Eucgr.A00350    fl_S2   fl_S1L  9.0277850 43.9275
#1273669 Eucgr.A00650    fl_S2   fl_S1L 33.6691000 50.0169

A.K.

----- Original Message -----
From: Rui Barradas <ruipbarra...@sapo.pt>
To: "Vining, Kelly" <kelly.vin...@oregonstate.edu>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Sent: Friday, October 12, 2012 1:10 PM
Subject: Re: [R] average duplicated rows?

Hello,

It could be a job for tapply, but I find it more suited for ?ave.
dat <- read.table(text = "
gene_id sample_1 sample_2 FL_EARLY FL_LATE
763938 Eucgr.A00054 fl_S1E fl_S1L 13.170800 22.2605
763979 Eucgr.A00101 fl_S1E fl_S1L 0.367960 14.1202
1273243 Eucgr.A00101 fl_S2 fl_S1L 0.356625 14.1202
764169 Eucgr.A00350 fl_S1E fl_S1L 7.381070 43.9275
1273433 Eucgr.A00350 fl_S2 fl_S1L 10.674500 43.9275
1273669 Eucgr.A00650 fl_S2 fl_S1L 33.669100 50.0169
764480 Eucgr.A00744 fl_S1E fl_S1L 132.429000 747.2770
1273744 Eucgr.A00744 fl_S2 fl_S1L 142.659000 747.2770
764595 Eucgr.A00890 fl_S1E fl_S1L 2.937760 14.9647
764683 Eucgr.A00990 fl_S1E fl_S1L 8.681250 48.5492
1273947 Eucgr.A00990 fl_S2 fl_S1L 10.553300 48.5492
764710 Eucgr.A01020 fl_S1E fl_S1L 0.000000 57.9273
1273974 Eucgr.A01020 fl_S2 fl_S1L 0.000000 57.9273
764756 Eucgr.A01073 fl_S1E fl_S1L 8.504710 101.1870
1274020 Eucgr.A01073 fl_S2 fl_S1L 5.400010 101.1870
764773 Eucgr.A01091 fl_S1E fl_S1L 3.448910 15.7756
764826 Eucgr.A01152 fl_S1E fl_S1L 69.565700 198.2320
764831 Eucgr.A01158 fl_S1E fl_S1L 7.265640 30.9565
764845 Eucgr.A01172 fl_S1E fl_S1L 3.248020 16.9127
764927 Eucgr.A01269 fl_S1E fl_S1L 18.710200 76.6918
", header = TRUE)

# ave() replaces each FL_EARLY value with the mean of its gene_id group,
# keeping all rows
av <- ave(dat$FL_EARLY, dat$gene_id)
dat$FL_EARLY <- av

Hope this helps,

Rui Barradas

On 12-10-2012 16:41, Vining, Kelly wrote:
> Dear useRs,
>
> I have a slightly complicated data structure and am stuck trying to
> extract what I need. I'm pasting an example of this data below. In some
> cases, there are duplicates in the "gene_id" column because there are two
> different "sample_1" values for a given "sample_2" value. Where these
> duplicates exist, I need to average the corresponding "FL_EARLY" values,
> retain the "FL_LATE" value, and replace those two rows with a single row
> containing the "FL_EARLY" average, so that I no longer have any "gene_id"
> duplicates.
>
> This seems like a job for some version of the apply function, but
> searching and puzzling over it has not gotten me anywhere. Any help will
> be much appreciated!
>
> Example data:
>
>              gene_id sample_1 sample_2   FL_EARLY  FL_LATE
> 763938  Eucgr.A00054   fl_S1E   fl_S1L  13.170800  22.2605
> 763979  Eucgr.A00101   fl_S1E   fl_S1L   0.367960  14.1202
> 1273243 Eucgr.A00101    fl_S2   fl_S1L   0.356625  14.1202
> 764169  Eucgr.A00350   fl_S1E   fl_S1L   7.381070  43.9275
> 1273433 Eucgr.A00350    fl_S2   fl_S1L  10.674500  43.9275
> 1273669 Eucgr.A00650    fl_S2   fl_S1L  33.669100  50.0169
> 764480  Eucgr.A00744   fl_S1E   fl_S1L 132.429000 747.2770
> 1273744 Eucgr.A00744    fl_S2   fl_S1L 142.659000 747.2770
> 764595  Eucgr.A00890   fl_S1E   fl_S1L   2.937760  14.9647
> 764683  Eucgr.A00990   fl_S1E   fl_S1L   8.681250  48.5492
> 1273947 Eucgr.A00990    fl_S2   fl_S1L  10.553300  48.5492
> 764710  Eucgr.A01020   fl_S1E   fl_S1L   0.000000  57.9273
> 1273974 Eucgr.A01020    fl_S2   fl_S1L   0.000000  57.9273
> 764756  Eucgr.A01073   fl_S1E   fl_S1L   8.504710 101.1870
> 1274020 Eucgr.A01073    fl_S2   fl_S1L   5.400010 101.1870
> 764773  Eucgr.A01091   fl_S1E   fl_S1L   3.448910  15.7756
> 764826  Eucgr.A01152   fl_S1E   fl_S1L  69.565700 198.2320
> 764831  Eucgr.A01158   fl_S1E   fl_S1L   7.265640  30.9565
> 764845  Eucgr.A01172   fl_S1E   fl_S1L   3.248020  16.9127
> 764927  Eucgr.A01269   fl_S1E   fl_S1L  18.710200  76.6918
>
> --Kelly V.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
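Kelly's original request was one row per gene_id rather than repeated averages. A minimal base-R sketch of that follows, assuming FL_LATE is the same within each gene_id (true in the example data) and that keeping the first row's sample_1/sample_2 values is acceptable. It combines Rui's ave() call with duplicated(); the object name res is illustrative only, and only the first six example rows are used.

```r
# Subset of the example data from the thread (first six rows)
dat <- read.table(text = "
gene_id sample_1 sample_2 FL_EARLY FL_LATE
763938 Eucgr.A00054 fl_S1E fl_S1L 13.170800 22.2605
763979 Eucgr.A00101 fl_S1E fl_S1L 0.367960 14.1202
1273243 Eucgr.A00101 fl_S2 fl_S1L 0.356625 14.1202
764169 Eucgr.A00350 fl_S1E fl_S1L 7.381070 43.9275
1273433 Eucgr.A00350 fl_S2 fl_S1L 10.674500 43.9275
1273669 Eucgr.A00650 fl_S2 fl_S1L 33.669100 50.0169
", header = TRUE)

# Replace FL_EARLY by its per-gene mean (duplicated genes now share one value)
dat$FL_EARLY <- ave(dat$FL_EARLY, dat$gene_id)

# Keep only the first row of each gene_id; assumes FL_LATE is identical
# within a gene_id, as it is in the example data
res <- dat[!duplicated(dat$gene_id), ]
res
```

aggregate(FL_EARLY ~ gene_id + FL_LATE, data = dat, FUN = mean) is another option, but it drops the sample_1 and sample_2 columns.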
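The split()/unlist() approach in arun's reply relies on the rows already being sorted by gene_id, because split() returns the groups in sorted level order. The sketch below uses hypothetical toy data (toy, by_split, by_unsplit and by_ave are illustrative names) to show the misalignment on unsorted rows, along with two order-safe alternatives: base R's unsplit(), which writes each group's values back to their original positions, and Rui's ave().

```r
# Hypothetical toy data, deliberately NOT sorted by gene_id
toy <- data.frame(
  gene_id  = c("g2", "g1", "g2", "g1"),
  FL_EARLY = c(10, 1, 20, 3)
)

# Helper applied to each group: repeat the group mean once per row
group_mean <- function(x) rep(mean(x), length(x))

# split()/unlist(): results come back in group order (g1, g2),
# so they no longer line up with the rows of toy
by_split <- unlist(lapply(split(toy$FL_EARLY, toy$gene_id), group_mean),
                   use.names = FALSE)

# unsplit() puts each group's values back in the original row positions
by_unsplit <- unsplit(lapply(split(toy$FL_EARLY, toy$gene_id), group_mean),
                      toy$gene_id)

# ave() does the same in a single call
by_ave <- ave(toy$FL_EARLY, toy$gene_id)

by_split    # 2 2 15 15   (misaligned)
by_unsplit  # 15 2 15 2
by_ave      # 15 2 15 2
```

With sorted data, as in the thread, all three give the same result; ave() is the shortest order-safe choice.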