If, 'dat' is the dataset: res <- as.data.frame(do.call(rbind,lapply(split(dat,list(dat$St.Sam,dat$Sp),drop=TRUE),function(x) c(unlist(x[1,1:4]),colSums(x[,5:6]))))) row.names(res) <- 1:nrow(res) res St.Sam Sp Var1 Var2 NT PT 1 ST1.S1 Sp1 12 aa 23 37 2 ST1.S2 Sp1 25 dd 29 66 3 ST2.S1 Sp1 15 aa 30 45 4 ST3.S1 Sp1 25 aa 26 69 5 ST1.S1 Sp2 32 bb 45 26 6 ST1.S2 Sp2 59 ee 89 35 7 ST2.S1 Sp2 45 cc 55 23 8 ST3.S1 Sp2 36 bb 65 48 9 ST1.S1 Sp3 47 cc 89 35 10 ST2.S1 Sp3 27 aa 85 12 11 ST2.S1 Sp4 42 cc 8 3 A.K.
On Friday, February 28, 2014 5:31 PM, ltdm <luis.tito-de-mor...@ird.fr> wrote: Hi list, I have a dataframe df looking like this one: St.Sam Sp Var1 Var2 NT PT ST1.S1 Sp1 12 aa 20 32 ST1.S1 Sp2 32 bb 45 26 ST1.S1 Sp1 12 aa 3 5 ST1.S1 Sp3 47 cc 89 35 ST1.S2 Sp1 25 dd 29 66 ST1.S2 Sp2 59 ee 89 35 ST2.S1 Sp1 15 aa 30 45 ST2.S1 Sp2 45 cc 55 23 ST2.S1 Sp3 27 aa 85 12 ST2.S1 Sp4 42 cc 8 3 ST3.S1 Sp1 25 aa 26 69 ST3.S1 Sp2 36 bb 56 36 ST3.S1 Sp2 36 bb 9 12 St.Sam : indicates the Station and the sample # Sp is a species code Var1 is numeric, Var2 is a factor NT and PT are numeric Please note that lines 2 and 3 are the same species (Sp1) in the same station and sample. Var1 and Var2 have the same values and NT and PT differ. Also the last 2 lines are the same species in the same station and sample. Var1 and Var2 have the same values, and NT and PT differ. What I would like to have is a dataframe like this one: St.Sam Sp Var1 Var2 NT PT ST1.S1 Sp1 12 aa 23 37 ST1.S1 Sp2 32 bb 45 26 ST1.S1 Sp3 47 cc 89 35 ST1.S2 Sp1 25 dd 29 66 ST1.S2 Sp2 59 ee 89 35 ST2.S1 Sp1 15 aa 30 45 ST2.S1 Sp2 45 cc 55 23 ST2.S1 Sp3 27 aa 85 12 ST2.S1 Sp4 42 cc 8 3 ST3.S1 Sp1 25 aa 26 69 ST3.S1 Sp2 36 bb 65 48 i.e. the lines with same pair of St.Sam/Sp have been aggregated and the values NT and PT summed. Var1 and Var2 are untouched (they are allways equal for a given pair St.Sam/Species). I know how to find and extract the lines of interest using "duplicated", but I am unable to automatically sum the NT and PT. I have been struggling with "aggregate" and the like but I am stuck. I searched the web and the archives but found no answer nor similar question. Does anybody have a solution or a link to a page or suggestion to help me ? Thank you very much for your help Tito Here is a quick&dirty way to build the toy dataframe above: St.Sam <- c("ST1.S1","ST1.S1","ST1.S1","ST1.S1","ST1.S2","ST1.S2","ST2.S1","ST2.S1","ST2.S1","ST2.S1","ST3.S1","ST3.S1","ST3.S1") Species <- c("Sp1","Sp2","Sp1","Sp3","Sp1","Sp2","Sp1","Sp2","Sp3","Sp4","Sp1","Sp2","Sp2") Var1 <- c(12,32,12,47,25,59,15,45,27,42,25,36,36) Var2 <- c("aa","bb","aa","cc","dd","ee","aa","cc","aa","cc","aa","bb","bb") NT <- c(20,45,3,89,29,89,30,55,85,8,26,56,9) PT <- c(32,26,5,35,66,35,45,23,12,3,69,36,12) df <- data.frame(St.Sam,Species,Var1,Var2,NT,PT) -- View this message in context: http://r.789695.n4.nabble.com/aggregate-and-sum-on-some-columns-fromduplicate-rows-tp4686040.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.