Hi,

On Fri, May 13, 2011 at 7:06 PM, wong, honkit (Stephen) <hon...@stanford.edu> wrote:
> Dear All,
> I am new to R. I have a 2 column data frame with more than ten thousand
> rows. Something like below. I want to add up all duplicated items, e.g. the
> three "aa" add up together to get a single value gene=a, value=74. How can I
> do that?? Thanks for help !
> gene value
> aa   20
> bb   10
> cc   9
> aa   30
> aa   24
> dd   100
> ee   55
In addition to Dennis' suggestion to use the aggregate function, you could
look at the plyr or data.table packages. As Dennis suggested, let's assume
your data is in a data.frame object named `d`:

R> d <- data.frame(gene=c('aa', 'bb', 'cc', 'aa', 'aa', 'dd', 'ee'), value=c(20, 10, 9, 30, 24, 100, 55))

Using data.table:

R> library(data.table)
R> dd <- data.table(d, key='gene')  # note: this will reorder the rows in dd
R> dd[, list(total=sum(value)), by=gene]
     gene total
[1,]   aa    74
[2,]   bb    10
[3,]   cc     9
[4,]   dd   100
[5,]   ee    55

Or using plyr:

R> library(plyr)
R> ddply(idata.frame(d), .(gene), summarize, total=sum(value))
  gene total
1   aa    74
2   bb    10
3   cc     9
4   dd   100
5   ee    55

Note that you don't have to use idata.frame(d); you can just do:

R> ddply(d, .(gene), summarize, total=sum(value))

but using idata.frame(d) helps to calculate the result faster, which is
especially noticeable for larger data.frames.

Using data.table will likely be faster still (again, more noticeable with
larger data.frames), but, for one thing, be aware that the order of the rows
in dd will differ from the order in d: they will be sorted by the key
column(s). Also, working with data.table objects is somewhat similar to
working with "normal" data.frame objects, but they differ in important ways
(e.g. how columns are indexed with the [] syntax, for starters).

You should go through the plyr tutorial(s) (at: http://had.co.nz/plyr/) or
the vignette(s) that come with data.table for more info, help, and use cases
if you plan to go that route.
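For comparison, the base-R route mentioned at the top (aggregate) needs no
extra packages. Dennis' exact call isn't quoted in this message, so the
following is only a sketch of what it might look like, assuming the same
sample data `d`. The tapply() variant is an additional base-R alternative,
not something suggested earlier in the thread; it returns a named vector
rather than a data.frame.

```r
# Sample data from the question
d <- data.frame(gene  = c("aa", "bb", "cc", "aa", "aa", "dd", "ee"),
                value = c(20, 10, 9, 30, 24, 100, 55))

# Base R: aggregate() with a formula sums 'value' within each distinct 'gene';
# the result is a data.frame with one row per gene, sorted by gene
totals <- aggregate(value ~ gene, data = d, FUN = sum)
print(totals)
# totals: aa 74, bb 10, cc 9, dd 100, ee 55

# Alternative (not from the thread): tapply() gives a named numeric vector
totals_vec <- tapply(d$value, d$gene, sum)
print(totals_vec)
```

Like data.table and plyr, both base-R calls return the groups in sorted order
rather than in order of first appearance.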
Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.