Hi-- This is a question with a trivial and obvious answer, I'm sure, but I can't seem to find it in the help files and books I have handy. I have a data frame consisting of two columns: "Gene_Name", a list of gene symbols, and "Number", a numeric measure of how frequently a tag representing that gene showed up in a SAGE library. Several of the genes are represented by multiple tags and therefore appear more than once in the list, e.g.:

    1167  Zcchc8    6
    1168  Zcwpw1    5
    1169  Zdhhc18   6
    1170  Zdhhc20   5
    1171  Zdhhc3    6
    1172  Zdhhc3    5
    1173  Zeb2      9
    1174  Zeb2      6

What I want is to collapse the list by gene name, so that duplicates are summed and each gene appears only once in the final version:

    Zcchc8    6
    Zcwpw1    5
    Zdhhc18   6
    Zdhhc20   5
    Zdhhc3   11
    Zeb2     15

The only way I can figure out to do this is with rowsum():

    > rowsum(Number, Gene_Name)

This gives me exactly what I want, *except* that I end up with a matrix containing the Number values, with the Gene_Name values used as row names (so the output looks exactly as printed above). What I want is a data frame equivalent to the starting table, with numbered rows and separate, accessible columns containing the Gene_Name and Number values.

I was able to put such a data frame together manually by combining the row names of the rowsum() result with its values:

    > genes.unique <- data.frame(rownames(rowsum(Number, Gene_Name)),
    +                            rowsum(Number, Gene_Name))

but then I have to manually replace the row names of the data frame with numbers to get back to what I wanted in the first place.

I hope this makes some sort of sense. Is there an easier way to do this?

Thanks in advance!

Charlie Murtaugh

=====
L. Charles Murtaugh
Assistant Professor
University of Utah
Dept. of Human Genetics
15 N. 2030 E. Rm. 2100
Salt Lake City, UT 84112
tel 801-581-5958
fax 801-581-6463
email [EMAIL PROTECTED]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
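For reference, here is a minimal sketch of two ways to get the summed table back as an ordinary data frame with numbered rows. It assumes the data live in a data frame called `genes` (a hypothetical name; the post uses `Number` and `Gene_Name` directly, presumably after attach()), and it rebuilds only the eight example rows shown above as toy data. The first option swaps in aggregate() with a formula, which returns a data frame directly; the second keeps rowsum() and simply turns its row names back into a column.

```r
# Hypothetical toy data frame copied from the example rows in the post
genes <- data.frame(
  Gene_Name = c("Zcchc8", "Zcwpw1", "Zdhhc18", "Zdhhc20",
                "Zdhhc3", "Zdhhc3", "Zeb2", "Zeb2"),
  Number    = c(6, 5, 6, 5, 6, 5, 9, 6),
  stringsAsFactors = FALSE
)

# Option 1: aggregate() with a formula sums Number within each Gene_Name
# and returns a data frame with columns Gene_Name and Number and
# ordinary numbered rows (groups come out sorted by Gene_Name)
genes.unique <- aggregate(Number ~ Gene_Name, data = genes, FUN = sum)

# Option 2: keep rowsum(), then move its row names into a real column;
# because neither argument carries names, data.frame() falls back to
# the default row names 1, 2, 3, ...
rs <- rowsum(genes$Number, genes$Gene_Name)
genes.unique2 <- data.frame(Gene_Name = rownames(rs),
                            Number    = as.vector(rs),
                            stringsAsFactors = FALSE)

print(genes.unique)
print(genes.unique2)
```

One design note: the formula method of aggregate() drops rows containing NA by default (its `na.action` is `na.omit`), so if some Number values are missing, the two options may not give identical results.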