try this:

> x <- read.table(textConnection("name nicknames value
+ 1 A A1 4
+ 2 B B1 5
+ 3 C C1 9
+ 4 B B2 2
+ 5 C C2 7
+ 6 C C3 6
+ 7 C C4 3
+ 8 B B3 6
+ 9 C C5 7"), header=TRUE)
> closeAllConnections()
> result <- do.call(rbind, lapply(split(x, x$name), function(.name){
+     data.frame(name=.name$name[1], nicknames=paste(.name$nicknames, collapse=','),
+         mean=mean(.name$value))
+ }))
>
> result
  name      nicknames     mean
A    A             A1 4.000000
B    B       B1,B2,B3 4.333333
C    C C1,C2,C3,C4,C5 6.400000
>

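A side note on the numbers: the means above (4, 4.33, 6.4) differ from the VALUE column in the quoted message below (3, 3.33, 5.2). A likely explanation, assuming an R version before 4.0.0, where data.frame() converts strings to factors by default: cbind() on a mix of character and numeric vectors gives a character matrix, so 'value' ends up as a factor, and as.numeric() on a factor returns the internal level codes rather than the numbers. The sketch below is only an illustration of that behaviour; the explicit factor() call and the variable names 'codes' and 'real' are mine, added so the effect is reproducible in any R version.

```r
# Values from the quoted message, stored as a factor the way
# data.frame(cbind(...)) would have stored them in R < 4.0.0 (assumption).
name  <- c("A", "B", "C", "B", "C", "C", "C", "B", "C")
value <- factor(c("4", "5", "9", "2", "7", "6", "3", "6", "7"))

# as.numeric() on a factor returns the integer level codes, not the labels.
codes <- as.numeric(value)

# Going through as.character() first recovers the real numbers.
real <- as.numeric(as.character(value))

# Group means computed both ways.
mean_codes <- tapply(codes, name, mean)
mean_real  <- tapply(real, name, mean)

print(round(mean_codes, 6))
print(round(mean_real, 6))
```

If that is what happened, building the data frame with data.frame(name, nicknames, value) instead of data.frame(cbind(...)) should keep 'value' numeric and avoid the problem.
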
On Tue, Feb 9, 2010 at 11:24 AM, Alex Levitchi <alex.levit...@cbm.fvg.it> wrote:
> Hello,
> I recently began working with R, so I am not very experienced.
> I cannot find a clear way to process my data frame, which is a
> fairly large one.
> It looks similar to this:
>
>> name=c("A","B","C","B","C","C","C","B","C")
>> nicknames=c("A1","B1","C1","B2","C2","C3","C4","B3","C5")
>> value=c(4,5,9,2,7,6,3,6,7)
>> table=data.frame(cbind(name,nicknames,value))
>> table
>   name nicknames value
> 1    A        A1     4
> 2    B        B1     5
> 3    C        C1     9
> 4    B        B2     2
> 5    C        C2     7
> 6    C        C3     6
> 7    C        C4     3
> 8    B        B3     6
> 9    C        C5     7
>
> I have to rearrange it as follows:
> - the first column should contain only the unique names; I have done this,
>   and it looks like
> 1 A
> 2 B
> 3 C
>
> - the second column should contain the different 'nicknames' that correspond
>   to each single A, B or C
>   name nickname value
> 1 A    A1
> 2 B    B1,B2,B3
> 3 C    C1,C2,C3,C4,C5
>
> - the third column should contain the mean of the values that correspond
>   to the same A, B or C
> 1 A A1             mean(4)
> 2 B B1,B2,B3       mean(5,2,6)
> 3 C C1,C2,C3,C4,C5 mean(9,7,6,3,7)
>
> I did this using a 'for' loop.
> To be clear, I created three data frames, one for each of the columns,
> and finally combined them:
>
>> ulist=which(!duplicated(table$name)) # positions where a name first appears (no duplicates)
>> name1=data.frame(table$name[ulist]) # the list of unique names
>> nicknames1=data.frame(row.names(1:length(ulist))) # data frame sized to the unique list
>> value1=data.frame(row.names(1:length(ulist))) # data frame sized to the unique list
>
>> for(i in 1:length(ulist)) {
>     position=which(as.character(name1[i,1])==table$name)
>     nicknames1[i,1]=toString(table$nicknames[position])
>     value1[i,1]=mean(as.numeric(table$value[position]))
> }
>> fin=cbind(name1,nicknames1,value1)
>> colnames(fin)=c("NAME","NICKNAME","VALUE")
>> fin
>   NAME           NICKNAME    VALUE
> 1    A                 A1 3.000000
> 2    B         B1, B2, B3 3.333333
> 3    C C1, C2, C3, C4, C5 5.200000
>
> It works. But in general I work with very large data frames
> (tens of thousands of rows or more), so my loop is too slow (e.g., a
> data frame of 20000 rows and 3 columns is processed in about 10 minutes).
> I intend to wrap it in a function, so the time will obviously be
> even longer.
>
> If someone can suggest how to modify what I have done, or another
> way to do it, please send me a message.
>
> Kind regards to all the people who develop and maintain R for such
> dummies as me
> Alex Levitchi
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?