On Sat, 2011-03-19 at 15:58 +0100, Bodnar Laszlo EB_HU wrote:
> Hi,

I'll top-post as the original Q is very lengthy:

tabs <- lapply(df[, 2:6], function(x, id) {
    t(table(addNA(x), id, useNA = "ifany"))
}, df$id)

is one way of doing what you want. More details are here:

http://stackoverflow.com/questions/5362702/persuading-tabulate-function-to-count-nas-in-a-data-frame-in-r

where you also posted your Q.

HTH

G

> I'd like to ask you a question again. It is basically about data frames, NAs
> and the tabulate function.
>
> I have this data frame. I already used it in one of my previous questions.
> It intentionally looks this simple; my real 'df' data frame is actually much
> bigger and, again, I don't want to annoy anyone with huge databases...
> So, my database:
>
> id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
> a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
> b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
> c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
> d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
> e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,4)
> df <-data.frame(id,a,b,c,d,e)
> df
>
> I have managed to calculate the distributions of the numbers occurring in
> columns 'a' to 'e', with the distributions 'grouped by' the id numbers in
> column 'id'.
> It works fine, check it ->
>
> matrix(matrix(unlist(lapply(df[,(-(1))],function(x)
> tapply(x,df$id,tabulate,nbins=nlevels(factor(df[,2]))))
> [[1]])),ncol=3,nrow=3,byrow=TRUE)
> matrix(matrix(unlist(lapply(df[,(-(1))],function(x)
> tapply(x,df$id,tabulate,nbins=nlevels(factor(df[,3]))))
> [[2]])),ncol=3,nrow=3,byrow=TRUE)
> matrix(matrix(unlist(lapply(df[,(-(1))],function(x)
> tapply(x,df$id,tabulate,nbins=nlevels(factor(df[,4]))))
> [[3]])),ncol=3,nrow=3,byrow=TRUE)
> matrix(matrix(unlist(lapply(df[,(-(1))],function(x)
> tapply(x,df$id,tabulate,nbins=nlevels(factor(df[,5]))))
> [[4]])),ncol=3,nrow=3,byrow=TRUE)
> matrix(matrix(unlist(lapply(df[,(-(1))],function(x)
> tapply(x,df$id,tabulate,nbins=nlevels(factor(df[,6]))))
> [[5]])),ncol=4,nrow=3,byrow=TRUE)
>
> Now my problem is: what if my data frame contains NA values here and there
> and I want the built-in tabulate function to collect these NAs as well?
> That is, what if I want it to count how many NAs occur?
>
> Here's my modified data frame with the NAs:
> id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
> a <-c(NA,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
> b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
> c <-c(1,3,2,3,2,1,2,3,3,2,2,3,NA,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
> d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
> e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,NA,1,4)
> df <-data.frame(id,a,b,c,d,e)
> df
>
> At first I tried something like this (you see, the only thing I did was
> to apply this "exclude=NULL" thing).
> unlist(lapply(df[,(-(1))],function(x)
> tapply(x,df$id,tabulate,nbins=nlevels(factor(df[,2],exclude=NULL)))) [[1]])
>
> At least my code realizes that I have 4 different levels in column
> 'a' (1,2,3,NA) and not only three (1,2,3).
> Check it here:
> nlevels(factor(df[,2],exclude=NULL))
>
> But you see in the result that somehow it could not count the NAs. It says
> 3 0 6 0(!) 4 3 3 0 4 1 5 0
>
> Instead of the correct:
> 3 0 6 1(!) 4 3 3 0 4 1 5 0
>
> Or in case of:
> unlist(lapply(df[,(-(1))],function(x)
> tapply(x,df$id,tabulate,nbins=nlevels(factor(df[,4],exclude=NULL)))) [[3]])
>
> It says
> 2 4 4 0 2 3 4 0(!) 1 5 4 0
>
> Instead of the correct
> 2 4 4 0 2 3 4 1(!) 1 5 4 0
> etc.
>
> Does someone have any ideas how to "persuade" the function tabulate to count
> NAs? Is it possible at all?
> Thanks very much and have a pleasant weekend,
> Laszlo
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
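
A minimal sketch of the approach in the reply, run on the NA-containing data frame from the question. It relies on the documented behaviour of tabulate(), which counts only the integer bins 1..nbins and silently ignores NA, so enlarging nbins merely adds an empty bin; addNA() instead makes NA a factor level that table() can count. Building the data frame in a single data.frame() call (rather than from separate vectors) is a choice made here for compactness, not something taken from the thread.

```r
# Data frame with NAs, values copied from the question.
df <- data.frame(
  id = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3),
  a  = c(NA,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3),
  b  = c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2),
  c  = c(1,3,2,3,2,1,2,3,3,2,2,3,NA,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2),
  d  = c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2),
  e  = c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,NA,1,4)
)

# tabulate() drops NAs: the extra 4th bin stays empty for id 1 in column 'a'.
tab_a1 <- tabulate(df$a[df$id == 1], nbins = 4)
tab_a1            # 3 0 6 0

# The reply's approach: addNA() adds NA as a level, table() cross-tabulates
# value by id, and t() puts one row per id.
tabs <- lapply(df[, 2:6], function(x, id) {
  t(table(addNA(x), id, useNA = "ifany"))
}, df$id)

tabs$a            # row for id 1 is 3 0 6 1 (the last column is the NA count)
tabs$c            # row for id 2 is 2 3 4 1
```

Because addNA() always adds the NA level, columns without missing values (here 'b' and 'd') still get an NA column, filled with zeros, which keeps the layout the same across columns.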