Hi

> Yes thank you Gu…
> I am just trying to do this as a rough step and will try other
> imputation methods which are more appropriate later.
> I am just learning R, and was trying to do the for loop and
> if-statement by hand but something is going wrong…
>
> This is what I have until now:
>
> *****fake array:
> age<- c(5,8,10,12,NA)
> a<- factor(c("aa", "bb", NA, "cc", "cc"))
> b<- c("banana", "apple", "pear", "grape", NA)
> df_test <- data.frame(age=age, a=a, b=b)
> df_test$b<- as.character(df_test$b)
>
> for (var in 1:ncol(df_test)) {
>   if (class(df_test$var)=="numeric") {
var goes from 1 to 3, but df_test$var does not use that value: $ looks for a column literally named "var", which does not exist, so it returns NULL. class(NULL) is "NULL", so neither condition is TRUE and the loop silently does nothing. You should use the [] selection operator instead. However, the Mode part still does not correctly assign values:

for (var in 1:ncol(df_test)) {
  if (class(df_test[,var])=="numeric") {
    df_test[is.na(df_test[,var]), var] <- mean(df_test[,var], na.rm = TRUE)
  } else if (class(df_test[,var])=="character") {
    Mode(df_test[is.na(df_test[,var]),var], na.rm = TRUE)
  }
}
Warning message:
In max(xtab) : no non-missing arguments to max; returning -Inf

You can use debug(Mode) to see what is going on. I have no time to inspect it and do not see any obvious flaw.

Regards
Petr

>     df_test$var[is.na(df_test$var)] <- mean(df_test$var, na.rm = TRUE)
>   } else if (class(df_test$var)=="character") {
>     Mode(df_test$var[is.na(df_test$var)], na.rm = TRUE)
>   }
> }
>
> Where 'Mode' is the function:
>
> function (x, na.rm)
> {
>     xtab <- table(x)
>     xmode <- names(which(xtab == max(xtab)))
>     if (length(xmode) > 1)
>         xmode <- ">1 mode"
>     return(xmode)
> }
>
>
> It seems as if it is just ignoring the statements, though, without giving
> any error… Does anybody have any idea what is going on?
>
> Thank you very much for all the great help!
> -f
>
> 2011/10/11 Weidong Gu <anopheles...@gmail.com>:
> > In your case, it may not be sensible to simply fill missing values with
> > the mean or mode, as multiple imputation is becoming the norm these days.
> > For your specific question, na.roughfix in the randomForest package would
> > do the work.
> >
> > Weidong Gu
> >
> > On Tue, Oct 11, 2011 at 8:11 AM, francesca casalino
> > <francy.casal...@gmail.com> wrote:
> >> Dear R experts,
> >>
> >> I have a large database made up of mixed data types (numeric,
> >> character, factor, ordinal factor) with missing values, and I am
> >> looking for a package that would help me impute the missing values
> >> using either the mean if numerical or the mode if character/factor.
> >>
> >> I could maybe use replace like this:
> >> df$var[is.na(df$var)] <- mean(df$var, na.rm = TRUE)
> >> and go through all the many different variables of the dataset using
> >> mean or mode for each, but I was wondering if there was a faster way,
> >> or if a package existed to automate this (by using 'mode' if it is a
> >> factor or character or 'mean' if it is numeric)?
> >>
> >> I have tried the package "dprep" because I wanted to use the function
> >> "ce.mimp", but unfortunately it is not available anymore.
> >>
> >> Thank you for your help,
> >> -francy
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
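On the remaining warning in the corrected loop: the Mode branch passes only the missing elements, df_test[is.na(df_test[,var]), var], so table() (which drops NA by default) returns an empty table and max() of it gives -Inf with that warning. The value returned by Mode() is also never assigned back to the column. A minimal sketch of a fix, assuming the same toy df_test, follows. The Mode() in it is a rewritten, hypothetical helper rather than the one posted: it breaks ties by taking the first value in table() order instead of returning ">1 mode" (which would not be a valid level of a factor column). It also uses is.numeric(), is.character() and is.factor() instead of comparing class(), because class() of an ordered factor has length 2, and it imputes the factor column a, which the original loop skipped.

```r
# Hypothetical rewritten helper: most frequent non-missing value.
# table() drops NA by default; which.max() takes the first maximum on ties.
Mode <- function(x) {
  xtab <- table(x)
  names(xtab)[which.max(xtab)]
}

age <- c(5, 8, 10, 12, NA)
a   <- factor(c("aa", "bb", NA, "cc", "cc"))
b   <- c("banana", "apple", "pear", "grape", NA)
df_test <- data.frame(age = age, a = a, b = b, stringsAsFactors = FALSE)

for (var in seq_along(df_test)) {
  x <- df_test[[var]]            # [[ ]] uses the value of var, unlike $
  miss <- is.na(x)
  # skip columns with nothing to fill or nothing to compute from
  if (!any(miss) || all(miss)) next
  if (is.numeric(x)) {
    df_test[miss, var] <- mean(x, na.rm = TRUE)
  } else if (is.character(x) || is.factor(x)) {
    # pass the whole column (not only its NAs) and assign the result
    df_test[miss, var] <- Mode(x)
  }
}
```

With this toy data the NA in age becomes 8.75, the NA in a becomes "cc", and the NA in b becomes "apple", because all four fruits tie with one occurrence each and "apple" comes first in table() order.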
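For the original question (mean for numeric columns, mode for character and factor columns, applied across a large mixed-type data frame without writing one branch per variable), one base-R option is to put the per-column rule in a function and apply it to every column with lapply(). The name impute_mean_mode() and the example data frame are hypothetical. This is still plain single imputation, with the caveat Weidong raised. Integer columns would be converted to double, since the mean is generally not a whole number, and columns of other types, as well as columns that are entirely missing, are returned unchanged.

```r
# Hypothetical helper: fill NAs with the mean (numeric) or the most frequent
# value (character, factor, ordered factor). Other column types, and columns
# with no observed values, are returned unchanged.
impute_mean_mode <- function(x) {
  miss <- is.na(x)
  if (!any(miss) || all(miss)) return(x)
  if (is.numeric(x)) {
    x[miss] <- mean(x, na.rm = TRUE)
  } else if (is.factor(x) || is.character(x)) {
    counts <- table(x)                    # NA is excluded by default
    x[miss] <- names(counts)[which.max(counts)]
  }
  x
}

df <- data.frame(
  num = c(1, 2, NA, 5),
  chr = c("x", NA, "y", "y"),
  fac = factor(c("lo", "hi", "hi", NA)),
  ord = factor(c("low", NA, "high", "high"),
               levels = c("low", "high"), ordered = TRUE),
  stringsAsFactors = FALSE
)

# df[] <- keeps the data.frame structure while replacing every column
df[] <- lapply(df, impute_mean_mode)
```

Assigning into df[] rather than df keeps the result a data.frame with its original column names and column classes; lapply() on its own would return a plain list.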