> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
> On Behalf Of Ista Zahn
> Sent: Tuesday, August 23, 2011 11:06 AM
> To: StellathePug
> Cc: r-help@r-project.org
> Subject: Re: [R] Replacing NAs in one variable with values of another
> variable
>
> Hi,
>
> On Tue, Aug 23, 2011 at 12:29 PM, StellathePug
> <ritacarre...@hotmail.com> wrote:
> > Hello everyone,
> > I am trying to figure out a way of replacing missing observations in one of
> > the variables of a data frame by values of another variable. For example,
> > assume my data is X
> >
> > X <-as.data.frame(matrix(c(9, 6, 1, 3, 9, "NA", "NA","NA","NA","NA",
> > 6, 4, 3,"NA", "NA", "NA", 5, 4, 1, 3), ncol=2))
> > names(X)<-c("X1","X2")
> >
> > I want to change X1 so that instead of the missing values it uses the values
> > in X2 (regardless of whether these are missing).
>
> Note that you don't have any missing values in X, as "NA" != NA
>
> > So my X1, should become
> > X$X1 <- c(9, 6, 1, 3, 9, "NA", 5, 4, 1, 3).
> >
> > I have searched online for a while and looked at the manuals and the best
> > (unsuccessful) attempt I have come up with is
> >
> > X$X1[X$X1=="NA"] <- X$X2
> >
> > and that produces the following X1
> >
> > X$X1<-c(9, 6, 1, 3, 9, 6, "NA", 3, "NA", "NA")
> >
> > and generates the following warning:
> >
> > Warning messages:
> > 1: In `[<-.factor`(`*tmp*`, X$X1 == "NA", value = c(5L, 3L, 2L, 6L, :
> > invalid factor level, NAs generated
> > 2: In x[...] <- m :
> > number of items to replace is not a multiple of replacement length
> >
> > I think that my error is that it is ignoring the non-missing values of X1
> > and the dimensions don't match. But what I want my code to do is to look at
> > the rows of X1, see if it's a missing value; if it is, replace it with the
> > value that is in the row of X2; if it's not missing, leave it as is.
>
> Here are two solutions, one that is a correction to your first
> attempt, and another using ifelse:
>
> X$X1[X$X1=="NA"] <- X$X2[X$X1=="NA"]
>
> X$X1 <- ifelse(X$X1 == "NA", X$X2, X$X1)
>
>
> Best,
> Ista
>

Rita,

In addition to Ista's advice, I have a question. Did you really want your columns X1 and X2 to be factors? Your use of "NA" to represent missing values has caused the columns to become factors. If you actually wanted a numeric matrix or data.frame, then remove the quotes from around the NA. Then you need to use is.na() to test for missing values.

X <- as.data.frame(matrix(c(9, 6, 1, 3, 9, NA, NA, NA, NA, NA,
                            6, 4, 3, NA, NA, NA, 5, 4, 1, 3), ncol=2))
names(X) <- c("X1","X2")

X$X1 <- ifelse(is.na(X$X1), X$X2, X$X1)

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
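
P.S. If the data has already been read in with the string "NA" standing in for missing values, retyping it may not be an option. The following is only a sketch of one way to repair such a data frame after the fact, using the example from the original post. The helper name to_numeric_na is made up for illustration, and the stringsAsFactors = FALSE argument is my addition so that the example behaves the same in old and new versions of R (the default changed in R 4.0.0).

```r
# Rebuild the example from the original post, with "NA" typed as a string.
# stringsAsFactors = FALSE is an assumption added here for portability.
X <- as.data.frame(matrix(c(9, 6, 1, 3, 9, "NA", "NA", "NA", "NA", "NA",
                            6, 4, 3, "NA", "NA", "NA", 5, 4, 1, 3), ncol = 2),
                   stringsAsFactors = FALSE)
names(X) <- c("X1", "X2")

# Hypothetical helper: turn the literal string "NA" into a real missing
# value, then convert the column to numeric.
to_numeric_na <- function(col) {
  col <- as.character(col)   # also handles a column that arrived as a factor
  col[col == "NA"] <- NA     # replace the string with a true NA
  as.numeric(col)
}
X[] <- lapply(X, to_numeric_na)

# Fill the gaps in X1 from the same rows of X2; other rows are left alone.
miss <- is.na(X$X1)
X$X1[miss] <- X$X2[miss]
```

Indexing both sides with the same logical vector keeps the replacement the same length as the slots being filled, which is what the second warning in the original attempt was complaining about.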