On Thu, 22 Jan 2009, Mike Miller wrote:
Suppose X and Y are two data frames with the same structure, variable
names and dimensions but with different data and different patterns of
missingness. I want to replace missing values in Y with the corresponding
values from X. I'll construct a simple two-by-two case:
X <- as.data.frame(matrix(c("a","b",1,2),2,2), stringsAsFactors=FALSE)
X[,2] <- as.integer(X[,2])
str(X)
'data.frame': 2 obs. of 2 variables:
$ V1: chr "a" "b"
$ V2: int 1 2
Y <- as.data.frame(matrix(c("c","d",NA,4),2,2), stringsAsFactors=FALSE)
Y[,2] <- as.integer(Y[,2])
str(Y)
'data.frame': 2 obs. of 2 variables:
$ V1: chr "c" "d"
$ V2: int NA 4
This seems to be what I want to do...
Y[is.na(Y)] <- X[is.na(Y)]
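...and it works, except that the structure of Y is changed. X[is.na(Y)] indexes a
data frame with a logical matrix, which makes R convert X with as.matrix() first.
Because X mixes chr and int columns, that gives a character matrix, so Y$V2 ends
up as type chr instead of type int: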
str(Y)
'data.frame': 2 obs. of 2 variables:
$ V1: chr "c" "d"
$ V2: chr "1" "4"
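A quick check on the toy data makes the coercion visible. This is a sketch. The names mask, vals and Y2 are only illustrative. It shows that X[mask] comes back as character because as.matrix(X) is a character matrix. Putting those values back into Y then turns V2 into chr.

```r
# Toy data from the post
X <- as.data.frame(matrix(c("a","b",1,2),2,2), stringsAsFactors=FALSE)
X[,2] <- as.integer(X[,2])
Y <- as.data.frame(matrix(c("c","d",NA,4),2,2), stringsAsFactors=FALSE)
Y[,2] <- as.integer(Y[,2])

mask <- is.na(Y)             # logical matrix, TRUE only at row 1, column 2
typeof(as.matrix(X))         # chr + int columns give a character matrix
vals <- X[mask]              # matrix indexing goes through as.matrix(X)
vals                         # a character string, not the integer 1

Y2 <- Y
Y2[mask] <- vals             # the assignment from the post
str(Y2)                      # V2 is now chr
```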
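I figured out a good answer. We can just decide on the list of columns we
want to work with (columns 38 to 47 in my real data) and then use a for loop.
This avoids problems with changing variable types, because each assignment
copies values from a single column of X into the same column of Y: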
cols <- 38:47
keep <- is.na(Y)
for (i in cols) { nas <- which(keep[,i]); if ( length(nas) > 0 ) { Y[nas,i] <- X[nas,i] }}
Something like that makes for a good one-liner on the interactive command
line, but this looks neater in a script:
cols <- 38:47
keep <- is.na(Y)
for (i in cols) {
  nas <- which(keep[,i])
  if (length(nas) > 0) {
    Y[nas,i] <- X[nas,i]
  }
}
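The same loop can be applied to the two-by-two example. This sketch assumes cols <- 1:2, because 38:47 refers to columns that the toy data does not have. After the loop, V2 should still be int and the NA should be filled from X.

```r
# Toy data from the post
X <- as.data.frame(matrix(c("a","b",1,2),2,2), stringsAsFactors=FALSE)
X[,2] <- as.integer(X[,2])
Y <- as.data.frame(matrix(c("c","d",NA,4),2,2), stringsAsFactors=FALSE)
Y[,2] <- as.integer(Y[,2])

cols <- 1:2                  # assumption: every column of the toy example
keep <- is.na(Y)             # logical matrix marking the NAs in Y
for (i in cols) {
  nas <- which(keep[,i])     # rows of column i that are NA in Y
  if (length(nas) > 0) {
    Y[nas,i] <- X[nas,i]     # values come from one column of X, so the type is kept
  }
}
str(Y)                       # V2 should be int again, with values 1 and 4
```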
It shouldn't be too hard to write a function that does this kind of thing.
The only problem I know of involves factors. If X and Y have factor columns
whose levels differ, a value from X that is not among the levels of the
corresponding column of Y becomes NA (with an "invalid factor level" warning)
instead of being copied. It would probably take a few more lines to deal with this.
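The following sketch shows one possible way to wrap the loop in a function and handle the factor problem. It assumes that X and Y have the same column types, as in the example. fill_na_from() is a hypothetical name. Widening the factor levels with levels<- before assigning is a swapped-in technique, not something from this thread.

```r
# Hypothetical helper: fill NAs in Y from X column by column, so that
# each column keeps its own type. Assumes matching column types in X and Y.
fill_na_from <- function(Y, X, cols = seq_along(Y)) {
  stopifnot(is.data.frame(X), is.data.frame(Y),
            identical(dim(X), dim(Y)), identical(names(X), names(Y)))
  for (i in cols) {
    nas <- which(is.na(Y[[i]]))           # rows of column i that are NA in Y
    if (length(nas) == 0) next
    if (is.factor(Y[[i]])) {
      fill <- as.character(X[[i]][nas])   # labels, not integer codes
      # append any labels Y does not know yet, so they are not turned into NA
      levels(Y[[i]]) <- union(levels(Y[[i]]), fill[!is.na(fill)])
      Y[[i]][nas] <- fill
    } else {
      Y[[i]][nas] <- X[[i]][nas]
    }
  }
  Y
}

# Usage with factor columns whose levels differ
Xf <- data.frame(g = factor(c("a", "b")), n = c(1L, 2L))
Yf <- data.frame(g = factor(c(NA, "c")), n = c(NA, 4L))
out <- fill_na_from(Yf, Xf)
str(out)   # g stays a factor (levels "c" and "a"); n stays int
```

Because levels<- here only appends levels, the existing values in Y keep their labels, and values that are NA in X simply stay NA.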
A couple of people wrote to me with helpful suggestions, but no one had a
really great, established solution. I'm a little surprised, but with an
average of 125 messages per day (!) on this list, I shouldn't be surprised
that a long message like this one isn't read by everyone.
Best,
Mike
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.