Couple of points:

1. If you are going to be replacing entries in factors with values that are 
not already among their levels, it's probably easier to keep your strings as 
strings when they go into the data frames.  So here is how I would start your 
example:


db1 <- data.frame(
    olditems = c('soup', '', '', '', 'nuts'),
    prices = c(4.45, 3.25, 4.42, 2.25, 3.98),
    stringsAsFactors = FALSE)
db2 <- data.frame(
    newitems = c('stew', 'crackers', 'tofu', 'goatsmilk', 'peanuts'),
    stringsAsFactors = FALSE)
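To show why point 1 matters, here is a minimal sketch on toy vectors (the names f and s are hypothetical, not part of the example). It assigns a value that is not already a level into a factor, then does the same to a plain character vector. Note that since R 4.0.0 data.frame() no longer converts strings to factors by default, so stringsAsFactors = FALSE is only strictly needed on older versions.

```r
f <- factor(c('soup', '', 'nuts'))   # levels: "" "nuts" "soup"
s <- c('soup', '', 'nuts')           # plain character vector

## 'crackers' is not a level of f, so R stores NA and warns
## "invalid factor level, NA generated" (silenced here)
suppressWarnings(f[2] <- 'crackers')

## A character vector accepts any string
s[2] <- 'crackers'

f   # the second element is now NA
s   # "soup" "crackers" "nuts"
```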


2. Strings with zero characters are still strings (just as zero is still a 
number).  They are not missing.  If you want them to be treated as missing, 
you can set them to NA afterwards with:


#### zero length strings become NA 
is.na(db1$olditems[db1$olditems == '']) <- TRUE
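A quick check of point 2, plus two other common idioms for turning empty strings into NA. This is a sketch on a hypothetical toy vector x; both idioms are alternatives to the is.na<- line above, not a replacement for it.

```r
x <- c('soup', '', '', '', 'nuts')

is.na('')    # FALSE: an empty string is not missing
nchar('')    # 0

## Idiom 1: index with the logical test and assign NA
x1 <- x
x1[x1 == ''] <- NA

## Idiom 2: the is.na<- replacement function with a logical index
x2 <- x
is.na(x2) <- x2 == ''

identical(x1, x2)   # TRUE
```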


3. Now replace the missing values with the corresponding ones (same row 
position) from the second data frame:


k <- is.na(db1$olditems)                  # rows that still need a name
db1[k, "olditems"] <- db2[k, "newitems"]  # take those rows from db2
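An alternative sketch of step 3 using ifelse(), which builds the whole column in one expression. Like the indexing version, it assumes the rows of db1 and db2 line up by position; the data frames are rebuilt here (with the blanks already set to NA) so the snippet runs on its own.

```r
db1 <- data.frame(
    olditems = c('soup', NA, NA, NA, 'nuts'),
    prices = c(4.45, 3.25, 4.42, 2.25, 3.98),
    stringsAsFactors = FALSE)
db2 <- data.frame(
    newitems = c('stew', 'crackers', 'tofu', 'goatsmilk', 'peanuts'),
    stringsAsFactors = FALSE)

## Take the new name where the old one is missing, otherwise keep the old one
db1$olditems <- ifelse(is.na(db1$olditems), db2$newitems, db1$olditems)
db1$olditems   # "soup" "crackers" "tofu" "goatsmilk" "nuts"
```

This works cleanly on character columns.  With factor columns ifelse() would return the underlying integer codes, which is one more reason to do the replacement on strings.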


4. Check the result:

> db1
   olditems prices
1      soup   4.45
2  crackers   3.25
3      tofu   4.42
4 goatsmilk   2.25
5      nuts   3.98
> 

5. If you really do want factors rather than character strings, you can now 
change back:

## use <- here: inside within() '=' would be read as an argument name
db1 <- within(db1, olditems <- factor(olditems))
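For comparison, a minimal sketch of the same conversion with transform(), whose columns are given as name = value arguments, so there '=' is the right operator.  The names db1a and db1b are hypothetical, and db1 is rebuilt as it stands after step 3 so the snippet runs on its own.

```r
## db1 as it stands after step 3 (character column)
db1 <- data.frame(
    olditems = c('soup', 'crackers', 'tofu', 'goatsmilk', 'nuts'),
    prices = c(4.45, 3.25, 4.42, 2.25, 3.98),
    stringsAsFactors = FALSE)

## within(): the second argument is an expression, so assign with <-
db1a <- within(db1, olditems <- factor(olditems))

## transform(): columns are passed as name = value arguments
db1b <- transform(db1, olditems = factor(olditems))

identical(db1a$olditems, db1b$olditems)   # TRUE
```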

6. Check the difference:

> str(db1)
'data.frame':   5 obs. of  2 variables:
 $ olditems: Factor w/ 5 levels "crackers","goatsmilk",..: 4 1 5 2 3
 $ prices  : num  4.45 3.25 4.42 2.25 3.98
> 
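The numbers after the levels in the str() output are the factor's integer codes: each one is the position of that row's value in the (alphabetically sorted) levels.  A small sketch on a hypothetical vector items makes this visible.

```r
## The five item names as they stand after step 3
items <- factor(c('soup', 'crackers', 'tofu', 'goatsmilk', 'nuts'))

levels(items)        # "crackers" "goatsmilk" "nuts" "soup" "tofu"
as.integer(items)    # 4 1 5 2 3
as.character(items)  # the original strings, in row order
```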
 


Bill Venables
http://www.cmis.csiro.au/bill.venables/ 


-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Gene Leynes
Sent: Wednesday, 22 July 2009 10:39 AM
To: r-help@r-project.org
Subject: [R] How to replace NAs in a vector of factors?

# Just when I thought I had the basic stuff mastered....
# This has been quite perplexing, thanks for any help


## Here's the example:

db1=data.frame(
    olditems=c('soup','','','','nuts'),
    prices=c(4.45, 3.25, 4.42, 2.25, 3.98))
db2=data.frame(
    newitems=c('stew','crackers','tofu','goatsmilk','peanuts'))

str(db1)    #factors and prices
str(db2)    #new names, but I want *only* the updates

is.na(db1$olditems)  #a little surprising that '' is not equal to NA
db1$olditems==''     #oh good, at least I can get to the blanks this way
db1$olditems[db1$olditems=='']  #wait, only one item is returned?
db1[db1$olditems=='',]  #somehow this works!

#how would I get the new item names into the old items column of db1??
# I was expecting that this would work:
#    db1$olditems[db1$olditems=='']=
#        db2$newitems[db1$olditems=='']
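The assignment in the question fails because 'crackers', 'tofu' and 'goatsmilk' are not levels of db1$olditems, so R stores NA with an "invalid factor level" warning.  If you want to keep factors throughout instead of converting to character as in the reply above, one possible sketch is to widen the level set before assigning.  It calls factor() explicitly so it behaves the same in current R versions, and like the question it assumes db1 and db2 line up row by row.

```r
db1 <- data.frame(
    olditems = factor(c('soup', '', '', '', 'nuts')),
    prices = c(4.45, 3.25, 4.42, 2.25, 3.98))
db2 <- data.frame(
    newitems = factor(c('stew', 'crackers', 'tofu', 'goatsmilk', 'peanuts')))

k <- db1$olditems == ''   # flags 3 rows (a printed factor of blanks looks empty)

## Widen the level set so the incoming values are valid levels
levels(db1$olditems) <- union(levels(db1$olditems), levels(db2$newitems))

## Assign the new names as character; they now match existing levels
db1$olditems[k] <- as.character(db2$newitems[k])

## Drop levels that are no longer used (e.g. "" and "stew")
db1$olditems <- droplevels(db1$olditems)
```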


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
