Thank you so much, I'm humbled by the response from such great authors and
scholars.  I thought I would share the final version that worked perfectly
in my illustrative example, as well as the real one.

My main confusion was this part:
> db1$olditems[db1$olditems=='']
[1]
Levels:  nuts soup
I thought only one item was returned, but really all three are; only the first
one is labeled with "[1]".
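
A quick way to see what happened: R prints an index only at the start of each
output line, and three zero-length strings fit on one line behind a single
"[1]". The sketch below rebuilds the factor from the example data (the names
`olditems` and `blanks` are just for illustration) and counts the elements
instead of relying on the printout.

```r
# Rebuild the example column as a factor ('' becomes a level of its own)
olditems <- factor(c('soup', '', '', '', 'nuts'))

# Subset the blanks: printing shows three empty strings after one "[1]"
blanks <- olditems[olditems == '']
print(blanks)

# Counting avoids being misled by the printout
length(blanks)          # 3
as.character(blanks)    # "" "" ""
```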

A side note: I don't understand the motivation for using "within" when simple
subsetting with $ or [ works.
Maybe it matters if the data frame has a really long name?
# To me, this is easier to read:
db1$olditems = factor(db1$olditems)
# than this:
db1 <- within(db1, olditems <- factor(olditems))
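
For a single column the two forms do the same thing. One reason within() tends
to be suggested is that its expression is evaluated with the columns visible as
plain variables, so several columns can be changed in one block without
repeating the data frame's name. A minimal check, using the final item values
from this thread; the copies `a`, `b` and `d` and the rounding of prices are
purely illustrative:

```r
db1 <- data.frame(
  olditems = c('soup', 'crackers', 'tofu', 'goatsmilk', 'nuts'),
  prices   = c(4.45, 3.25, 4.42, 2.25, 3.98),
  stringsAsFactors = FALSE)

# Direct assignment with $
a <- db1
a$olditems <- factor(a$olditems)

# within() returns a modified copy; note the <- inside the expression
b <- within(db1, olditems <- factor(olditems))

identical(a$olditems, b$olditems)   # TRUE

# Several columns in one block, without repeating "db1$"
# (rounding the prices is only for illustration)
d <- within(db1, {
  olditems <- factor(olditems)
  prices   <- round(prices)
})
```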

Here is my working code example, thanks to the generous feedback:

# A little function I always have loaded; it turns a factor into the strings it displays
# (which, incidentally, was inspired by "Modern Applied Statistics with S", page 33)
factors=function(x)levels(x)[x]

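# The data.frame option "stringsAsFactors=FALSE" would have been perfect to use here,
#     but in my real example I can't re-import the data.
# stringsAsFactors=TRUE is spelled out so the columns are factors, as in my real
# data; it was the default before R 4.0.0 but not since.
db1 <- data.frame(
   olditems = c('soup','','','','nuts'),
   prices = c(4.45, 3.25, 4.42, 2.25, 3.98),
   stringsAsFactors = TRUE)
db2 <- data.frame(
   newitems = c('stew','crackers','tofu','goatsmilk','peanuts'),
   stringsAsFactors = TRUE)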

db1$olditems[db1$olditems=='']         # it looks like only one item is returned
length(db1$olditems[db1$olditems=='']) # but all three are actually returned

db1$olditems=factors(db1$olditems)     #converts the factors to strings
db1$olditems[db1$olditems=='']=NA      #replaces blanks with NA

# Note: this only works when db2 is in the same order as db1
db1$olditems[is.na(db1$olditems)]=
    factors(db2$newitems[is.na(db1$olditems)])
db1$olditems=factor(db1$olditems)      # I like to use factors because they inherently
                                       # give a count of unique values
db1$olditems                           #Success!
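
The factors() helper relies on the fact that indexing a vector by a factor uses
the factor's integer codes, not its labels. Base R's as.character() gives the
same result for a factor, which the sketch below checks. It also shows why the
helper needs a factor input: since R 4.0.0, data.frame() no longer converts
strings to factors by default, and levels() of a character vector is NULL.

```r
# The helper from the post: index the levels by the factor's integer codes
factors <- function(x) levels(x)[x]

f <- factor(c('soup', '', '', '', 'nuts'))
factors(f)                                   # "soup" "" "" "" "nuts"
identical(factors(f), as.character(f))       # TRUE

# Caveat: on a character vector levels() is NULL, so the helper returns NULL
chr <- c('soup', '', '', '', 'nuts')
is.null(factors(chr))                        # TRUE
as.character(chr)                            # unchanged character vector
```

Assigning that NULL back with db1$olditems = factors(...) would drop the column
entirely, so under R 4.0.0 or later the example only behaves as described when
the columns really are factors (for instance via stringsAsFactors = TRUE).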

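The "only works when db2 is in the same order as db1" caveat can be lifted if
both tables share a key. A sketch, assuming a hypothetical `id` column that the
example data does not have; db2 is deliberately shuffled to show that row order
no longer matters, and match() looks up each missing row's key in db2.

```r
# Hypothetical key column `id` shared by both tables (not in the original data)
db1 <- data.frame(
  id       = 1:5,
  olditems = c('soup', NA, NA, NA, 'nuts'),
  prices   = c(4.45, 3.25, 4.42, 2.25, 3.98),
  stringsAsFactors = FALSE)
# db2 shuffled on purpose: its row order no longer matches db1
db2 <- data.frame(
  id       = c(4, 2, 5, 1, 3),
  newitems = c('goatsmilk', 'crackers', 'peanuts', 'stew', 'tofu'),
  stringsAsFactors = FALSE)

k <- is.na(db1$olditems)
# For each db1 id that needs filling, match() returns its row number in db2
db1$olditems[k] <- db2$newitems[match(db1$id[k], db2$id)]
db1$olditems   # "soup" "crackers" "tofu" "goatsmilk" "nuts"
```
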
On Tue, Jul 21, 2009 at 8:22 PM, <bill.venab...@csiro.au> wrote:

> Couple of points:
>
> 1. if you are going to be replacing entries in factors with updated levels,
> it's probably easier if you start with your strings remaining as strings as
> they go into the data frames.  So here is how I would start your example
>
>
> db1 <- data.frame(
>    olditems = c('soup','','','','nuts'),
>     prices = c(4.45, 3.25, 4.42, 2.25, 3.98),
>        stringsAsFactors = FALSE)
> db2 <- data.frame(
>    newitems = c('stew','crackers','tofu','goatsmilk','peanuts'),
>        stringsAsFactors = FALSE)
>
>
> 2. Strings with zero characters are still strings (like zero is still a
> number).  They are not missing.  If you want them to be made missing you can
> do so afterwards with:
>
>
> #### zero length strings become NA
> is.na(db1$olditems[db1$olditems == '']) <- TRUE
>
>
> 3. Now to replace the missing values with the corresponding ones from the
> second data frame:
>
>
> k <- is.na(db1$olditems)
> db1[k, "olditems"] <- db2[k, "newitems"]
>
>
> 4. Check
>
> > db1
>   olditems prices
> 1      soup   4.45
> 2  crackers   3.25
> 3      tofu   4.42
> 4 goatsmilk   2.25
> 5      nuts   3.98
> >
>
> 5. If you really do want factors rather than character strings, you can now
> change back:
>
> db1 <- within(db1, olditems <- factor(olditems)) ## use <- here!
>
> 6. check the difference
>
> > str(db1)
> 'data.frame':   5 obs. of  2 variables:
>  $ olditems: Factor w/ 5 levels "crackers","goatsmilk",..: 4 1 5 2 3
>  $ prices  : num  4.45 3.25 4.42 2.25 3.98
> >
>
>
>
> Bill Venables
> http://www.cmis.csiro.au/bill.venables/
>
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Gene Leynes
> Sent: Wednesday, 22 July 2009 10:39 AM
> To: r-help@r-project.org
> Subject: [R] How to replace NAs in a vector of factors?
>
> # Just when I thought I had the basic stuff mastered....
> # This has been quite perplexing, thanks for any help
>
>
> ## Here's the example:
>
> db1=data.frame(
>    olditems=c('soup','','','','nuts'),
>    prices=c(4.45, 3.25, 4.42, 2.25, 3.98))
> db2=data.frame(
>    newitems=c('stew','crackers','tofu','goatsmilk','peanuts'))
>
> str(db1)    #factors and prices
> str(db2)    #new names, but I want *only* the updates
>
> is.na(db1$olditems)  #a little surprising that '' is not equal to NA
> db1$olditems==''     #oh good, at least I can get to the blanks this way
> db1$olditems[db1$olditems=='']  #wait, only one item is returned?
> db1[db1$olditems=='',]  #somehow this works!
>
> #how would I get the new item names into the old items column of db1??
> # I was expecting that this would work:
> #    db1$olditems[db1$olditems=='']=
> #        db2$newitems[db1$olditems=='']
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

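Step 2 in Bill's reply uses the replacement form of is.na(): is.na(x) <- i sets
the elements selected by i to NA. The sketch below (on a plain character
vector; the names `orig`, `a`, `b` and `d` are illustrative) checks that his
form, the more common is.na(x) <- x == '', and ordinary subassignment all give
the same result.

```r
orig <- c('soup', '', '', '', 'nuts')

# 1. Bill's form: mark every element of the blank subset as missing
a <- orig
is.na(a[a == '']) <- TRUE

# 2. Equivalent: pass the logical index as the value of is.na<-
b <- orig
is.na(b) <- b == ''

# 3. Plain subassignment, as in the reply at the top of the thread
d <- orig
d[d == ''] <- NA

identical(a, b) && identical(b, d)   # TRUE
```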