Thank you so much, I'm humbled by the response from such great authors and scholars. I thought I would share the final version that worked perfectly in my illustrative example, as well as the real one.
My main confusion was this part:

> db1$olditems[db1$olditems=='']
[1]
Levels:  nuts soup

I thought it was only one item, but really it's all three. Only the first one
is labeled with "[1]".

A side note: I don't understand the motivation to use "within" when simple
subsetting works using $ or [. Maybe it's important if the data frame has a
really long name?

# To me, this is easier to read:
db1$olditems = factor(db1$olditems)
# Than this
db1 <- within(db1, olditems <- factor(olditems))

Here is my code example working, thanks to the generous feedback:

# A little function I always have loaded
# (which, incidentally, was inspired by "Modern Applied Statistics with S" page 33):
factors=function(x)levels(x)[x]

# The data.frame option "stringsAsFactors=FALSE" would have been perfect to use here,
# but in my real example I can't re-import the data
db1 <- data.frame(
    olditems = c('soup','','','','nuts'),
    prices = c(4.45, 3.25, 4.42, 2.25, 3.98))
db2 <- data.frame(
    newitems = c('stew','crackers','tofu','goatsmilk','peanuts'))

db1$olditems[db1$olditems==''] #it looks like only one item is returned
length(db1$olditems[db1$olditems=='']) #but all three are actually returned

db1$olditems=factors(db1$olditems) #converts the factors to strings
db1$olditems[db1$olditems=='']=NA #replaces blanks with NA
#Note: this only works when db2 is in same order as db1
db1$olditems[is.na(db1$olditems)]=
    factors(db2$newitems[is.na(db1$olditems)])
db1$olditems=factor(db1$olditems) #I like to use factors b/c they inherently
                                  # give a count of unique values
db1$olditems #Success!

On Tue, Jul 21, 2009 at 8:22 PM, <bill.venab...@csiro.au> wrote:

> Couple of points:
>
> 1. if you are going to be replacing entries in factors with updated levels,
> it's probably easier if you start with your strings remaining as strings as
> they go into the data frames.
> So here is how I would start your example
>
> db1 <- data.frame(
>     olditems = c('soup','','','','nuts'),
>     prices = c(4.45, 3.25, 4.42, 2.25, 3.98),
>     stringsAsFactors = FALSE)
> db2 <- data.frame(
>     newitems = c('stew','crackers','tofu','goatsmilk','peanuts'),
>     stringsAsFactors = FALSE)
>
> 2. Strings with zero characters are still strings (like zero is still a
> number). They are not missing. If you want them to be made missing you can
> do so afterwards with:
>
> #### zero length strings become NA
> is.na(db1$olditems[db1$olditems == '']) <- TRUE
>
> 3. Now to replace the missing values with the corresponding ones from the
> second data frame:
>
> k <- is.na(db1$olditems)
> db1[k, "olditems"] <- db2[k, "newitems"]
>
> 4. Check
>
> db1
>    olditems prices
> 1      soup   4.45
> 2  crackers   3.25
> 3      tofu   4.42
> 4 goatsmilk   2.25
> 5      nuts   3.98
>
> 5. If you really do want factors rather than character strings, you can now
> change back:
>
> db1 <- within(db1, olditems <- factor(olditems)) ## use <- here!
>
> 6. check the difference
>
> str(db1)
> 'data.frame': 5 obs. of 2 variables:
>  $ olditems: Factor w/ 5 levels "crackers","goatsmilk",..: 4 1 5 2 3
>  $ prices  : num 4.45 3.25 4.42 2.25 3.98
>
> Bill Venables
> http://www.cmis.csiro.au/bill.venables/
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Gene Leynes
> Sent: Wednesday, 22 July 2009 10:39 AM
> To: r-help@r-project.org
> Subject: [R] How to replace NAs in a vector of factors?
>
> # Just when I thought I had the basic stuff mastered....
> # This has been quite perplexing, thanks for any help
>
> ## Here's the example:
>
> db1=data.frame(
>     olditems=c('soup','','','','nuts'),
>     prices=c(4.45, 3.25, 4.42, 2.25, 3.98))
> db2=data.frame(
>     newitems=c('stew','crackers','tofu','goatsmilk','peanuts'))
>
> str(db1) #factors and prices
> str(db2) #new names, but I want *only* the updates
>
> is.na(db1$olditems) #a little surprising that '' is not equal to NA
> db1$olditems=='' #oh good, at least I can get to the blanks this way
> db1$olditems[db1$olditems==''] #wait, only one item is returned?
> db1[db1$olditems=='',] #somehow this works!
>
> #how would I get the new item names into the old items column of db1??
> # I was expecting that this would work:
> # db1$olditems[db1$olditems=='']=
> #     db2$newitems[db1$olditems=='']
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.