Thank you so much; I'm humbled by the response from such great authors and
scholars. I thought I would share the final version, which worked perfectly
in my illustrative example as well as in my real one.
My main confusion was this part:
> db1$olditems[db1$olditems=='']
[1]
Levels: nuts soup
I thought only one item was returned, but really all three are. Only the first
one is labeled with "[1]", and the blank strings themselves print as nothing.
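A quick way to confirm that all three elements are there is to check the length of the subset and print it as quoted character strings. This is a small sketch that rebuilds `db1` from the example below. It adds `stringsAsFactors = TRUE`, an assumption on my part, so that the column is still a factor on R 4.0.0 and later, where the default became FALSE. The name `blanks` is only for illustration.

```r
# Rebuild the example data; stringsAsFactors = TRUE keeps olditems a factor
# even on R >= 4.0.0, where the default changed to FALSE
db1 <- data.frame(
  olditems = c('soup', '', '', '', 'nuts'),
  prices = c(4.45, 3.25, 4.42, 2.25, 3.98),
  stringsAsFactors = TRUE)

blanks <- db1$olditems[db1$olditems == '']
length(blanks)                # 3: all three blank entries are returned
print(as.character(blanks))   # quotes make the empty strings visible
which(db1$olditems == '')     # row positions of the blanks
```

The quotes printed by `as.character()` make it obvious that each element is an empty string rather than a missing value.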
A side note: I don't understand the motivation for using "within" when simple
assignment with $ or [ works. Maybe it matters when the data frame has a
really long name?
# To me, this is easier to read:
db1$olditems = factor(db1$olditems)
# than this:
db1 <- within(db1, olditems <- factor(olditems))
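As far as I can tell, the two forms give the same column. The main convenience of `within()` appears when several columns change at once, because the column names can be used directly inside the braces without repeating `db1$`. Here is a rough sketch of that. The `total` column and the names `a`, `b` and `d` are hypothetical and exist only for this illustration.

```r
db1 <- data.frame(
  olditems = c('soup', '', '', '', 'nuts'),
  prices = c(4.45, 3.25, 4.42, 2.25, 3.98),
  stringsAsFactors = FALSE)

# Form 1: $ assignment, repeating the data frame name
a <- db1
a$olditems <- factor(a$olditems)

# Form 2: within() evaluates the expression with the columns in scope
b <- within(db1, olditems <- factor(olditems))

identical(a$olditems, b$olditems)  # both give the same factor column

# within() saves repetition when several columns change together
# ('total' is a hypothetical column added only for this illustration)
d <- within(db1, {
  olditems <- factor(olditems)
  total <- prices * 2
})
```

For a single column the `$` form is just as clear, so the choice between them seems to come down to taste.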
Here is my code example working, thanks to the generous feedback:
# A little function I always have loaded; it converts a factor to strings
# (incidentally, it was inspired by "Modern Applied Statistics with S",
# page 33)
factors = function(x) levels(x)[x]
# The data.frame option "stringsAsFactors = FALSE" would have been perfect to
# use here, but in my real example I can't re-import the data
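# stringsAsFactors = TRUE reproduces the factor columns of my real data
# (TRUE was the default before R 4.0.0)
db1 <- data.frame(
olditems = c('soup','','','','nuts'),
prices = c(4.45, 3.25, 4.42, 2.25, 3.98),
stringsAsFactors = TRUE)
db2 <- data.frame(
newitems = c('stew','crackers','tofu','goatsmilk','peanuts'),
stringsAsFactors = TRUE)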
db1$olditems[db1$olditems==''] # it looks like only one item is returned
length(db1$olditems[db1$olditems=='']) # but all three are actually returned
db1$olditems=factors(db1$olditems) #converts the factors to strings
db1$olditems[db1$olditems=='']=NA #replaces blanks with NA
# Note: this only works when db2 is in the same row order as db1
db1$olditems[is.na(db1$olditems)]=
factors(db2$newitems[is.na(db1$olditems)])
db1$olditems=factor(db1$olditems) # I like to use factors b/c they inherently
# give a count of unique values
db1$olditems #Success!
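For comparison, here is an alternative way to do the same replacement without the helper function. It converts both columns to character with `as.character()` and picks values with `ifelse()`. This is only a sketch, not the method suggested in the replies. Like the code above, it assumes db2 is in the same row order as db1. The names `old` and `new` are illustrative.

```r
db1 <- data.frame(
  olditems = c('soup', '', '', '', 'nuts'),
  prices = c(4.45, 3.25, 4.42, 2.25, 3.98),
  stringsAsFactors = TRUE)
db2 <- data.frame(
  newitems = c('stew', 'crackers', 'tofu', 'goatsmilk', 'peanuts'),
  stringsAsFactors = TRUE)

old <- as.character(db1$olditems)  # factor -> strings, same as factors()
new <- as.character(db2$newitems)
# take the db2 value wherever db1 is blank, otherwise keep the db1 value
db1$olditems <- factor(ifelse(old == '', new, old))
db1$olditems
```

Working on plain character vectors sidesteps the factor level bookkeeping, and `factor()` rebuilds the levels from scratch at the end.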
On Tue, Jul 21, 2009 at 8:22 PM, <[email protected]> wrote:
> Couple of points:
>
> 1. if you are going to be replacing entries in factors with updated levels,
> it's probably easier if you start with your strings remaining as strings as
> they go into the data frames. So here is how I would start your example
>
>
> db1 <- data.frame(
> olditems = c('soup','','','','nuts'),
> prices = c(4.45, 3.25, 4.42, 2.25, 3.98),
> stringsAsFactors = FALSE)
> db2 <- data.frame(
> newitems = c('stew','crackers','tofu','goatsmilk','peanuts'),
> stringsAsFactors = FALSE)
>
>
> 2. Strings with zero characters are still strings (like zero is still a
> number). They are not missing. If you want them to be made missing you can
> do so afterwards with:
>
>
> #### zero length strings become NA
> is.na(db1$olditems[db1$olditems == '']) <- TRUE
>
>
> 3. Now to replace the missing values with the corresponding ones from the
> second data frame:
>
>
> k <- is.na(db1$olditems)
> db1[k, "olditems"] <- db2[k, "newitems"]
>
>
> 4. Check
>
> > db1
> olditems prices
> 1 soup 4.45
> 2 crackers 3.25
> 3 tofu 4.42
> 4 goatsmilk 2.25
> 5 nuts 3.98
> >
>
> 5. If you really do want factors rather than character strings, you can now
> change back:
>
> db1 <- within(db1, olditems <- factor(olditems)) ## use <- here!
>
> 6. check the difference
>
> > str(db1)
> 'data.frame': 5 obs. of 2 variables:
> $ olditems: Factor w/ 5 levels "crackers","goatsmilk",..: 4 1 5 2 3
> $ prices : num 4.45 3.25 4.42 2.25 3.98
> >
>
>
>
> Bill Venables
> http://www.cmis.csiro.au/bill.venables/
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Gene Leynes
> Sent: Wednesday, 22 July 2009 10:39 AM
> To: [email protected]
> Subject: [R] How to replace NAs in a vector of factors?
>
> # Just when I thought I had the basic stuff mastered....
> # This has been quite perplexing, thanks for any help
>
>
> ## Here's the example:
>
> db1=data.frame(
> olditems=c('soup','','','','nuts'),
> prices=c(4.45, 3.25, 4.42, 2.25, 3.98))
> db2=data.frame(
> newitems=c('stew','crackers','tofu','goatsmilk','peanuts'))
>
> str(db1) #factors and prices
> str(db2) #new names, but I want *only* the updates
>
> is.na(db1$olditems) #a little surprising that '' is not equal to NA
> db1$olditems=='' #oh good, at least I can get to the blanks this way
> db1$olditems[db1$olditems==''] #wait, only one item is returned?
> db1[db1$olditems=='',] #somehow this works!
>
> #how would I get the new item names into the old items column of db1??
> # I was expecting that this would work:
> # db1$olditems[db1$olditems=='']=
> # db2$newitems[db1$olditems=='']
>
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>