Interesting. For some of the test cases, we don't have data for a particular field.
We have a training set of 20,000 entries. For example, imagine the column "Average age of children". If the person has no children, then the data is "NA".

However, I can't train an SVM with any NA data (at least not using the e1071 package), so I need to replace the NA with a 0.

If you have any suggestions on better ways to do this, I would really love to hear them. I'm coming from RapidMiner and it handles a lot of this stuff "automatically". (I've realized that's a "bad thing", so am trying to learn R. Additionally, R seems MUCH MUCH faster.)

I'm open to ideas.

Thanks!

-N

On 8/2/09 4:14 PM, David Winsemius wrote:
>
> On Aug 2, 2009, at 7:02 PM, Noah Silverman wrote:
>
>> Hi,
>>
>> It seems as if the problem was caused by an odd quirk of the "scale"
>> function.
>>
>> Some of my data have NA entries.
>>
>> So, I substitute 0 for any NA with:
>> rawdata[is.na(rawdata)] <- 0
>
> Perhaps this would have done what you intended:
>
> rawdata[is.na(rawdata), ] <- 0
>
> # But this is added _only_ as a matter of coding behavior. See below.
>
>>
>> I then scale the data.
>>
>> For some reason that I don't understand, I find some NA back in the data
>> after the scale command.
>> But, issuing the same 0 substitution AFTER the scale command makes
>> everything work again.
>> rawdata[is.na(rawdata)] <- 0
>
> It "works" because rawdata has been converted by scale() to a matrix
> which can be accessed as a vector.
>
>>
>
> The notion of adding zeroes for NA seems "so wrong". And the idea that
> you might get the same results of doing so before scale() as after
> scale() seems additionally bizarre.
>
>
>>
>> VERY strange behavior.
>>
>
> Your behavior might be seen as VERY strange by some.
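
For reference, here is a minimal sketch in base R of the two points discussed above. It rests on assumptions and is not the code from this thread: the data frame and its column names (income, avg_child_age, all_missing) are made up, and the zero-variance column is only one possible way for NA to reappear after scale(); the thread does not confirm that this is what happened to the real data. The second half swaps in a different technique (a "has children" indicator plus median imputation) as one commonly used alternative to filling with 0.

```r
# Hypothetical toy data standing in for the real training set.
rawdata <- data.frame(
  income        = c(40, 55, 38, 72, 61),
  avg_child_age = c(NA, 7.5, NA, 12, 3),
  all_missing   = rep(NA_real_, 5)   # assumed: a field with no data at all
)

# 1. Zero-fill, then scale. The all-zero column has standard deviation 0,
#    so scale() divides 0 by 0 and produces NaN, which is.na() reports as TRUE.
zero_filled <- rawdata
zero_filled[is.na(zero_filled)] <- 0
scaled_zero <- scale(zero_filled)
anyNA(scaled_zero)   # TRUE for this toy data

# 2. Alternative sketch: keep the "no children" information in its own
#    indicator column and impute the missing ages with the observed median.
imputed <- rawdata[, c("income", "avg_child_age")]
imputed$has_children <- as.numeric(!is.na(imputed$avg_child_age))
med <- median(imputed$avg_child_age, na.rm = TRUE)
imputed$avg_child_age[is.na(imputed$avg_child_age)] <- med

# Drop zero-variance columns before scaling so no 0/0 can occur.
keep <- sapply(imputed, function(col) sd(col) > 0)
scaled <- scale(imputed[, keep, drop = FALSE])
anyNA(scaled)        # FALSE for this toy data
```

The indicator column lets a model tell "no children" apart from "children with a typical average age", which a plain 0 cannot do. Whether the median is a sensible fill value depends on the data.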
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.