Interesting,

For some of the test cases, we don't have data for a particular field.

We have a training set of 20,000 entries.  Consider, for example, the
column "Average age of children".  If the person has no children, the
value is "NA".  However, I can't train an SVM on data containing NA (at
least not with the e1071 package), so I need to replace each NA with 0.
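A minimal sketch of the NA-to-0 substitution described above, run on a tiny made-up data frame. The column names (n_children, avg_child_age) are hypothetical stand-ins for the real 20,000-row training set. It shows that is.na() on a data frame returns a logical matrix of the same shape, which can be used directly to assign into the data frame.

```r
# Hypothetical toy data standing in for the real training set:
# people with no children have NA for the average age of their children.
rawdata <- data.frame(
  n_children    = c(2, 0, 1, 0),
  avg_child_age = c(7.5, NA, 3, NA)
)

# is.na() on a data frame gives a logical matrix (rows x columns).
na_mask <- is.na(rawdata)
print(na_mask)

# A data frame accepts assignment through a logical matrix of its own shape,
# so this replaces every NA with 0 while keeping the data frame structure.
rawdata[na_mask] <- 0

# Check that nothing is left that e1071::svm() would choke on.
stopifnot(!any(is.na(rawdata)))
print(rawdata)
```

Note that after this step a 0 in avg_child_age can mean either "no children" or a real value of 0, which is part of why the substitution deserves some thought.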

If you have any suggestions for better ways to do this, I would really
love to hear them.  I'm coming from RapidMiner, which handles a lot of
this "automatically".  (I've realized that's a "bad thing", so I am
trying to learn R.  Also, R seems MUCH MUCH faster.)

I'm open to ideas.

Thanks!

-N




On 8/2/09 4:14 PM, David Winsemius wrote:
>
> On Aug 2, 2009, at 7:02 PM, Noah Silverman wrote:
>
>> Hi,
>>
>> It seems as if the problem was caused by an odd quirk of the "scale"
>> function.
>>
>> Some of my data have NA entries.
>>
>> So, I substitute 0 for any NA with:
>> rawdata[is.na(rawdata)] <- 0
>
> Perhaps this would have done what you intended:
>
> rawdata[is.na(rawdata), ] <- 0
>
> # But this is added _only_ as a matter of coding behavior. See below.
>
>>
>> I then scale the data.
>>
>> For some reason that I don't understand, I find some NA back in the data
>> after the scale command.
>> But, issuing the same 0 substitution AFTER the scale command makes
>> everything work again.
>> rawdata[is.na(rawdata)] <- 0
>
> It "works" because rawdata has been converted by scale() to a matrix 
> which can be accessed as a vector.
>
>>
>
> The notion of adding zeroes for NA seems "so wrong". And the idea that 
> you might get the same results of doing so before scale() as after 
> scale() seems additionally bizarre.
>
>
>>
>> VERY strange behavior.
>>
>
> Your behavior might be seen as VERY strange by some.
>
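A minimal sketch of two things relevant to the "NA back after scale" puzzle, on made-up data. First, as noted in the reply, scale() returns a numeric matrix rather than a data frame. Second, one possible cause of the reappearing NA values, assumed here rather than confirmed for the original data, is a column that is constant after the NA-to-0 substitution (for example, a column that was entirely NA). Its standard deviation is 0, so scale() computes 0/0 = NaN, and is.na() reports NaN as TRUE.

```r
# Hypothetical toy data: column b is entirely missing, so after the
# NA -> 0 substitution it becomes a constant column of zeros.
rawdata <- data.frame(a = c(1, 2, 3, 4), b = rep(NA_real_, 4))
rawdata[is.na(rawdata)] <- 0
stopifnot(!any(is.na(rawdata)))  # no NA before scaling

scaled <- scale(rawdata)

# scale() converts its input with as.matrix(), so the result is a matrix.
print(is.matrix(scaled))

# For the constant column, centering gives all zeros and the scaling
# factor is 0, so each entry becomes 0/0 = NaN.
print(scaled[, "b"])

# is.na() is TRUE for NaN, which is why "NA" seems to reappear.
print(colSums(is.na(scaled)))
```

If this is what happened, the second substitution only overwrites a column that carries no information; dropping constant columns before calling scale() may be a cleaner fix.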


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
