Re: [R] Analyzing Poor Performance Using naiveBayes()

2012-09-14 Thread Patrick Connolly
On Thu, 09-Aug-2012 at 03:40PM -0700, Kirk Fleming wrote:
|> My data is 50,000 instances of about 200 predictor values, and for
|> all 50,000 examples I have the actual class labels (binary). The
|> data is quite unbalanced with about 10% or less of the examples
|> having a positive outcome and the remainder, of course, negative.

Re: [R] Analyzing Poor Performance Using naiveBayes()

2012-08-10 Thread Kirk Fleming
As some additional information, I re-ran the model across the range of n = 50 to 150 (n being the 'top n' parameters returned by chi.squared), and this time used a completely different subset of the data for both training and test. Nearly identical results, with the typical train AUC about 0.98 and
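For reference, a minimal sketch of how the train and test AUC being compared here could be computed with ROCR; the objects fit, train and test, and a positive class level of "1", are assumptions rather than the poster's actual code:

library(e1071)  # naiveBayes()
library(ROCR)   # prediction(), performance()

# AUC of an already-fitted naiveBayes model on a given data set
auc_of <- function(fit, data) {
  p <- predict(fit, data, type = "raw")[, "1"]   # posterior for the positive class
  performance(prediction(p, data$class), "auc")@y.values[[1]]
}

c(train = auc_of(fit, train), test = auc_of(fit, test))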

Re: [R] Analyzing Poor Performance Using naiveBayes()

2012-08-10 Thread Kirk Fleming
Per your suggestion I ran chi.squared() against my training data and, to my delight, found just 50 parameters that were non-zero influencers. I built the model through several iterations and found n = 12 to be the optimum for the training data. However, results are still not so good for the test data. H
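A sketch of the step described above, assuming a data frame train with a factor response class (hypothetical names), might look like:

library(FSelector)  # chi.squared(), cutoff.k(), as.simple.formula()
library(e1071)      # naiveBayes()

weights <- chi.squared(class ~ ., data = train)            # one importance weight per predictor
nonzero <- rownames(weights)[weights$attr_importance > 0]  # the ~50 non-zero influencers reported
top12   <- cutoff.k(weights, 12)                           # n = 12, the reported training optimum
fit12   <- naiveBayes(as.simple.formula(top12, "class"), data = train)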

Re: [R] Analyzing Poor Performance Using naiveBayes()

2012-08-09 Thread C.H.
I think you have been hit by the problem of high variance (overfitting). Maybe you should consider doing feature selection, perhaps using the chisq ranking from FSelector, and then training the Naive Bayes using the top n features (n = 1 to 200) as ranked by chisq, plotting the AUCs or F1 score from b
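One possible sketch of that workflow, assuming data frames train and test with a factor response class whose positive level is "1" (all hypothetical names, not code from this thread):

library(FSelector)  # chi.squared(), cutoff.k(), as.simple.formula()
library(e1071)      # naiveBayes()
library(ROCR)       # prediction(), performance()

weights <- chi.squared(class ~ ., data = train)   # rank all ~200 predictors

auc_top_n <- function(n) {
  feats <- cutoff.k(weights, n)                   # names of the top-n predictors
  fit   <- naiveBayes(as.simple.formula(feats, "class"), data = train)
  p     <- predict(fit, test, type = "raw")[, "1"]
  performance(prediction(p, test$class), "auc")@y.values[[1]]
}

ns   <- 1:200
aucs <- sapply(ns, auc_top_n)
plot(ns, aucs, type = "l", xlab = "top n features", ylab = "test AUC")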

[R] Analyzing Poor Performance Using naiveBayes()

2012-08-09 Thread Kirk Fleming
My data is 50,000 instances of about 200 predictor values, and for all 50,000 examples I have the actual class labels (binary). The data is quite unbalanced with about 10% or less of the examples having a positive outcome and the remainder, of course, negative. Nothing suggests the data has any ord
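By way of illustration only (not the poster's code), a stratified train/test split for such an imbalanced binary problem, followed by a baseline e1071::naiveBayes() fit, could look like this; dat and its factor response class with levels "0"/"1" are placeholders:

library(e1071)   # naiveBayes()

set.seed(1)
pos <- which(dat$class == "1")
neg <- which(dat$class == "0")
train_idx <- c(sample(pos, floor(0.7 * length(pos))),   # sample each class separately so the
               sample(neg, floor(0.7 * length(neg))))   # ~10/90 ratio is kept in both halves

train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

fit  <- naiveBayes(class ~ ., data = train)
prob <- predict(fit, test, type = "raw")[, "1"]          # P(class == "1" | x)
table(predicted = prob > 0.5, actual = test$class)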