On Thu, 09-Aug-2012 at 03:40PM -0700, Kirk Fleming wrote:
|> My data is 50,000 instances of about 200 predictor values, and for
|> all 50,000 examples I have the actual class labels (binary). The
|> data is quite unbalanced with about 10% or less of the examples
|> having a positive outcome and the remainder, of course, negative.
As some additional information, I re-ran the model across the range of n = 50
to 150 (n being the number of 'top n' predictors returned by chi.squared), and
this time used a completely different subset of the data for both training and
test. Nearly identical results, with the typical train AUC about 0.98 and the
test AUC again much lower.
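In R, that train-vs-test check might look something like the sketch below.
The frame names 'train' and 'test' and the factor column 'label' are
placeholders, and the positive class is assumed to be the second factor
level:

    library(e1071)   # naiveBayes
    library(ROCR)    # prediction / performance

    fit <- naiveBayes(label ~ ., data = train)

    auc_of <- function(model, dat) {
      ## posterior probability of the positive class
      ## (second column of the type = "raw" output)
      p <- predict(model, dat, type = "raw")[, 2]
      performance(prediction(p, dat$label), "auc")@y.values[[1]]
    }

    auc_of(fit, train)   # high on the training split (about 0.98 here)
    auc_of(fit, test)    # much lower on held-out data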
Per your suggestion I ran chi.squared() against my training data and, to my
delight, found just 50 predictors with non-zero influence. I built the model
through several iterations and found n = 12 to be the optimum for the
training data.
However, the results are still not so good for the test data.
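The ranking-and-refit step described above might be scripted like this (a
minimal sketch; again the frame name 'train' and the class column 'label'
are placeholders):

    library(FSelector)
    library(e1071)

    ## chi-squared importance score for each of the ~200 predictors
    weights <- chi.squared(label ~ ., data = train)

    ## how many predictors have non-zero influence (about 50 here)
    sum(weights$attr_importance > 0)

    ## keep the top n (n = 12 above) and refit the Naive Bayes model
    top <- cutoff.k(weights, 12)
    fit <- naiveBayes(as.simple.formula(top, "label"), data = train)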
I think you have been hit by the problem of high variance (overfitting).
Maybe you should consider doing feature selection, perhaps using the
chi-squared ranking from FSelector.
And then train the Naive Bayes using the top n features (n = 1 to 200) as
ranked by chi-squared, and plot the AUCs or F1 scores from both the training
and the test data; a sketch of that loop follows.
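Something along these lines, assuming the same placeholder frames 'train'
and 'test' and class column 'label' as above:

    library(FSelector); library(e1071); library(ROCR)

    weights <- chi.squared(label ~ ., data = train)

    ns   <- 1:200
    aucs <- sapply(ns, function(n) {
      ## formula over the top-n chi-squared-ranked predictors
      f <- as.simple.formula(cutoff.k(weights, n), "label")
      m <- naiveBayes(f, data = train)
      p <- predict(m, test, type = "raw")[, 2]
      performance(prediction(p, test$label), "auc")@y.values[[1]]
    })

    plot(ns, aucs, type = "l",
         xlab = "top n features", ylab = "test AUC")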
My data is 50,000 instances of about 200 predictor values, and for all 50,000
examples I have the actual class labels (binary). The data is quite
unbalanced with about 10% or less of the examples having a positive outcome
and the remainder, of course, negative. Nothing suggests the data has any
ordering.