Using SimpleNaiveBayesClassifier in solr

2015-10-10 Thread Yewint Ko
Hi

I am trying to use SimpleNaiveBayesClassifier in my Solr project and am
currently looking at its test base class, ClassificationTestBase.java.

From the sample test code below, it looks as if the classifier reads the
whole index to train the model every time an input document is classified.
Or am I misunderstanding something? If I had a large index, would that
impact performance?

protected void checkCorrectClassification(Classifier<T> classifier, String inputDoc,
    T expectedResult, Analyzer analyzer, String textFieldName, String classFieldName,
    Query query) throws Exception {
  AtomicReader atomicReader = null;
  try {
    // index the sample documents used by the test
    populateSampleIndex(analyzer);
    atomicReader = SlowCompositeReaderWrapper.wrap(indexWriter.getReader());
    // hand the reader, field names, analyzer and optional query to the classifier
    classifier.train(atomicReader, textFieldName, classFieldName, analyzer, query);
    // classify the input text, then check the assigned class and its score
    ClassificationResult<T> classificationResult = classifier.assignClass(inputDoc);
    assertNotNull(classificationResult.getAssignedClass());
    assertEquals("got an assigned class of " + classificationResult.getAssignedClass(),
        expectedResult, classificationResult.getAssignedClass());
    assertTrue("got a not positive score " + classificationResult.getScore(),
        classificationResult.getScore() > 0);
  } finally {
    // always release the reader
    if (atomicReader != null) {
      atomicReader.close();
    }
  }
}


Re: Using SimpleNaiveBayesClassifier in solr

2015-10-14 Thread Yewint Ko
Thanks, Ales and Tommaso, for your replies.

So does the classifier query the whole index and load it into memory first,
before running the tokenizer against the input document? It sounds like, if I
don't close the classifier and my index is big, I might need a bigger
machine. Is there any way to reverse the order? Do I sound dumb?

On 12 October 2015 at 16:11, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:

> Hi Yewint,
> >
> > The sample test code inside seems like that classifier read the whole
> > index db to train the model everytime when classification happened for
> > inputDocument. or am I misunderstanding something here?
>
>
> I would suggest you take a look at a couple of articles I wrote last
> summer about classification in Lucene and Solr:
>
>
> http://alexbenedetti.blogspot.co.uk/2015/07/lucene-document-classification.html
>
>
> http://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html
>
> Basically, your misunderstanding is that you expect this module to work
> as a standard classifier, which is not the case here.
> Lucene Classification doesn't train a model over time; the index is your
> model.
> It uses the index data structures to perform the classification
> (KNN and Simple Naive Bayes are the algorithms I explored at that time).
> The algorithms access the term frequencies and document frequencies
> stored in the inverted index.
>
> A big index will of course affect performance, because we are querying
> the index, but not because we are building a model.
>
> +1 on all of Tommaso's observations!
>
> Cheers
>
>
>
> On 10 October 2015 at 20:36, Yewint Ko wrote:
>
> --
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
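Alessandro's point that "the index is your model" can be illustrated with a minimal sketch in plain Java, with no Lucene dependency. Everything in it is hypothetical: `ToyIndexNaiveBayes`, `addDocument` and the in-memory maps only stand in for an inverted index with a text field and a class field, and the scoring is a textbook add-one smoothed multinomial naive Bayes, not necessarily the exact formula SimpleNaiveBayesClassifier uses. Adding documents only updates the postings; there is no separate training pass. Each `assignClass` call tokenizes the input first and then reads counts from the index, so its cost grows with the number of classes, the input length and the part of the index it scans, rather than with rebuilding a model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy inverted index plus naive Bayes scoring; this is NOT Lucene API.
public class ToyIndexNaiveBayes {

    // term -> (docId -> term frequency in that doc): the "inverted index"
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    // docId -> class label (plays the role of the class field)
    private final Map<Integer, String> docClass = new HashMap<>();
    // docId -> number of tokens in the doc's text field
    private final Map<Integer, Integer> docLength = new HashMap<>();

    // Very simple analyzer: lowercase and split on non-word characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexing is the only preparation step; no model is built here.
    public void addDocument(int docId, String text, String label) {
        List<String> tokens = tokenize(text);
        for (String t : tokens) {
            postings.computeIfAbsent(t, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
        docClass.put(docId, label);
        docLength.put(docId, tokens.size());
    }

    // All statistics are read from the index at classification time.
    // Returns null when the index is empty.
    public String assignClass(String inputDoc) {
        List<String> tokens = tokenize(inputDoc); // analyze the input first
        int vocabularySize = postings.size();
        int totalDocs = docClass.size();
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String c : new HashSet<>(docClass.values())) {
            int docsInClass = 0;
            int tokensInClass = 0;
            for (Map.Entry<Integer, String> e : docClass.entrySet()) {
                if (e.getValue().equals(c)) {
                    docsInClass++;
                    tokensInClass += docLength.get(e.getKey());
                }
            }
            double score = Math.log((double) docsInClass / totalDocs); // log prior P(c)
            for (String t : tokens) {
                int tfInClass = 0;
                Map<Integer, Integer> docs = postings.getOrDefault(t, Collections.emptyMap());
                for (Map.Entry<Integer, Integer> p : docs.entrySet()) {
                    if (c.equals(docClass.get(p.getKey()))) {
                        tfInClass += p.getValue();
                    }
                }
                // add-one (Laplace) smoothed log P(t | c)
                score += Math.log((tfInClass + 1.0) / (tokensInClass + vocabularySize));
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ToyIndexNaiveBayes index = new ToyIndexNaiveBayes();
        index.addDocument(1, "the match ended with a late goal", "sport");
        index.addDocument(2, "the striker scored a goal in the final", "sport");
        index.addDocument(3, "the central bank raised interest rates", "finance");
        index.addDocument(4, "stocks fell as interest rates rose", "finance");
        System.out.println(index.assignClass("a goal in the final minute"));
        System.out.println(index.assignClass("bank rates"));
    }
}
```

Running `main` should print `sport` and then `finance`. Counting per class by scanning every document keeps the sketch short; a real inverted index would answer these counts with postings lookups and cached statistics instead.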