Hi Yewint,

> The sample test code inside seems like that classifier read the whole index
> db to train the model everytime when classification happened for
> inputDocument. or am I misunderstanding something here?

I would suggest you take a look at a couple of articles I wrote last summer about classification in Lucene and Solr:

http://alexbenedetti.blogspot.co.uk/2015/07/lucene-document-classification.html
http://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html

Basically, the misunderstanding is the assumption that this module works like a standard classifier, which is not the case here. Lucene Classification doesn't train a model over time: the index is your model. It uses the index data structures to perform the classification (KNN and Simple Naive Bayes are the algorithms I explored at that time). In practice, the algorithms access the term frequencies and document frequencies stored in the inverted index.

Having a big index will affect performance, since of course we are querying the index, but not because we are building a model.

+1 on all Tommaso's observations!

Cheers

On 10 October 2015 at 20:36, Yewint Ko <yewintko2...@gmail.com> wrote:

> Hi
>
> I am trying to use SimpleNaiveBayesClassifier in my solr project. Currently
> looking at its test base ClassificationTestBase.java.
>
> The sample test code inside seems like that classifier read the whole index
> db to train the model everytime when classification happened for
> inputDocument. or am I misunderstanding something here? If i had a large
> index db, will it impact performance?
>
> protected void checkCorrectClassification(Classifier<T> classifier,
>     String inputDoc, T expectedResult, Analyzer analyzer,
>     String textFieldName, String classFieldName, Query query)
>     throws Exception {
>   AtomicReader atomicReader = null;
>   try {
>     populateSampleIndex(analyzer);
>     atomicReader = SlowCompositeReaderWrapper.wrap(indexWriter.getReader());
>     classifier.train(atomicReader, textFieldName, classFieldName,
>         analyzer, query);
>     ClassificationResult<T> classificationResult =
>         classifier.assignClass(inputDoc);
>     assertNotNull(classificationResult.getAssignedClass());
>     assertEquals("got an assigned class of "
>         + classificationResult.getAssignedClass(),
>         expectedResult, classificationResult.getAssignedClass());
>     assertTrue("got a not positive score "
>         + classificationResult.getScore(),
>         classificationResult.getScore() > 0);
>   } finally {
>     if (atomicReader != null)
>       atomicReader.close();
>   }
> }

--
--------------------------
Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
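
P.S. To make the "the index is your model" point a bit more concrete, here is a minimal toy sketch in plain Java. It is not Lucene's API and not how the module is actually implemented: the class IndexAsModelSketch, its methods addDocument and assignClass, the whitespace-style tokenizer and the Laplace smoothing are all assumptions made up for illustration. The only idea it tries to show is that term and document counts are stored when a document is indexed, and a Naive Bayes score is computed from those counts at classification time, with no separate training step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy illustration only: hypothetical names, NOT the Lucene classification API. */
public class IndexAsModelSketch {
    // class label -> (term -> term frequency over the documents with that label)
    private final Map<String, Map<String, Integer>> termFreqByClass = new HashMap<>();
    // class label -> number of indexed documents with that label
    private final Map<String, Integer> docCountByClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** "Indexing" a labelled document only updates the stored statistics. */
    public void addDocument(String text, String label) {
        Map<String, Integer> tf = termFreqByClass.computeIfAbsent(label, k -> new HashMap<>());
        for (String term : tokenize(text)) {
            tf.merge(term, 1, Integer::sum);
            vocabulary.add(term);
        }
        docCountByClass.merge(label, 1, Integer::sum);
        totalDocs++;
    }

    /** Naive Bayes scored on the fly from the stored counts (Laplace smoothing assumed). */
    public String assignClass(String text) {
        List<String> terms = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCountByClass.keySet()) {
            Map<String, Integer> tf = termFreqByClass.get(label);
            int classTermTotal = tf.values().stream().mapToInt(Integer::intValue).sum();
            // log prior: share of indexed documents carrying this label
            double score = Math.log((double) docCountByClass.get(label) / totalDocs);
            for (String term : terms) {
                int count = tf.getOrDefault(term, 0);
                // log likelihood of the term given the class, smoothed by vocabulary size
                score += Math.log((count + 1.0) / (classTermTotal + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best; // null when nothing has been indexed yet
    }

    // Deliberately naive analysis: lowercase and split on non-word characters.
    private static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        IndexAsModelSketch index = new IndexAsModelSketch();
        index.addDocument("lucene index search query", "tech");
        index.addDocument("football match goal score", "sport");
        System.out.println(index.assignClass("search the index"));
        System.out.println(index.assignClass("goal"));
    }
}
```

In this toy version, the cost of assignClass depends on looking up the stored counts for the terms of the input text, which is the same reason a bigger index matters in the real module: we query more data, but we never rebuild a model.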