Hi Alessandro, sorry for the delay. What do you mean?
As I mentioned earlier, I followed a super simple set of steps:

1. Download Solr
2. Configure classification
3. Create some documents using curl over HTTP

Is it difficult to reproduce the steps / problem?

Tomas

> On 23 Jun 2016, at 16:42, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
>
> Can you give an example of your schema? And can you run a simple query on
> your index? I'm curious to see how the input fields are analyzed.
>
> Cheers
>
> On Wed, Jun 22, 2016 at 6:05 PM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
>
>> This is better! At least the classifier is invoked!
>> How many docs in the index have the class assigned?
>> Take a look at the stack trace and you should find the cause!
>> I am on mobile now; I will check the code tomorrow!
>> Cheers
>>
>> On 22 Jun 2016 5:26 pm, "Tomas Ramanauskas" <tomas.ramanaus...@springer.com> wrote:
>>
>>> I also tried with this config (adding **):
>>>
>>> <initParams path="/update/**">
>>>   <lst name="defaults">
>>>     <str name="update.chain">classification</str>
>>>   </lst>
>>> </initParams>
>>>
>>> And I get the error:
>>>
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>>  {"id" : "book15",
>>>   "title_t":["The Way of Kings"],
>>>   "author_s":"Brandon Sanderson",
>>>   "cat_s": null,
>>>   "pubyear_i":2010,
>>>   "ISBN_s":"978-0-7653-2635-5"
>>>  }
>>> ]'
>>> {"responseHeader":{"status":500,"QTime":29},"error":{"trace":"java.lang.NullPointerException\n\tat
>>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getTokenArray(SimpleNaiveBayesDocumentClassifier.java:202)\n\tat
>>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.analyzeSeedDocument(SimpleNaiveBayesDocumentClassifier.java:162)\n\tat
>>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:121)\n\tat
>>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignClass(SimpleNaiveBayesDocumentClassifier.java:81)\n\tat
>>> org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:94)\n\tat
>>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:474)\n\tat
>>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:138)\n\tat
>>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:114)\n\tat
>>> org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:77)\n\tat
>>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat
>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)\n\tat
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\n\tat
>>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\n\tat
>>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\n\tat
>>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
>>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
>>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
>>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
>>> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
>>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
>>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
>>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
>>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
>>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
>>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
>>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
>>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
>>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
>>> java.lang.Thread.run(Thread.java:745)\n","code":500}}
>>>
>>> Tomas
>>>
>>> On 22 Jun 2016, at 17:22, Tomas Ramanauskas <tomas.ramanaus...@springer.com> wrote:
>>>
>>> Thanks for the response, Alessandro.
>>>
>>> I tried this and it didn’t work either:
>>>
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>>  {"id" : "book14",
>>>   "title_t":["The Way of Kings"],
>>>   "author_s":"Brandon Sanderson",
>>>   "cat_s": null,
>>>   "pubyear_i":2010,
>>>   "ISBN_s":"978-0-7653-2635-5"
>>>  }
>>> ]'
>>> {"responseHeader":{"status":0,"QTime":2}}
>>>
>>> $ curl http://localhost:8983/solr/demo/get?id=book14
>>> {
>>>   "doc":
>>>   {
>>>     "id":"book14",
>>>     "title_t":["The Way of Kings"],
>>>     "author_s":"Brandon Sanderson",
>>>     "pubyear_i":2010,
>>>     "ISBN_s":"978-0-7653-2635-5",
>>>     "_version_":1537854598189940736}}
>>>
>>> I don’t see the “cat_s” field in the results at all.
>>>
>>> Tomas
>>>
>>> On 22 Jun 2016, at 16:39, Alessandro Benedetti <abenede...@apache.org> wrote:
>>>
>>> Hi Tomas,
>>> First consideration: an empty string is different from a null value.
>>> This is controversial; I would suggest you never use the empty string,
>>> as it can cause other side effects.
>>> Apart from that, the plugin will add the class only if the class field
>>> has no value:
>>>
>>>   Object documentClass = doc.getFieldValue(classFieldName);
>>>   if (documentClass == null) {
>>>     // ... assign the class
>>>   }
>>>
>>> That said, I would suggest you build a sample index with some
>>> documents and then try to classify.
>>> If this doesn't solve your issue, I can help you further.
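Alessandro's point about empty strings and null values can be made concrete with a small, self-contained Java sketch. It mirrors only the quoted check, which classifies a document when the class field has no value. A plain `Map` is a hypothetical stand-in for the Solr input document. The sketch also assumes, as the book14 output suggests, that sending `"cat_s": null` leaves the field absent. By contrast, `"cat_s":""` stores an empty string, which counts as a value, so the classifier is skipped.

```java
import java.util.HashMap;
import java.util.Map;

public class ClassFieldCheck {
    // Mirrors the quoted check: classify only when the class field has no value.
    // A plain Map stands in for the Solr input document (hypothetical simplification).
    static boolean shouldClassify(Map<String, ?> doc, String classFieldName) {
        Object documentClass = doc.get(classFieldName);
        return documentClass == null;
    }

    public static void main(String[] args) {
        Map<String, Object> withEmptyString = new HashMap<>();
        withEmptyString.put("id", "book7");
        withEmptyString.put("cat_s", "");   // "cat_s":"" is a value, so no classification

        Map<String, Object> withoutClass = new HashMap<>();
        withoutClass.put("id", "book14");   // field absent (assumed outcome of "cat_s": null)

        System.out.println(shouldClassify(withEmptyString, "cat_s")); // prints false
        System.out.println(shouldClassify(withoutClass, "cat_s"));    // prints true
    }
}
```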
>>>
>>> Cheers
>>>
>>> On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas <tomas.ramanaus...@springer.com> wrote:
>>>
>>> I also tried this configuration, but couldn't get the feature to work:
>>>
>>> <initParams path="/update/">
>>>   <lst name="defaults">
>>>     <str name="update.chain">classification</str>
>>>   </lst>
>>> </initParams>
>>>
>>> <updateRequestProcessorChain name="classification">
>>>   <processor class="solr.ClassificationUpdateProcessorFactory">
>>>     <str name="inputFields">title_t,author_s</str>
>>>     <str name="classField">cat_s</str>
>>>     <str name="algorithm">bayes</str>
>>>   </processor>
>>> </updateRequestProcessorChain>
>>>
>>> Tomas
>>>
>>> On 22 Jun 2016, at 13:46, Tomas Ramanauskas <tomas.ramanaus...@springer.com> wrote:
>>>
>>> P.S. The version I use is 6.1.0-68.
>>>
>>> Also, earlier I said “If I modify an existing record, I think the
>>> functionality works:”, but I think it doesn’t work for me at all.
>>>
>>> $ curl http://localhost:8983/solr/demo/get?id=book1
>>> {
>>>   "doc":
>>>   {
>>>     "id":"book1",
>>>     "title_t":["The Way of Kings"],
>>>     "author_s":"Brandon Sanderson",
>>>     "cat_s":"fantasy",
>>>     "pubyear_i":2010,
>>>     "ISBN_s":"978-0-7653-2635-5",
>>>     "_version_":1535488016326328320}}
>>>
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>>  {"id" : "book1",
>>>   "title_t":["The Way of Kings"],
>>>   "author_s":"Brandon Sanderson",
>>>   "cat_s":"aaa",
>>>   "pubyear_i":2010,
>>>   "ISBN_s":"978-0-7653-2635-5"
>>>  }
>>> ]'
>>> {"responseHeader":{"status":0,"QTime":0}}
>>>
>>> $ curl http://localhost:8983/solr/demo/get?id=book1
>>> {
>>>   "doc":
>>>   {
>>>     "id":"book1",
>>>     "title_t":["The Way of Kings"],
>>>     "author_s":"Brandon Sanderson",
>>>     "cat_s":"fantasy",
>>>     "pubyear_i":2010,
>>>     "ISBN_s":"978-0-7653-2635-5",
>>>     "_version_":1535488016326328320}}
>>>
>>> Tomas
>>>
>>> On 22 Jun 2016, at 12:47, Tomas Ramanauskas <tomas.ramanaus...@springer.com> wrote:
>>>
>>> Hi everyone,
>>>
>>> Would someone be able to share a working example (step by step) that
>>> demonstrates the use of the Naive Bayes classifier in Solr?
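The original question asks what the Naive Bayes classifier does. The following toy, self-contained Java sketch shows textbook multinomial Naive Bayes with add-one smoothing. It is not Lucene's `SimpleNaiveBayesDocumentClassifier`, and the training titles, labels and crude tokenizer are hypothetical. It does illustrate why Alessandro asks how many indexed documents already carry a class: a Bayes classifier can only score classes it has seen in labelled training documents.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class ToyNaiveBayes {
    // label -> (token -> count) over all training documents with that label
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    // label -> number of training documents with that label
    private final Map<String, Integer> docCounts = new HashMap<>();
    // label -> total number of tokens seen for that label
    private final Map<String, Integer> totalTokens = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercase and split on non-word characters; a crude stand-in for Solr's field analysis.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Add one labelled training document (e.g. title + author text and its category).
    void train(String text, String label) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokenize(text)) {
            counts.merge(t, 1, Integer::sum);
            totalTokens.merge(label, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    // Pick the label maximising log P(label) + sum of log P(token | label),
    // with add-one (Laplace) smoothing so unseen tokens do not zero out a label.
    String classify(String text) {
        List<String> tokens = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : docCounts.entrySet()) {
            String label = e.getKey();
            double score = Math.log(e.getValue() / (double) totalDocs);
            Map<String, Integer> counts = tokenCounts.get(label);
            int total = totalTokens.getOrDefault(label, 0);
            for (String t : tokens) {
                int c = counts.getOrDefault(t, 0);
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best; // null if nothing has been trained yet
    }

    public static void main(String[] args) {
        ToyNaiveBayes nb = new ToyNaiveBayes();
        // Hypothetical labelled documents, standing in for indexed docs that already have cat_s.
        nb.train("The Way of Kings Brandon Sanderson", "fantasy");
        nb.train("Mistborn Brandon Sanderson", "fantasy");
        nb.train("Dune Frank Herbert", "scifi");
        nb.train("Foundation Isaac Asimov", "scifi");
        System.out.println(nb.classify("Words of Radiance Brandon Sanderson")); // prints fantasy
    }
}
```

With no labelled documents at all, `classify` has nothing to choose from and returns `null`.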
>>>
>>> I followed this blog post:
>>> https://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html?showComment=1464358093048#c2489902302085000947
>>>
>>> And this tutorial:
>>> http://yonik.com/solr-tutorial/
>>>
>>> And this JIRA ticket:
>>> https://issues.apache.org/jira/browse/SOLR-7739
>>>
>>> So this is my configuration file (only what I added or modified):
>>>
>>> <initParams path="/update/**">
>>>   <lst name="defaults">
>>>     <str name="update.chain">classification</str>
>>>   </lst>
>>> </initParams>
>>>
>>> <updateRequestProcessorChain name="classification">
>>>   <processor class="solr.ClassificationUpdateProcessorFactory">
>>>     <str name="inputFields">title_t,author_s</str>
>>>     <str name="classField">cat_s</str>
>>>     <str name="algorithm">bayes</str>
>>>   </processor>
>>> </updateRequestProcessorChain>
>>>
>>> If I modify an existing record, I think the functionality works:
>>>
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>>  {"id" : "book1",
>>>   "title_t":["The Way of Kings"],
>>>   "author_s":"Brandon Sanderson",
>>>   "cat_s":"",
>>>   "pubyear_i":2010,
>>>   "ISBN_s":"978-0-7653-2635-5"
>>>  }
>>> ]'
>>> {"responseHeader":{"status":0,"QTime":8}}
>>> $ curl http://localhost:8983/solr/demo/get?id=book1
>>> {
>>>   "doc":
>>>   {
>>>     "id":"book1",
>>>     "title_t":["The Way of Kings"],
>>>     "author_s":"Brandon Sanderson",
>>>     "cat_s":"fantasy",
>>>     "pubyear_i":2010,
>>>     "ISBN_s":"978-0-7653-2635-5",
>>>     "_version_":1535488016326328320}}
>>>
>>> If I add a new document, something isn’t quite working:
>>>
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>>  {"id" : "book7",
>>>   "title_t":["The Way of Kings"],
>>>   "author_s":"Brandon Sanderson",
>>>   "cat_s":"",
>>>   "pubyear_i":2010,
>>>   "ISBN_s":"978-0-7653-2635-5"
>>>  }
>>> ]'
>>> {"responseHeader":{"status":0,"QTime":0}}
>>> $ curl http://localhost:8983/solr/demo/get?id=book7
>>> {
>>>   "doc":null}
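One thing stands out in the "classification" chain quoted above: it does not end with `solr.RunUpdateProcessorFactory`. In Solr, that factory is the step that actually applies an update to the index, so a custom chain without it does not index documents. Appending `<processor class="solr.RunUpdateProcessorFactory"/>` to the chain is the usual remedy. A missing run step would be consistent with two observations: book7 returning `"doc":null`, and book1 keeping its old `_version_` even though both updates returned status 0.

The Java sketch that follows is a hypothetical, simplified model of a processor chain, not Solr's classes. The classification step is a stub with a fixed label. The model shows the effect of the missing step: documents sent through a chain with no final "run" step never reach the index.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class UpdateChainModel {
    // Hypothetical, simplified model of an update processor chain: each step may
    // change the document and then hands it to the next step in the chain.
    interface Processor {
        void processAdd(Map<String, Object> doc, Iterator<Processor> rest);
    }

    // Stub for the classification step: sets the class field only when it has no value
    // (putIfAbsent also replaces an explicit null). The label is a fixed, hypothetical
    // prediction; no real classifier is involved.
    static class Classify implements Processor {
        public void processAdd(Map<String, Object> doc, Iterator<Processor> rest) {
            doc.putIfAbsent("cat_s", "fantasy");
            if (rest.hasNext()) rest.next().processAdd(doc, rest);
        }
    }

    // Stand-in for the final step that actually writes the document to the index.
    static class Run implements Processor {
        private final Map<String, Map<String, Object>> index;

        Run(Map<String, Map<String, Object>> index) {
            this.index = index;
        }

        public void processAdd(Map<String, Object> doc, Iterator<Processor> rest) {
            index.put((String) doc.get("id"), doc);
            if (rest.hasNext()) rest.next().processAdd(doc, rest);
        }
    }

    // Send one document through the chain, starting at the first processor.
    static void add(List<Processor> chain, Map<String, Object> doc) {
        Iterator<Processor> it = chain.iterator();
        if (it.hasNext()) it.next().processAdd(doc, it);
    }

    static Map<String, Object> doc(String id) {
        Map<String, Object> d = new HashMap<>();
        d.put("id", id);
        return d;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> index = new HashMap<>();
        List<Processor> classifyOnly = List.of(new Classify());            // as quoted in the thread
        List<Processor> classifyThenRun = List.of(new Classify(), new Run(index));
        add(classifyOnly, doc("book7"));
        add(classifyThenRun, doc("book8"));                                // hypothetical document id
        System.out.println(index.containsKey("book7"));     // prints false
        System.out.println(index.get("book8").get("cat_s")); // prints fantasy
    }
}
```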
>>>
>>> --
>>> --------------------------
>>>
>>> Benedetti Alessandro
>>> Visiting card : http://about.me/alessandro_benedetti
>>>
>>> "Tyger, tyger burning bright
>>> In the forests of the night,
>>> What immortal hand or eye
>>> Could frame thy fearful symmetry?"
>>>
>>> William Blake - Songs of Experience -1794 England
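The thread tried two `initParams` paths, `/update/` and `/update/**`. Tomas's results suggest that only the second applied the classification chain to requests sent to `/update`. With the first, book14 was indexed with no classification attempt. With the second, book15 reached `ClassificationUpdateProcessor` and failed.

The Java sketch below is a simplified, hypothetical model of segment-wise wildcard matching that reproduces that observation. It is not Solr's source, and the exact rules should be checked against the Solr Reference Guide. In this model, `/update/` carries a trailing empty segment and so never matches the plain `/update` handler. The `*` rule is included only for completeness and is an assumption.

```java
public class InitParamsPathModel {
    // Hypothetical, simplified model of wildcard path matching for initParams.
    // Segments are compared one by one: "**" matches whatever remains (including
    // nothing), "*" matches exactly one segment, anything else must match literally.
    static boolean matches(String pattern, String handlerPath) {
        String[] p = pattern.split("/", -1);      // -1 keeps trailing empty segments
        String[] h = handlerPath.split("/", -1);
        for (int i = 0; i < p.length; i++) {
            if (p[i].equals("**")) return true;
            if (i >= h.length) return false;      // pattern is longer than the path
            if (p[i].equals("*")) continue;
            if (!p[i].equals(h[i])) return false;
        }
        return p.length == h.length;              // the whole path must be consumed
    }

    public static void main(String[] args) {
        System.out.println(matches("/update/**", "/update"));      // prints true
        System.out.println(matches("/update/", "/update"));        // prints false
        System.out.println(matches("/update/**", "/update/json")); // prints true
    }
}
```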