But how big is your index? Are you expecting Solr to classify your documents automatically without any knowledge base to learn from (documents that already have the class assigned)? Please attach an example of your schema; there was a reason I asked for it :) The error seems related to the fact that we get no tokens from the text analysis.
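To make those two checks concrete, here is a minimal sketch (Python, standard library only) that just builds the two request URLs; it does not contact Solr. The core URL and the field names are taken from the commands and configuration quoted in this thread. The helper names are made up for illustration, and I am assuming the `/select` and `/analysis/field` handlers are available at their default paths on your core.

```python
from urllib.parse import urlencode

# Core URL as used in the curl commands quoted in this thread.
SOLR_CORE_URL = "http://localhost:8983/solr/demo"


def labelled_docs_url(class_field: str, base: str = SOLR_CORE_URL) -> str:
    """Build a query URL matching every document that has any value in class_field.

    rows=0 asks Solr to return only the hit count, not the documents.
    """
    params = {"q": f"{class_field}:[* TO *]", "rows": 0, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


def field_analysis_url(field: str, value: str, base: str = SOLR_CORE_URL) -> str:
    """Build a field-analysis URL showing which tokens `value` yields for `field`.

    Assumption: the field analysis handler is reachable at /analysis/field.
    """
    params = {
        "analysis.fieldname": field,
        "analysis.fieldvalue": value,
        "wt": "json",
    }
    return f"{base}/analysis/field?{urlencode(params)}"


if __name__ == "__main__":
    # How many documents already carry a class the classifier can learn from?
    print(labelled_docs_url("cat_s"))
    # How are the configured inputFields analyzed?
    for field, value in [("title_t", "The Way of Kings"),
                         ("author_s", "Brandon Sanderson")]:
        print(field_analysis_url(field, value))
```

Fetching the first URL with curl and reading `numFound` in the response should tell you how many labelled documents the index holds. If the analysis response for one of the input fields shows no tokens at all, that would be consistent with the NullPointerException in `getTokenArray`, though I have not verified that this is the cause.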
Cheers

On Fri, Jul 15, 2016 at 12:11 PM, Tomas Ramanauskas <tomas.ramanaus...@springer.com> wrote:

> Hi, Alessandro,
>
> sorry for the delay. What do you mean?
>
> As I mentioned earlier, I followed a super simple set of steps.
>
> 1. Download Solr
> 2. Configure classification
> 3. Create some documents using curl over HTTP.
>
> Is it difficult to reproduce the steps / problem?
>
> Tomas
>
> > On 23 Jun 2016, at 16:42, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
> >
> > Can you give an example of your schema, and can you run a simple query on
> > your index? I am curious to see how the input fields are analyzed.
> >
> > Cheers
> >
> > On Wed, Jun 22, 2016 at 6:05 PM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
> >
> >> This is better! At least the classifier is invoked!
> >> How many docs in the index have the class assigned?
> >> Take a look at the stack trace and you should find the cause!
> >> I am now on mobile, I will check the code tomorrow!
> >> Cheers
> >> On 22 Jun 2016 5:26 pm, "Tomas Ramanauskas" <tomas.ramanaus...@springer.com> wrote:
> >>
> >>> I also tried with this config (adding **):
> >>>
> >>> <initParams path="/update/**">
> >>>   <lst name="defaults">
> >>>     <str name="update.chain">classification</str>
> >>>   </lst>
> >>> </initParams>
> >>>
> >>> And I get the error:
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>>  {"id" : "book15",
> >>>   "title_t":["The Way of Kings"],
> >>>   "author_s":"Brandon Sanderson",
> >>>   "cat_s": null,
> >>>   "pubyear_i":2010,
> >>>   "ISBN_s":"978-0-7653-2635-5"
> >>>  }
> >>> ]'
> >>> {"responseHeader":{"status":500,"QTime":29},"error":{"trace":"java.lang.NullPointerException\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getTokenArray(SimpleNaiveBayesDocumentClassifier.java:202)\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.analyzeSeedDocument(SimpleNaiveBayesDocumentClassifier.java:162)\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:121)\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignClass(SimpleNaiveBayesDocumentClassifier.java:81)\n\tat
> >>> org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:94)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:474)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:138)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:114)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:77)\n\tat
> >>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat
> >>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)\n\tat
> >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\n\tat
> >>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\n\tat
> >>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\n\tat
> >>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat
> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
> >>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
> >>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
> >>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> >>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
> >>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
> >>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
> >>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
> >>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> >>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
> >>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> >>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
> >>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
> >>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
> >>> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
> >>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
> >>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
> >>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
> >>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
> >>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
> >>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
> >>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
> >>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
> >>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
> >>> java.lang.Thread.run(Thread.java:745)\n","code":500}}
> >>>
> >>> Tomas
> >>>
> >>> On 22 Jun 2016, at 17:22, Tomas Ramanauskas <tomas.ramanaus...@springer.com> wrote:
> >>>
> >>> Thanks for the response, Alessandro.
> >>>
> >>> I tried this and it didn’t work either:
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>>  {"id" : "book14",
> >>>   "title_t":["The Way of Kings"],
> >>>   "author_s":"Brandon Sanderson",
> >>>   "cat_s": null,
> >>>   "pubyear_i":2010,
> >>>   "ISBN_s":"978-0-7653-2635-5"
> >>>  }
> >>> ]'
> >>>
> >>> {"responseHeader":{"status":0,"QTime":2}}
> >>>
> >>> $ curl http://localhost:8983/solr/demo/get?id=book14
> >>> {
> >>>   "doc":
> >>>   {
> >>>     "id":"book14",
> >>>     "title_t":["The Way of Kings"],
> >>>     "author_s":"Brandon Sanderson",
> >>>     "pubyear_i":2010,
> >>>     "ISBN_s":"978-0-7653-2635-5",
> >>>     "_version_":1537854598189940736}}
> >>>
> >>> I don’t see “cat_s” field in the results at all.
> >>>
> >>> Tomas
> >>>
> >>> On 22 Jun 2016, at 16:39, Alessandro Benedetti <abenede...@apache.org> wrote:
> >>>
> >>> Hi Tomas,
> >>> first consideration:
> >>> an empty string is different from a NULL string.
> >>> This is controversial, but I would suggest never using the empty string,
> >>> as this can cause other side effects.
> >>> Apart from that, the plugin will add the class only if the class field
> >>> has no value:
> >>>
> >>> Object documentClass = doc.getFieldValue(classFieldName);
> >>> if (documentClass == null) {
> >>>
> >>> That said, I would suggest you build a sample index with some
> >>> documents and then try to classify.
> >>> If this doesn't solve your issue, I can help you further.
> >>>
> >>> Cheers
> >>>
> >>> On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas <tomas.ramanaus...@springer.com> wrote:
> >>>
> >>> I also tried this configuration, but couldn't get the feature to work:
> >>>
> >>> <initParams path="/update/">
> >>>   <lst name="defaults">
> >>>     <str name="update.chain">classification</str>
> >>>   </lst>
> >>> </initParams>
> >>>
> >>> <updateRequestProcessorChain name="classification">
> >>>   <processor class="solr.ClassificationUpdateProcessorFactory">
> >>>     <str name="inputFields">title_t,author_s</str>
> >>>     <str name="classField">cat_s</str>
> >>>     <str name="algorithm">bayes</str>
> >>>   </processor>
> >>> </updateRequestProcessorChain>
> >>>
> >>> Tomas
> >>>
> >>> On 22 Jun 2016, at 13:46, Tomas Ramanauskas <tomas.ramanaus...@springer.com> wrote:
> >>>
> >>> P.S. The version I use:
> >>>
> >>> 6.1.0-68
> >>>
> >>> Also, earlier I said “If I modify an existing record, I think the
> >>> functionality works:”, but I think it doesn’t work for me at all.
> >>>
> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
> >>> {
> >>>   "doc":
> >>>   {
> >>>     "id":"book1",
> >>>     "title_t":["The Way of Kings"],
> >>>     "author_s":"Brandon Sanderson",
> >>>     "cat_s":"fantasy",
> >>>     "pubyear_i":2010,
> >>>     "ISBN_s":"978-0-7653-2635-5",
> >>>     "_version_":1535488016326328320}}
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>>  {"id" : "book1",
> >>>   "title_t":["The Way of Kings"],
> >>>   "author_s":"Brandon Sanderson",
> >>>   "cat_s":"aaa",
> >>>   "pubyear_i":2010,
> >>>   "ISBN_s":"978-0-7653-2635-5"
> >>>  }
> >>> ]'
> >>> {"responseHeader":{"status":0,"QTime":0}}
> >>>
> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
> >>> {
> >>>   "doc":
> >>>   {
> >>>     "id":"book1",
> >>>     "title_t":["The Way of Kings"],
> >>>     "author_s":"Brandon Sanderson",
> >>>     "cat_s":"fantasy",
> >>>     "pubyear_i":2010,
> >>>     "ISBN_s":"978-0-7653-2635-5",
> >>>     "_version_":1535488016326328320}}
> >>>
> >>> Tomas
> >>>
> >>> On 22 Jun 2016, at 12:47, Tomas Ramanauskas <tomas.ramanaus...@springer.com> wrote:
> >>>
> >>> Hi, everyone,
> >>>
> >>> would someone be able to share a working example (step by step) that
> >>> demonstrates the use of Naive Bayes classifier in Solr?
> >>>
> >>> I followed this Blog post:
> >>> https://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html?showComment=1464358093048#c2489902302085000947
> >>>
> >>> And this tutorial:
> >>> http://yonik.com/solr-tutorial/
> >>>
> >>> And this JIRA ticket:
> >>> https://issues.apache.org/jira/browse/SOLR-7739
> >>>
> >>> So this is my configuration file (only what I added or modified):
> >>>
> >>> <initParams path="/update/**">
> >>>   <lst name="defaults">
> >>>     <str name="update.chain">classification</str>
> >>>   </lst>
> >>> </initParams>
> >>>
> >>> <updateRequestProcessorChain name="classification">
> >>>   <processor class="solr.ClassificationUpdateProcessorFactory">
> >>>     <str name="inputFields">title_t,author_s</str>
> >>>     <str name="classField">cat_s</str>
> >>>     <str name="algorithm">bayes</str>
> >>>   </processor>
> >>> </updateRequestProcessorChain>
> >>>
> >>> If I modify an existing record, I think the functionality works:
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>>  {"id" : "book1",
> >>>   "title_t":["The Way of Kings"],
> >>>   "author_s":"Brandon Sanderson",
> >>>   "cat_s":"",
> >>>   "pubyear_i":2010,
> >>>   "ISBN_s":"978-0-7653-2635-5"
> >>>  }
> >>> ]'
> >>> {"responseHeader":{"status":0,"QTime":8}}
> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
> >>> {
> >>>   "doc":
> >>>   {
> >>>     "id":"book1",
> >>>     "title_t":["The Way of Kings"],
> >>>     "author_s":"Brandon Sanderson",
> >>>     "cat_s":"fantasy",
> >>>     "pubyear_i":2010,
> >>>     "ISBN_s":"978-0-7653-2635-5",
> >>>     "_version_":1535488016326328320}}
> >>>
> >>> If I add a new document, something isn’t quite working:
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>>  {"id" : "book7",
> >>>   "title_t":["The Way of Kings"],
> >>>   "author_s":"Brandon Sanderson",
> >>>   "cat_s":"",
> >>>   "pubyear_i":2010,
> >>>   "ISBN_s":"978-0-7653-2635-5"
> >>>  }
> >>> ]'
> >>> {"responseHeader":{"status":0,"QTime":0}}
> >>> $ curl http://localhost:8983/solr/demo/get?id=book7
> >>> {
> >>>   "doc":null}
> >>>
> >>> --
> >>> --------------------------
> >>>
> >>> Benedetti Alessandro
> >>> Visiting card : http://about.me/alessandro_benedetti
> >>>
> >>> "Tyger, tyger burning bright
> >>> In the forests of the night,
> >>> What immortal hand or eye
> >>> Could frame thy fearful symmetry?"
> >>>
> >>> William Blake - Songs of Experience -1794 England
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti