A few things that I see right off:

1) 2500 terms is too many. I was testing with 100-250 terms.
2) 1000 iterations is too high. If the model hasn't converged by 100 iterations, it's likely not going to converge.
3) You're going to need more examples. You may want to run the features expression on its own first and see what it selects. Then you need multiple examples for each feature.

I was testing with the Enron ham/spam data set. It would be good to download that dataset and see what it looks like.
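As a rough sketch of those suggestions, the quoted expression could be adjusted along these lines. The collection, field, and model names are taken from the quoted expression. The values numTerms=250 and maxIterations="100" are only picked from the ranges mentioned above and are assumptions, not tuned settings. The first expression runs features by itself so the selected terms can be inspected; the second retrains with the smaller settings.

```
features(trainingSet,
         q="*:*",
         featureSet="threatFeatures",
         field="body_txt",
         outcome="out_i",
         numTerms=250)

update(models,
       batchSize="50",
       train(trainingSet,
             features(trainingSet,
                      q="*:*",
                      featureSet="threatFeatures",
                      field="body_txt",
                      outcome="out_i",
                      numTerms=250),
             q="*:*",
             name="threatModel",
             field="body_txt",
             outcome="out_i",
             maxIterations="100"))
```

With only 16 training documents, changing these settings alone probably won't separate the classes; the larger training set is still needed.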
Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Feb 9, 2017 at 10:15 AM, Susheel Kumar <susheel2...@gmail.com> wrote:

> Hello Joel,
>
> Here is the final iteration in JSON format.
>
> https://www.dropbox.com/s/g3a3606ms6cu8q4/final_iteration.json?dl=0
>
> Below is the expression used:
>
> update(models,
>        batchSize="50",
>        train(trainingSet,
>              features(trainingSet,
>                       q="*:*",
>                       featureSet="threatFeatures",
>                       field="body_txt",
>                       outcome="out_i",
>                       numTerms=2500),
>              q="*:*",
>              name="threatModel",
>              field="body_txt",
>              outcome="out_i",
>              maxIterations="1000"))
>
> I have just 16 documents, 8 positive and 8 negative. The field which
> contains the feedback is body_txt (text_general type).
>
> Thanks for looking.
>
> On Wed, Feb 8, 2017 at 7:52 AM, Joel Bernstein <joels...@gmail.com> wrote:
>
> > Can you post the final iteration of the model?
> >
> > Also the expression you used to train the model?
> >
> > How much training data do you have? How many positive examples and
> > negative examples?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Feb 7, 2017 at 2:14 PM, Susheel Kumar <susheel2...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > I am trying to follow http://joelsolr.blogspot.com/ to see if we can
> > > classify positive and negative feedback using streaming expressions.
> > > Everything works except the end result: the probability_d value from
> > > the classify expression is similar for positive and negative feedback.
> > > See below.
> > >
> > > What might I be missing here? Do I need to put more data in the
> > > training set, or something else?
> > >
> > > {
> > >   "result-set": {
> > >     "docs": [
> > >       {
> > >         "body_txt": [ "love the company" ],
> > >         "score_d": 2.1892474120319667,
> > >         "id": "6",
> > >         "probability_d": 0.977944433135261
> > >       },
> > >       {
> > >         "body_txt": [ "bad experience " ],
> > >         "score_d": 3.1689453250842914,
> > >         "id": "5",
> > >         "probability_d": 0.9888109278133054
> > >       },
> > >       {
> > >         "body_txt": [ "This company rewards its employees, but you should only work here if you truly love sales. The stress of the job can get to you and they definitely push you." ],
> > >         "score_d": 4.621702323888672,
> > >         "id": "4",
> > >         "probability_d": 0.9999999999898557
> > >       },
> > >       {
> > >         "body_txt": [ "no chance for advancement with that company every year I was there it got worse I don't know if all branches of adp but Florence organization was turn over rate would be higher if it was for temp workers" ],
> > >         "score_d": 5.288898825826228,
> > >         "id": "3",
> > >         "probability_d": 0.9999999999999956
> > >       },
> > >       {
> > >         "body_txt": [ "It was a pleasure to work at the Milpitas campus. The team that works there are professional and dedicated individuals. The level of loyalty and dedication is impressive" ],
> > >         "score_d": 2.5303947056922937,
> > >         "id": "2",
> > >         "probability_d": 0.9999990430778418
> > >       },