Re: alerting system with Solr's Streaming Expressions

Susheel Kumar Tue, 14 Feb 2017 11:44:18 -0800

Hello Joel,

I used a bigger trainingSet of around 200K documents (Amazon reviews) and it
worked out well.  I verified the extracted feature terms, and the classify
function output the correct probability of reviews being negative or
positive.  Big thanks for adding this.
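
For anyone reading along, the training expression quoted further down in this
thread can be rebuilt with the smaller settings Joel recommends there. The
snippet below is only a sketch: the helper names (features_expr, train_expr)
are hypothetical, the defaults of 250 terms and 100 iterations are simply the
upper ends of the ranges Joel mentions, and the collection, field, and model
names are copied from the quoted expression. It is not necessarily the exact
configuration used for the 200K-document run.

```python
# Sketch: build the update(train(features(...))) streaming expression as a
# string. Helper names are hypothetical; parameter names (numTerms,
# maxIterations, featureSet, outcome, ...) are the ones used in this thread.

def features_expr(collection, field, outcome, feature_set, num_terms):
    """Return a features() expression that selects terms for the model."""
    return (
        f'features({collection}, q="*:*", featureSet="{feature_set}", '
        f'field="{field}", outcome="{outcome}", numTerms={num_terms})'
    )


def train_expr(collection, field, outcome, model_name, feature_set,
               num_terms=250, max_iterations=100, model_collection="models"):
    """Return an update(train(...)) expression that stores each iteration."""
    feats = features_expr(collection, field, outcome, feature_set, num_terms)
    return (
        f'update({model_collection}, batchSize="50", '
        f'train({collection}, {feats}, q="*:*", name="{model_name}", '
        f'field="{field}", outcome="{outcome}", '
        f'maxIterations="{max_iterations}"))'
    )


if __name__ == "__main__":
    # Names below come from the expression quoted later in the thread.
    print(train_expr("trainingSet", "body_txt", "out_i",
                     "threatModel", "threatFeatures"))
```

The printed string can then be sent to the collection's stream handler in the
usual way. Running features on its own first, as Joel suggests, is a quick way
to check which terms get selected before training.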


I wonder what you plan to implement next towards NLU in Solr, where
queries like "average revenue in last quarter" can be converted to
streaming functions that return the appropriate results.

Thanks,
Susheel


On Thu, Feb 9, 2017 at 11:23 AM, Susheel Kumar <susheel2...@gmail.com>
wrote:

> Got it. Thanks, Joel.
>
> On Thu, Feb 9, 2017 at 11:17 AM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
>
>> I increased numTerms from 250 to 2500 and maxIterations from 100 to 1000
>> when I didn't get the expected result.  Let me add more examples.
>>
>> Thanks,
>> Susheel
>>
>> On Thu, Feb 9, 2017 at 11:03 AM, Joel Bernstein <joels...@gmail.com>
>> wrote:
>>
>>> A few things that I see right off:
>>>
>>> 1) 2500 terms is too many. I was testing with 100-250 terms
>>> 2) 1000 iterations is too high. If the model hasn't converged by 100
>>> iterations it's likely not going to converge.
>>> 3) You're going to need more examples. You may want to run features first
>>> and see what it selects. Then you need multiple examples for each
>>> feature.
>>> I was testing with the Enron ham/spam data set. It would be good to
>>> download that data set and see what it looks like.
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Thu, Feb 9, 2017 at 10:15 AM, Susheel Kumar <susheel2...@gmail.com>
>>> wrote:
>>>
>>> > Hello Joel,
>>> >
>>> > Here is the final iteration in json format.
>>> >
>>> >  https://www.dropbox.com/s/g3a3606ms6cu8q4/final_iteration.json?dl=0
>>> >
>>> > Below is the expression used
>>> >
>>> > update(models,
>>> >              batchSize="50",
>>> >              train(trainingSet,
>>> >                       features(trainingSet,
>>> >                                      q="*:*",
>>> >                                      featureSet="threatFeatures",
>>> >                                      field="body_txt",
>>> >                                      outcome="out_i",
>>> >                                      numTerms=2500),
>>> >                       q="*:*",
>>> >                       name="threatModel",
>>> >                       field="body_txt",
>>> >                       outcome="out_i",
>>> >                       maxIterations="1000"))
>>> >
>>> > I have just 16 documents, 8 positive and 8 negative. The field that
>>> > contains the feedback is body_txt (text_general type).
>>> >
>>> > Thanks for looking.
>>> >
>>> >
>>> >
>>> > On Wed, Feb 8, 2017 at 7:52 AM, Joel Bernstein <joels...@gmail.com>
>>> > wrote:
>>> >
>>> > > Can you post the final iteration of the model?
>>> > >
>>> > > Also the expression you used to train the model?
>>> > >
>>> > > How much training data do you have? How many positive and negative
>>> > > examples?
>>> > >
>>> > > Joel Bernstein
>>> > > http://joelsolr.blogspot.com/
>>> > >
>>> > > On Tue, Feb 7, 2017 at 2:14 PM, Susheel Kumar <susheel2...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > Hello,
>>> > > >
>>> > > > I tried to follow http://joelsolr.blogspot.com/ to see if we can
>>> > > > classify positive and negative feedback using streaming expressions.
>>> > > > Everything works, except that the probability_d value returned by the
>>> > > > classify expression is similar for positive and negative feedback.
>>> > > > See below.
>>> > > >
>>> > > > What might I be missing here?  Do I need to put more data in the
>>> > > > training set, or is it something else?
>>> > > >
>>> > > >
>>> > > > { "result-set": { "docs": [
>>> > > >   { "body_txt": [ "love the company" ],
>>> > > >     "score_d": 2.1892474120319667, "id": "6",
>>> > > >     "probability_d": 0.977944433135261 },
>>> > > >   { "body_txt": [ "bad experience " ],
>>> > > >     "score_d": 3.1689453250842914, "id": "5",
>>> > > >     "probability_d": 0.9888109278133054 },
>>> > > >   { "body_txt": [ "This company rewards its employees, but you should only work here if you truly love sales. The stress of the job can get to you and they definitely push you." ],
>>> > > >     "score_d": 4.621702323888672, "id": "4",
>>> > > >     "probability_d": 0.9999999999898557 },
>>> > > >   { "body_txt": [ "no chance for advancement with that company every year I was there it got worse I don't know if all branches of adp but Florence organization was turn over rate would be higher if it was for temp workers" ],
>>> > > >     "score_d": 5.288898825826228, "id": "3",
>>> > > >     "probability_d": 0.9999999999999956 },
>>> > > >   { "body_txt": [ "It was a pleasure to work at the Milpitas campus. The team that works there are professional and dedicated individuals. The level of loyalty and dedication is impressive" ],
>>> > > >     "score_d": 2.5303947056922937, "id": "2",
>>> > > >     "probability_d": 0.9999990430778418 },
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>
