Thank you Joel. I'm really having a good time with the machine learning
component in Solr. In this case, the weather model was built by
classifying tweets as positive or negative. I started by searching for
tweets with terms like tornado, storm, forecast, typhoon, hurricane,
blizzard, snow, lightning, flood warning, etc., and marking those as
positive. Then I grabbed some random tweets about Trump, ISIS,
Kardashian, etc. as negative examples. At that point I started to
classify data and refine the model (adding more positive/negative) as
more data came into the system.
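For anyone following along, a model like this can be built with Solr's features/train streaming expressions. Roughly like the sketch below - the collection name (tweets), field name (tweet_text), and feature-set name are illustrative placeholders, not my actual setup. The features() expression extracts the top information-gain terms from the labeled tweets, train() fits a logistic regression model on those terms (out_i is 1 for positive, 0 for negative), and update() stores the resulting model in the models collection:

```
update(models,
    train(tweets,
        features(tweets,
            q="*:*",
            featureSet="weatherFeatures",
            field="tweet_text",
            outcome="out_i",
            numTerms=150),
        q="*:*",
        name="WeatherModel",
        field="tweet_text",
        outcome="out_i",
        maxIterations="100"))
```

After that, the stored model can be pulled back with model(models, id="WeatherModel") as in my query below.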
I hope that helps. The model works very well at this point with just
650 manually classified tweets (split about evenly between positive and
negative) and 150 terms.
I like your idea of using the model to re-rank the top n search
results. That said, the results can be significantly 'better' if I
classify more data and reorder by high probability scores, but, as you
pointed out, at the cost of much slower searches. In some cases I
suspect a user may want to search with just a model and no search
terms; there it may be best to classify data as it arrives. I guess
it's a toss-up between what matters more: a high probability from the
classifier or a high rank from the search engine.
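If I do go the re-rank route, I'd expect it to look roughly like this - classify only the top relevance-ranked results, then re-sort that small set by the classifier's probability. This is a sketch using the names from my query below, not something I've tested; the n and rows values are arbitrary:

```
top(n=100,
    classify(
        model(models, id="WeatherModel", cacheMillis=5000),
        search(COL1,
            df="FULL_DOCUMENT",
            q="Hawaii",
            fl="ClusterText,id,score",
            sort="score desc",
            rows="500"),
        field="ClusterText"),
    sort="probability_d desc")
```

That way only 500 docs per query go through the model instead of the whole result set, which should keep searches fast while still letting the classifier reorder the top of the list.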
Thanks Joel.
-Joe
On 8/23/2017 3:08 PM, Joel Bernstein wrote:
Can you describe the weather model?
In general the idea is to rerank the top N docs, because it will be too
slow to classify the whole result set.
In this scenario the search engine ranking will already be returning
relevant candidate documents and the model is only used to get a better
ordering of the top docs.
Joel Bernstein
http://joelsolr.blogspot.com/
On Tue, Aug 22, 2017 at 12:32 PM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:
Hi All - One of the really neat features of Solr 6 is the ability to
create machine learning models (information gain) and then use those models
as a query. If I want a user to be able to execute a query for the text
Hawaii and use a machine learning model related to weather data, how can I
correctly rank the results? It looks like I would need to classify all the
documents in some date range (assuming the query is date restricted), look
at the probability_d and pick the top n documents. Is there a better way
to do this?
I'm using a stream like this:
classify(
    model(models, id="WeatherModel", cacheMillis=5000),
    search(COL1,
        df="FULL_DOCUMENT",
        q="Hawaii AND DocTimestamp:[2017-07-23T04:00:00Z TO 2017-08-23T03:59:00Z]",
        fl="ClusterText,id",
        sort="id asc",
        rows="10000"),
    field="ClusterText")
This is sent to all the shards, each of which can return at most 10,000 docs.
Thanks!
-Joe