Re: E-Commerce Search: tf-idf, tie-break and boolean model

Walter Underwood Tue, 17 Oct 2017 14:39:08 -0700

That page from Stanford is not about e-commerce search. Westlaw is professional 
librarian search.


I agree with Emir’s advice. Start with edismax. Use a small value for the 
tie-breaker. It is one of the least important configuration values. I use the 
default from the sample configs:

       <str name="tie">0.1</str>
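For a feel of what that value does, here is a minimal sketch of how a disjunction-max query combines the per-field scores for one term: the best field score, plus tie times the sum of the other fields. The function name and the sample scores are made up for illustration; this is not Solr code.

```python
def dismax_score(field_scores, tie=0.1):
    """Combine per-field scores for one query term the way a
    disjunction-max query does: best score + tie * (sum of the rest)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Hypothetical scores for one term matching in three fields
# (say title, brand, description). The numbers are invented.
scores = [2.0, 1.0, 0.5]

for tie in (0.0, 0.1, 1.0):
    print(f"tie={tie}: {dismax_score(scores, tie):.2f}")
```

With tie=0.0 only the best field counts (2.0 here), and with tie=1.0 the fields are simply summed (3.5 here). A small value like 0.1 mostly just breaks ties between documents whose best field scores are equal.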

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 16, 2017, at 1:53 AM, Emir Arnautović <emir.arnauto...@sematext.com> 
> wrote:
> 
> Hi Vincenzo,
> Unless you have really specific ranking requirements, I would not suggest 
> starting with your own similarity implementation. In most cases 
> edismax will be good enough to cover your requirements. Tuning edismax is 
> not an easy task, since it has a lot of knobs that you can use.
> In general there are two approaches that you can use. The first is to create 
> a golden set of query-result pairs, use it with some metric (e.g. you can 
> start with a simple F-measure) and tune parameters to maximize that metric. 
> The alternative approach (which complements the first one) is to let users 
> use your search, track clicks and monitor search metrics like mean reciprocal 
> rank, zero-result queries, page depth etc., and tune queries to get better 
> results. If you can do A/B testing, you can use that as well to see which 
> changes are better.
> In most cases this is an iterative process: you should not expect to get it 
> right the first time, or to be able to tune it to cover all cases.
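
A rough sketch of the golden-set approach, assuming the golden set is a dict from query string to the ids judged relevant and that `search` is whatever function runs a query against your index and returns ranked ids. All names and sample data here are hypothetical. Mean reciprocal rank is computed offline against the same judgments, standing in for the click-based version.

```python
def f_measure(returned, relevant):
    """Balanced F-measure (F1) of a result list against the judged-relevant ids."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(returned)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant result, or 0.0 if none is returned."""
    relevant = set(relevant)
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def evaluate(golden_set, search):
    """Return (mean F-measure, mean reciprocal rank) over all golden queries."""
    if not golden_set:
        return 0.0, 0.0
    f_total = rr_total = 0.0
    for query, relevant in golden_set.items():
        ranked = search(query)
        f_total += f_measure(ranked, relevant)
        rr_total += reciprocal_rank(ranked, relevant)
    n = len(golden_set)
    return f_total / n, rr_total / n


# Hypothetical golden set and canned results standing in for a real Solr call.
golden = {"red shoes": ["p1", "p2"], "usb cable": ["p9"]}
canned = {"red shoes": ["p3", "p1", "p2"], "usb cable": ["p9"]}

mean_f, mrr = evaluate(golden, lambda q: canned.get(q, []))
print(f"mean F-measure={mean_f:.2f}  MRR={mrr:.2f}")
```

Rerun the same evaluation after each parameter change and keep the change only if the numbers improve.
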
> 
> Good luck!
> 
> HTH,
> Emir
> 
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 16 Oct 2017, at 10:30, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
>> 
>> Hi all,
>> 
>> I'm trying to figure out how to tune Solr for an e-commerce search.
>> 
>> I want to share with you what I did, in the hope of understanding whether I
>> was right and whether I could also improve my configuration.
>> 
>> I also read that the boolean model has to be preferred in this case.
>> 
>> https://nlp.stanford.edu/IR-book/html/htmledition/the-extended-boolean-model-versus-ranked-retrieval-1.html
>> 
>> 
>> So, I first wrote my own implementation of DefaultSimilarity that always
>> returns 1.0 for TF and IDF.
>> 
>> Now I'm struggling to understand how to configure the tie-break parameter.
>> My idea was to set it to 0.1 or 0.0 because, if I understood correctly,
>> this way the boolean model is preferred, since only the maximum-scoring
>> subquery contributes to the final score.
>> 
>> https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thetie_TieBreaker_Parameter
>> 
>> 
>> I'm not sure whether this is enough or whether you need more information.
>> Thanks in advance to anyone who can add to this discussion.
>> 
>> Best regards,
>> Vincenzo
>> 
>> -- 
>> Vincenzo D'Amore
>> email: v.dam...@gmail.com
>> skype: free.dev
>> mobile: +39 349 8513251
>
