Setting mm (minimum should match) to 100% means that a single misspelled word in a query produces zero results. That is not a good experience. Usually, about 10% of queries contain a misspelling.
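To make the zero-results effect concrete, here is a toy sketch in Python. It is not Solr code: `matches`, `search`, and the three-product catalog are hypothetical names made up for illustration, and the only thing it models is the rule that a document must contain at least the required number of query terms.

```python
# Toy model of a minimum-should-match rule. NOT Solr code; all names are hypothetical.

def matches(query_terms, doc_terms, required):
    """Return True if at least `required` query terms occur in the document."""
    hits = sum(1 for term in query_terms if term in doc_terms)
    return hits >= required


def search(query, docs, mm):
    """Return the docs that satisfy `mm`, given as a count (1) or a percentage ("100%")."""
    terms = query.lower().split()
    if isinstance(mm, str) and mm.endswith("%"):
        # Assumption of this sketch: percentages are rounded down to a term count.
        required = len(terms) * int(mm[:-1]) // 100
    else:
        required = min(int(mm), len(terms))
    return [d for d in docs if matches(terms, set(d.lower().split()), required)]


catalog = ["iphone charger", "iphone case", "usb cable"]

# "chargr" is a misspelling, so no document contains both query terms.
strict = search("iphone chargr", catalog, "100%")
# With mm=1, the correctly spelled term is enough to match.
loose = search("iphone chargr", catalog, 1)

print(strict)
print(loose)
```

With every term required, the misspelled term rules out every product. With one term required, the products containing "iphone" still come back, and ranking decides their order.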
Set mm to 1.

The F-measure is not a good choice for this because recall is not very important in e-commerce. Use precision-oriented measures. P@3 (precision in the top three results) is a good start. If there is usually exactly one correct answer (this was true when I did search at Netflix), MRR (mean reciprocal rank) is a better choice. It measures the position of the first relevant result.

https://techblog.chegg.com/2012/12/12/measuring-search-relevance-with-mrr/

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 20, 2017, at 1:05 AM, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
>
> Thanks for all the info, I really appreciate your help. I'm working on the
> configuration and following your suggestions.
>
> We already had a golden set of query-results pairs (~1000) used to tune and
> check how my application (and Solr configuration) performs.
> But I have to double-check whether this set is still relevant.
> The results of each query are used to calculate F1.
>
> Nevertheless, having this base of tests let me try a few rounds of adding
> and removing the custom similarity, changing the tie configuration, and so on.
>
> Now I want to share my results with you:
>
> - I've just set mm=100%
>
> - TF - set as constant 1.0 - slight improvement in search results.
> Basically it seems to perform better when there are a few products that are
> almost identical, but some of them have the same keyword repeated many
> times. For example a product "iphone charger for iphone 5, iphone
> 5s, iphone 6" versus a product "iphone charge"
>
> - IDF - set as constant 1.0 - the results were not catastrophic but, for
> sure, worse than with the default similarity. So I've rolled back this
> change; it seems to me the results are flattened too much.
>
> - tie - I've just tried 0.1 and 1.0; at the moment 1.0 seems to perform better,
> but I'm not sure why.
>
> I want to try adding some relevant fields (tags, categories) in order to
> have more chances to match the correct results.
>
> Best regards,
> Vincenzo
>
> On Tue, Oct 17, 2017 at 11:38 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>
>> That page from Stanford is not about e-commerce search. Westlaw is
>> professional librarian search.
>>
>> I agree with Emir’s advice. Start with edismax. Use a small value for the
>> tie-breaker. It is one of the least important configuration values. I use
>> the default from the sample configs:
>>
>> <str name="tie">0.1</str>
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>> On Oct 16, 2017, at 1:53 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
>>>
>>> Hi Vincenzo,
>>> Unless you have really specific ranking requirements, I would not suggest
>>> that you start with your proprietary similarity implementation. In most
>>> cases edismax will be good enough to cover your requirements. It is not an
>>> easy task to tune edismax, since it has a lot of knobs that you can use.
>>> In general there are two approaches that you can use: Create a golden set
>>> of query-results pairs and use it with some metric (e.g. you can start
>>> with a simple F-measure) and tune parameters to maximize the metric. The
>>> alternative approach (which complements the first one) is to let users use
>>> your search, track clicks and monitor search metrics like mean reciprocal
>>> rank, zero result queries, page depth etc. and tune queries to get better
>>> results. If you can do A/B testing, you can use that as well to see which
>>> changes are better.
>>> In most cases, this is an iterative process and you should not expect to
>>> get it right the first time or to be able to tune it to cover all cases.
>>>
>>> Good luck!
>>>
>>> HTH,
>>> Emir
>>>
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>>
>>>> On 16 Oct 2017, at 10:30, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I'm trying to figure out how to tune Solr for an e-commerce search.
>>>>
>>>> I want to share what I did, in the hope of understanding whether I was
>>>> right and whether I could improve my configuration.
>>>>
>>>> I also read that the boolean model has to be preferred in this case.
>>>>
>>>> https://nlp.stanford.edu/IR-book/html/htmledition/the-extended-boolean-model-versus-ranked-retrieval-1.html
>>>>
>>>> So, I first wrote my own implementation of DefaultSimilarity returning
>>>> a constant 1.0 for TF and IDF.
>>>>
>>>> Now I'm struggling to understand how to configure the tie-breaker
>>>> parameter. My idea was to set it to 0.1 or 0.0 because, if I understood
>>>> correctly, the boolean model is then preferred, since only the
>>>> maximum-scoring subquery contributes to the final score.
>>>>
>>>> https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thetie_TieBreaker_Parameter
>>>>
>>>> I'm not sure if this is enough or if you need more information. Thanks in
>>>> advance to anyone who adds to this discussion.
>>>>
>>>> Best regards,
>>>> Vincenzo
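
For readers who want to try the precision-oriented measures named in the first message of this thread, here is a minimal sketch of P@3 and MRR in Python. The function names, product ids, and the two-query golden set are hypothetical. The sketch assumes relevance judgments are available as a set of ids per query and that P@k divides by k even when fewer than k results come back.

```python
def precision_at_k(ranked_ids, relevant_ids, k=3):
    """Fraction of the top-k results that are relevant (divides by k)."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical golden set: query -> (ids returned by the engine, ids judged relevant)
golden = {
    "iphone charger": (["p7", "p1", "p9"], {"p1"}),
    "usb cable": (["p4", "p2", "p3"], {"p4", "p3"}),
}

p_at_3 = {q: precision_at_k(ranked, rel) for q, (ranked, rel) in golden.items()}
mrr = mean_reciprocal_rank(list(golden.values()))

print(p_at_3)
print(mrr)
```

In this made-up data the first query has its only relevant product at rank 2 and the second has one at rank 1, so the reciprocal ranks are 1/2 and 1. Neither measure penalizes relevant products that are missing from the result list, which is why they fit a setting where recall matters little.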