BTW, Lucene/Solr has never implemented a boolean model, see: https://lucidworks.com/2011/12/28/why-not-and-or-and-not/
So if you need pure boolean you can pretty much get it if you parenthesize.

Best,
Erick

On Tue, Oct 17, 2017 at 2:06 AM, Junte Zhang <junte.zh...@localsearch.ch> wrote:
> My take on e-commerce search: similarity matching using a vector space based
> model, or probabilistic or Boolean ranking, matters less than it does in
> web search or other domains with full-text search. The reason is
> the content: usually very short texts, highly structured, and often not so
> noisy. And most of the time, users are sorting by price and/or popularity
> anyway.
>
> I think the main challenge of effective e-commerce search is
> semantics, i.e. query understanding (knowing what the search terms actually
> mean and their relation to concepts/classes), synonyms and relations between
> search terms.
>
> /JZ
>
> -----Original Message-----
> From: Charlie Hull [mailto:char...@flax.co.uk]
> Sent: Tuesday, October 17, 2017 10:10 AM
> To: solr-user@lucene.apache.org
> Subject: Re: E-Commerce Search: tf-idf, tie-break and boolean model
>
> For our e-commerce customers we've been recommending a test-based relevance
> tuning strategy. Here's a series of blogs written for us by someone who ran
> search for the world's largest electronic component distributor:
> http://www.flax.co.uk/blog/2016/03/18/get-started-improving-site-search-relevancy/
> which you might find interesting.
>
> A lot of my work these days is sitting down with clients to work out how to
> create sets of test queries and how to test them effectively. We usually
> recommend Quepid as a tool to do this (www.quepid.com).
>
> Cheers
>
> Charlie
>
> On 16/10/2017 11:16, alessandro.benedetti wrote:
>> I was having the discussion with a colleague of mine recently, about
>> E-commerce search.
>> Of course there are tons of things you can do to improve relevancy:
>> custom similarity - edismax tuning - basic user events processing -
>> machine learning integrations - semantic search, etc.
>>
>> The more you do, the better the results will potentially be; basically it is
>> an ocean to explore.
>> To avoid going off topic and to stay pertinent to your initial request,
>> let's take a look at the custom similarity problem.
>>
>> In e-commerce, and generally in proper noun searches, TF is not relevant.
>> IDF can help, but we need to focus on what IDF is used for in general
>> in Lucene search:
>> mostly, IDF is a measure of "how important this term is in the
>> user query".
>> Basically, Lucene (and TF/IDF-based Information Retrieval systems
>> in general) assumes that the rarer a term is in the corpus, the more
>> likely it is to be important for the search query.
>> That is not always true in e-commerce:
>> "iphone cover" means the user is looking for a cover that fits
>> his/her phone.
>> "iphone" is rare. "cover" is not. IDF will treat "iphone" as the
>> term most pertinent to the user intent, although the user wants a cover.
>> There's a lot to talk about here, so let's stop :)
>>
>> Anyway, as a conclusion, go step by step: custom similarity + edismax
>> optimised with proper phrase and shingle boosts should be a good start.
>> Tie-breaking for e-commerce is likely to be OK set to the default.
>> But to verify that, I would recommend setting up a relevancy measuring
>> framework with golden queries and user feedback.
>>
>> cheers
>>
>> -----
>> ---------------
>> Alessandro Benedetti
>> Search Consultant, R&D Software Engineer, Director Sease Ltd. -
>> www.sease.io
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile: +44 (0)7767 825828
> web: www.flax.co.uk
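
To make the IDF point in Alessandro's message concrete, here is a minimal sketch of how a BM25-style IDF would weight the two terms of "iphone cover". The catalogue size and document frequencies are made-up numbers chosen to match the premise that "iphone" is rare and "cover" is common; the function and variable names are illustrative and not part of any Solr or Lucene API.

```python
import math
from typing import Dict


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25-style inverse document frequency: rarer terms get a higher weight."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical catalogue statistics (assumptions, not real data):
# 10,000 products, 50 mention "iphone", 2,000 mention "cover".
DOC_COUNT = 10_000
DOC_FREQ: Dict[str, int] = {"iphone": 50, "cover": 2_000}


def term_weights(query: str) -> Dict[str, float]:
    """Return the IDF weight of each query term under the toy statistics."""
    return {
        term: bm25_idf(DOC_FREQ[term], DOC_COUNT)
        for term in query.lower().split()
    }


if __name__ == "__main__":
    for term, weight in term_weights("iphone cover").items():
        print(f"{term}: {weight:.2f}")
```

Under these assumed counts "iphone" receives a much higher weight than "cover", so a document matching only "iphone" can outscore one matching only "cover", even though the shopper is after a cover. That is the mismatch a custom similarity or field-level boosting would try to correct.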