BTW, Lucene/Solr has never implemented a boolean model, see:

https://lucidworks.com/2011/12/28/why-not-and-or-and-not/

So if you need pure Boolean behavior, you can pretty much get it by fully parenthesizing your queries.
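
To make the parenthesizing concrete, here is a minimal Python sketch that renders a nested Boolean expression as a fully parenthesized query string, so nothing is left to how the query parser handles bare AND/OR/NOT. The function name `to_query` and the tuple layout are made up for illustration; they are not part of Solr or Lucene. The `*:*` in the NOT case is the usual Solr idiom for giving a negative clause something to subtract from.

```python
def to_query(expr):
    """Render a nested Boolean expression as a fully parenthesized query string.

    expr is either a plain term (str) or a tuple (op, operand, ...) where op is
    "AND", "OR" or "NOT". This layout is hypothetical, chosen for illustration.
    """
    if isinstance(expr, str):
        return expr
    op, *operands = expr
    if op == "NOT":
        if len(operands) != 1:
            raise ValueError("NOT takes exactly one operand")
        # Pair the negation with *:* (match all documents) so the group is
        # never a purely negative clause.
        return "(*:* NOT " + to_query(operands[0]) + ")"
    if op not in ("AND", "OR") or len(operands) < 2:
        raise ValueError(f"bad expression: {expr!r}")
    # Every sub-expression gets its own parentheses, so grouping is explicit.
    return "(" + f" {op} ".join(to_query(o) for o in operands) + ")"


example = to_query(("AND", "iphone", ("OR", "cover", "case")))
```

Here `example` is the string `(iphone AND (cover OR case))`, which can be passed as the `q` parameter as-is.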

Best,
Erick



On Tue, Oct 17, 2017 at 2:06 AM, Junte Zhang <junte.zh...@localsearch.ch> wrote:
> My take on e-commerce search: similarity matching, whether with a vector
> space model, a probabilistic model or Boolean ranking, matters less than it
> does in web search or other full-text search domains. The reason is the
> content: usually very short texts, highly structured, and often not very
> noisy. And most of the time, users are sorting by price and/or popularity
> anyway.
>
> I think the main challenge of effective e-commerce search is semantics,
> i.e. query understanding (knowing what the search terms actually mean and
> how they relate to concepts/classes), synonyms and relations between
> search terms.
>
> /JZ
>
> -----Original Message-----
> From: Charlie Hull [mailto:char...@flax.co.uk]
> Sent: Tuesday, October 17, 2017 10:10 AM
> To: solr-user@lucene.apache.org
> Subject: Re: E-Commerce Search: tf-idf, tie-break and boolean model
>
> For our e-commerce customers we've been recommending a test-based relevance
> tuning strategy. Here's a series of blog posts written for us by someone who
> ran search for the world's largest electronic component distributor, which
> you might find interesting:
> http://www.flax.co.uk/blog/2016/03/18/get-started-improving-site-search-relevancy/
>
> A lot of my work these days is sitting down with clients to work out how to 
> create sets of test queries and how to test them effectively. We usually 
> recommend Quepid as a tool to do this (www.quepid.com).
>
> Cheers
>
> Charlie
>
>
>
> On 16/10/2017 11:16, alessandro.benedetti wrote:
>> I was having a discussion with a colleague of mine recently about
>> e-commerce search.
>> Of course there are tons of things you can do to improve relevancy:
>> custom similarity - edismax tuning - basic user event processing -
>> machine learning integrations - semantic search, etc.
>>
>> The more you do, the better the results will potentially be; basically it
>> is an ocean to explore.
>> To stay on topic and pertinent to your initial request,
>> let's take a look at the custom similarity problem.
>>
>> In e-commerce, and generally in proper-noun searches, TF is not relevant.
>> IDF can help, but we need to focus on what IDF is used for in general
>> in Lucene search:
>> mostly, IDF is a measure of "how important this term is in the
>> user query".
>> Basically Lucene (and TF/IDF-based Information Retrieval systems in
>> general) assumes that the rarer a term is in the corpus, the more likely
>> it is to be important for the search query.
>> That is not always true in e-commerce:
>> "iphone cover" means the user is looking for a cover that is good
>> for his/her phone.
>> "iphone" is rare; "cover" is not. IDF will treat "iphone" as the
>> term most pertinent to the user's intent.
>> There's a lot to talk about here, so let's stop :)
>>
>> Anyway, as a conclusion, go step by step: custom similarity + edismax
>> optimised with proper phrase and shingle boosts should be a good start.
>> Tie-breaking for e-commerce is likely to be OK set to the default.
>> But to find that out I would recommend setting up a relevancy measuring
>> framework with golden queries and user feedback.
>>
>> cheers
>>
>>
>>
>>
>>
>> -----
>> ---------------
>> Alessandro Benedetti
>> Search Consultant, R&D Software Engineer, Director Sease Ltd. -
>> www.sease.io
>>
>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
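
On the "iphone cover" point in Alessandro's message quoted above, a small Python sketch shows how IDF ends up favouring the rarer term. The five-title catalogue is invented for illustration, and the formula is the textbook idf = ln(N / df), which is not necessarily what the similarity configured in your Solr computes.

```python
import math

# Hypothetical product titles, invented for illustration only.
CATALOGUE = [
    "iphone cover black",
    "samsung galaxy cover",
    "tablet cover leather",
    "laptop cover sleeve",
    "usb cable",
]


def idf(term, docs):
    """Textbook inverse document frequency: ln(N / df)."""
    # df = number of documents that contain the term at least once.
    df = sum(1 for doc in docs if term in doc.split())
    if df == 0:
        raise ValueError(f"term not in corpus: {term!r}")
    return math.log(len(docs) / df)


weights = {term: idf(term, CATALOGUE) for term in ("iphone", "cover")}
```

With these titles "iphone" appears in one document and "cover" in four, so "iphone" gets the larger weight even though the shopper is after a cover.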
