Re: Query ReRanking question

Walter Underwood Sat, 06 Sep 2014 12:31:40 -0700

If you are using “bq”, that is a problem. An additive boost does not work well:
if items are very popular, it overrides everything; if items are not so
popular, it does nothing. You need to use “boost” in edismax, a multiplicative
boost. That works regardless of the magnitudes.
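
A minimal sketch of the two request shapes, built in Python so the encoding is
explicit. The field name "popularity", the range clause, and the function
log(sum(popularity,10)) are assumptions for illustration only; the real field
and function depend on your schema and need tuning.

```python
from urllib.parse import urlencode

# Additive: "bq" adds the score of an extra query clause to the relevance score.
# The field "popularity" and the threshold are hypothetical.
additive_params = {
    "defType": "edismax",
    "q": "twilight zone",
    "bq": "popularity:[1000000 TO *]^5",
}

# Multiplicative: "boost" multiplies the relevance score by a function value.
# log(sum(popularity,10)) is an assumed function, not a recommendation.
multiplicative_params = {
    "defType": "edismax",
    "q": "twilight zone",
    "boost": "log(sum(popularity,10))",
}

print(urlencode(additive_params))
print(urlencode(multiplicative_params))
```

The printed strings can be appended to a /select URL for whichever core or
collection you are querying.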


Example from my time at Netflix several years ago:

The query is “twilight zone”. The movie “Twilight” is massively popular (in the 
queues of 1.2 million subscribers), so an additive boost puts that movie above 
all the “Twilight Zone” matches. With a multiplicative boost, that doesn’t 
happen.

The query is “lord of the rings”. There are small differences between the 
popularity of the recent movies, the extended edition movies, and the ancient 
animated version. With additive boost, those are not enough to make a 
difference. With a multiplicative boost, they do.
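
A toy illustration of both cases. Every number here is made up (these are not
Netflix scores or real Solr output), and the relevance and boost values were
chosen only to show how the two ways of combining scores can order the same
documents differently.

```python
def additive(relevance, boost):
    # Additive boost: the popularity value is simply added to the text score.
    return relevance + boost

def multiplicative(relevance, boost):
    # Multiplicative boost: the popularity value scales the text score.
    return relevance * boost

def rank(docs, combine):
    """Return titles ordered by combined score, best first."""
    ordered = sorted(docs, key=lambda d: combine(d[1], d[2]), reverse=True)
    return [title for title, _, _ in ordered]

# (title, relevance, popularity boost) -- all values are hypothetical.
twilight_zone = [
    ("Twilight", 2.0, 10.0),          # weak text match, hugely popular
    ("The Twilight Zone", 8.0, 3.0),  # strong text match, modest popularity
]
lord_of_the_rings = [
    ("animated version", 8.0, 3.0),
    ("extended edition", 7.5, 3.3),
    ("recent movie", 7.0, 3.6),       # slightly more popular than the others
]

for name, docs in [("twilight zone", twilight_zone),
                   ("lord of the rings", lord_of_the_rings)]:
    print(name, "additive:      ", rank(docs, additive))
    print(name, "multiplicative:", rank(docs, multiplicative))
```

With these made-up values, the additive ranking puts "Twilight" first and
leaves the "lord of the rings" order unchanged, while the multiplicative
ranking puts "The Twilight Zone" first and lets the small popularity
differences reorder the three "lord of the rings" titles.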

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Sep 6, 2014, at 12:07 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Ravi:
> 
> bq: It is as if the sort is applied after the docs are collected
> 
> Exactly, the primary query is getting the top 1,000 documents ranked
> by relevance. Then it's sending those through the reranking query,
> i.e. sorting them by date. I kind of question whether you really want
> 1,000 docs to be re-ranked by date, perhaps a smaller number of docs
> would provide better results, but that's for you to decide.
> 
> If I understand it correctly, conceptually reranking goes like this in
> your example
> 1> execute the first query with rows=1,000, as: q=malaysian airline
> crash&rows=1000&fl=id
> 
> 2> Now form a big OR clause in a filter query of all the docs returned
> in <1>, like
> fq=id:(id1 OR id2 OR id45 OR.....) and append it to the reranking query, as:
> 
> q=*:*&sort=publish_date desc&fl=headline,publish_date,score&fq=id:(id1
> OR id2 OR id45 OR.....)
> 
> I'm sure Joel will correct me if I'm wrong here. And of course the
> code is much more efficient than this, but that's the idea I think.
> 
> Best,
> Erick
> 
> On Sat, Sep 6, 2014 at 11:33 AM, Ravi Solr <ravis...@gmail.com> wrote:
>> Erick,
>>        Your idea about reversing Joel's suggestion seems to give the best
>> results of all the options I tried...but I can't seem to understand why. I
>> thought the query shown below should give irrelevant results as sorting by
>> date would throw relevancy off...but somehow it's getting relevant results
>> with fair enough reverse chronology. It is as if the sort is applied after
>> the docs are collected and reranked (which is what I wanted). One more
>> thing that baffled me was, if I change reRankDocs from 1000 to 100 the
>> results become irrelevant, which doesn't make sense.
>> 
>> So can you kindly explain what's going on in the following query?
>> 
>> http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank
>> reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date
>> desc&fl=headline,publish_date,score
>> 
>> I love the solr community, so much to learn from so many knowledgeable
>> people.
>> 
>> Thanks
>> 
>> Ravi Kiran Bhaskar
>> 
>> 
>> 
>> On Fri, Sep 5, 2014 at 1:23 PM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>> 
>>> OK, why can't you switch the clauses from Joel's suggestion?
>>> 
>>> Something like:
>>> q=Malaysia plane crash&rq={!rerank reRankDocs=1000
>>> reRankQuery=$myquery}&myquery=*:*&sort=date+desc
>>> 
>>> (haven't tried this yet, but you get the idea....).
>>> 
>>> Best,
>>> Erick
>>> 
>>> On Fri, Sep 5, 2014 at 9:33 AM, Markus Jelsma
>>> <markus.jel...@openindex.io> wrote:
>>>> Hi - You can already achieve this by boosting on the document's recency.
>>>> The result set won't be exactly ordered by date but you will get the most
>>>> relevant and recent documents on top.
>>>> 
>>>> Markus
>>>> 
>>>> -----Original message-----
>>>>> From:Ravi Solr <ravis...@gmail.com <mailto:ravis...@gmail.com> >
>>>>> Sent: Friday 5th September 2014 18:06
>>>>> To: solr-user@lucene.apache.org <mailto:solr-user@lucene.apache.org>
>>>>> Subject: Re: Query ReRanking question
>>>>> 
>>>>> Thank you very much for responding. I want to do exactly the opposite of
>>>>> what you said. I want to sort the relevant docs in reverse chronology. If
>>>>> you sort by date beforehand then the relevancy is lost. So I want to get
>>>>> the Top N relevant results and then rerank those Top N to achieve relevant
>>>>> reverse chronological results.
>>>>> 
>>>>> If you ask why I would want to do that:
>>>>> 
>>>>> Let's take an example about the Malaysian airline crash. Several articles
>>>>> might have been published over a period of time. When I search for
>>>>> - malaysia airline crash blackbox - I would want to see "relevant" results
>>>>> but would also like to see the recent developments on the top, i.e.
>>>>> effectively a reverse chronological order within the relevant results,
>>>>> like telling a story over a period of time.
>>>>> 
>>>>> Hope I am clear. Thanks for your help.
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> Ravi Kiran Bhaskar
>>>>> 
>>>>> 
>>>>> On Thu, Sep 4, 2014 at 5:08 PM, Joel Bernstein <joels...@gmail.com> wrote:
>>>>> 
>>>>>> If you want the main query to be sorted by date then the top N docs
>>>>>> reranked by a query, that should work. Try something like this:
>>>>>> 
>>>>>> q=foo&sort=date+desc&rq={!rerank reRankDocs=1000
>>>>>> reRankQuery=$myquery}&myquery=blah
>>>>>> 
>>>>>> 
>>>>>> Joel Bernstein
>>>>>> Search Engineer at Heliosearch
>>>>>> 
>>>>>> 
>>>>>> On Thu, Sep 4, 2014 at 4:25 PM, Ravi Solr <ravis...@gmail.com> wrote:
>>>>>> 
>>>>>>> Can the ReRanking API be used to sort within docs retrieved by a date
>>>>>>> field? Can somebody help me understand how to write such a query?
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>> Ravi Kiran Bhaskar
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>
