Re: Boosting relevance as terms get nearer to each other

Michael Mon, 17 Aug 2009 08:31:10 -0700

Great, thank you Mark!
Michael

On Mon, Aug 17, 2009 at 10:48 AM, Mark Miller <markrmil...@gmail.com> wrote:


> PhraseQueries do score higher if the terms are found closer together.
>
>> does that imply that during the computation of the score for "a b
>> c"~1000000, sloppyFreq() will be called?
>>
> Yes. PhraseQuery uses PhraseWeight, which creates a SloppyPhraseScorer,
> which takes into account Similarity.sloppyFreq(matchLength).
>
>
>
> Michael wrote:
>
>> Thanks for the suggestion.  Unfortunately, my implementation requires the
>> Standard query parser -- I sanitize and expand user queries into deeply
>> nested queries with custom boosts and other bells and whistles that make
>> Dismax unappealing.
>> I see from the docs that Similarity.sloppyFreq() is a method for returning
>> a higher score for small edit distances, but it's not clear when that is
>> used.
>> If I make a (Standard) query like
>>  a AND b AND c AND "a b c"~1000000
>> does that imply that during the computation of the score for "a b
>> c"~1000000, sloppyFreq() will be called?  That's great for my needs,
>> assuming the 1000000 slop doesn't increase query time horribly.
>>
>> Michael
>>
>> On Mon, Aug 17, 2009 at 10:15 AM, Mark Miller <markrmil...@gmail.com>
>> wrote:
>>
>>
>>
>>> Dismax QueryParser with pf and ps params?
>>>
>>> http://wiki.apache.org/solr/DisMaxRequestHandler
>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>
>>> Michael wrote:
>>>
>>>
>>>
>>>> Anybody have any suggestions or hints?  I'd love to score my queries in a
>>>> way that pays attention to how close together terms appear.
>>>> Michael
>>>>
>>>> On Thu, Aug 13, 2009 at 12:01 PM, Michael <solrco...@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> Hello,
>>>>> I'd like to score documents higher that have the user's search terms
>>>>> nearer each other.  For example, if a user searches for
>>>>>
>>>>>  a AND b AND c
>>>>>
>>>>> the standard query handler should return all documents with [a] [b] and
>>>>> [c] in them, but documents matching the phrase "a b c" should get a boost
>>>>> over those with "a x b c" over those with "b x y c z a", etc.
>>>>>
>>>>> To accomplish this, I thought I might replace the user's query with
>>>>>
>>>>>  "a b c"~1000000000
>>>>>
>>>>> hoping that the slop term gets a higher and higher score the closer
>>>>> together [a] [b] and [c] appear.  This doesn't seem to be the case in my
>>>>> experiments; when I debug the query, there's no component of the score
>>>>> based on how close together [a] [b] and [c] are.  And I'm suspicious that
>>>>> this would make my queries a whole lot slower -- in reality my users'
>>>>> queries get expanded quite a bit already, and I'd thus need to add many
>>>>> slop terms.
>>>>>
>>>>> Perhaps instead I could modify the Standard query handler to examine the
>>>>> distance between all ANDed tokens, and boost proportionally to the inverse
>>>>> of their average distance apart.  I've never modified a query handler
>>>>> before so I have no idea if this is possible.
>>>>>
>>>>> Any suggestions on what approach I should take?  The less I have to modify
>>>>> Solr, the better -- I'd prefer a query-side solution over writing a plugin
>>>>> over forking the standard query handler.
>>>>>
>>>>> Thanks in advance!
>>>>> Michael
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
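Editorial sketch of the point Mark makes above: a very large slop still rewards proximity, because each sloppy match is weighted by its match distance before being summed into the phrase frequency. The two formulas below are written from memory of Lucene's DefaultSimilarity as it stood around the time of this thread and should be treated as assumptions, not as the library's guaranteed behavior. The class name SloppyFreqSketch and the sample distances are hypothetical; in Lucene the distance is computed by SloppyPhraseScorer, not supplied by hand.

```java
// Sketch only: reproduces two DefaultSimilarity formulas (as assumed above)
// without depending on Lucene, to show how match distance shrinks the weight.
class SloppyFreqSketch {

    // Assumed equivalent of DefaultSimilarity.sloppyFreq(int distance):
    // an exact phrase match (distance 0) contributes 1.0, looser matches less.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Assumed equivalent of DefaultSimilarity.tf(float freq):
    // the summed sloppy frequency is dampened by a square root.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // Illustrative distances only; real values come from the scorer.
        int[] distances = {0, 1, 5, 100};
        for (int d : distances) {
            float freq = sloppyFreq(d);
            System.out.printf("distance=%d sloppyFreq=%.4f tf=%.4f%n", d, freq, tf(freq));
        }
    }
}
```

If these formulas hold, the proximity effect is folded into the phrase frequency rather than reported as a separate score component, which may be why it was not obvious in the debug output described earlier in the thread.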
