PhraseQuerys do score higher when the terms are found closer together.

Does that imply that during the computation of the score for "a b c"~1000000,
sloppyFreq() will be called?

Yes. PhraseQuery uses PhraseWeight, which (for a non-zero slop) creates a
SloppyPhraseScorer, which takes into account
Similarity.sloppyFreq(matchLength).
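
To make the effect of sloppyFreq() concrete, here is a minimal standalone
sketch. It does not call Lucene; it only reproduces the 1 / (distance + 1)
formula that, as far as I know, DefaultSimilarity.sloppyFreq(int) used in
Lucene releases of this period. The class name SloppyFreqSketch and the sample
distances are made up for illustration, and a custom Similarity could return
something different.

```java
// Standalone sketch: no Lucene dependency, illustration only.
public class SloppyFreqSketch {

    // Assumed to mirror DefaultSimilarity.sloppyFreq(int distance):
    // an exact phrase match (distance 0) contributes 1.0, and the
    // contribution shrinks as the match gets sloppier.
    public static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // Illustrative distances, not taken from a real index.
        int[] distances = {0, 1, 6, 1000};
        for (int d : distances) {
            System.out.println("distance=" + d + " -> sloppyFreq=" + sloppyFreq(d));
        }
    }
}
```

The slop given in the query only caps how far apart a match may be; under this
formula the closer matches still contribute more to the phrase frequency than
the distant ones.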


Michael wrote:
Thanks for the suggestion.  Unfortunately, my implementation requires the
Standard query parser -- I sanitize and expand user queries into deeply
nested queries with custom boosts and other bells and whistles that make
Dismax unappealing.
I see from the docs that Similarity.sloppyFreq() is a method for returning a
higher score for small edit distances, but it's not clear when that is used.
 If I make a (Standard) query like
  a AND b AND c AND "a b c"~1000000
does that imply that during the computation of the score for "a b
c"~1000000, sloppyFreq() will be called?  That's great for my needs,
assuming the 1000000 slop doesn't increase query time horribly.

Michael
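
A rough sketch of how the query rewrite described above could be done on the
client side, before the string is handed to the standard query parser. The
class and method names (ProximityQuerySketch, withProximityClause) are
hypothetical, and the sketch assumes the terms are already sanitized: it does
no escaping of query-syntax characters.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, illustration only: builds
//   a AND b AND c AND "a b c"~slop
// from a list of already-sanitized terms.
public class ProximityQuerySketch {

    public static String withProximityClause(List<String> terms, int slop) {
        // Required conjunction of the individual terms.
        String conjunction = String.join(" AND ", terms);
        // Sloppy phrase over the same terms, in the same order.
        String phrase = "\"" + String.join(" ", terms) + "\"~" + slop;
        return conjunction + " AND " + phrase;
    }

    public static void main(String[] args) {
        System.out.println(withProximityClause(Arrays.asList("a", "b", "c"), 1000000));
    }
}
```

Since the phrase clause is ANDed in, a document matches only if the terms fall
within the slop, which is why a very large slop value is used here.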

On Mon, Aug 17, 2009 at 10:15 AM, Mark Miller <markrmil...@gmail.com> wrote:

Dismax QueryParser with pf and ps params?

http://wiki.apache.org/solr/DisMaxRequestHandler

--
- Mark

http://www.lucidimagination.com




Michael wrote:

Anybody have any suggestions or hints?  I'd love to score my queries in a
way that pays attention to how close together terms appear.
Michael

On Thu, Aug 13, 2009 at 12:01 PM, Michael <solrco...@gmail.com> wrote:



Hello,
I'd like to score documents higher that have the user's search terms nearer
each other.  For example, if a user searches for

 a AND b AND c

the standard query handler should return all documents with [a] [b] and [c]
in them, but documents matching the phrase "a b c" should get a boost over
those with "a x b c" over those with "b x y c z a", etc.

To accomplish this, I thought I might replace the user's query with

 "a b c"~1000000000

hoping that the slop term gets a higher and higher score the closer
together [a] [b] and [c] appear.  This doesn't seem to be the case in my
experiments; when I debug the query, there's no component of the score based
on how close together [a] [b] and [c] are.  And I'm suspicious that this
would make my queries a whole lot slower -- in reality my users' queries get
expanded quite a bit already, and I'd thus need to add many slop terms.

Perhaps instead I could modify the Standard query handler to examine the
distance between all ANDed tokens, and boost proportionally to the inverse
of their average distance apart.  I've never modified a query handler before
so I have no idea if this is possible.

Any suggestions on what approach I should take?  The less I have to modify
Solr, the better -- I'd prefer a query-side solution over writing a plugin
over forking the standard query handler.

Thanks in advance!
Michael










--
- Mark

http://www.lucidimagination.com


