Re: Index-time Boosting

Tracey Jaquith Tue, 05 Dec 2006 17:52:36 -0800

ahh, after rereading this about 20 times today 8-)
i think i finally "get it" (your final question below).


if i do index-time boosts, and search only "text" (default field)
the boosts will propagate into "text", but only insofar as the
document will weight higher when a phrase is found in the "text"
field (regardless of whether that "hit" really was due to something
copyField-ed in with boost 1, boost 100, etc.)

so that solution would have the effect of making certain documents
have higher scores in the "text" field, not the effect we'd like.

[example documentA]
 [description] i like to commute
 [title] commuting thoughts
copyField to "text" gives:
 [text] i like to commute commuting thoughts

we, the Archive, want query hits in title to boost ^100.
if we do q=commute (which searches "text")
with index-time boosting, solr/lucene won't know
the hit due to "title" should effect a much higher ranking
compared to documents with commute in "text" but
not in "title".   however, the above document *will* have a higher
score, in general, because the "title" portion was nearly
half of the "text" field.  Yet A will have a
higher ranking even for matches like "q=like"
compared to documentB like:
 [description] i like bread
 [text] i like bread
(when in reality, we'd like them to have near equal weighting).
So index boosts won't do for us.  I'm learning!
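
For comparison, here is a toy sketch of the ranking we'd actually like:
each field carries its own weight, applied only when the hit is in that
field. This is not Lucene's scoring formula and not a Solr API; the names
FIELD_WEIGHTS, stem() and per_field_score() are made up for illustration,
stem() is a crude stand-in for an analyzer, and doc_c is a hypothetical
third document with commute in its description only.

```python
# Toy model only: NOT Lucene's scoring formula and not a Solr API.
# FIELD_WEIGHTS, stem() and per_field_score() are hypothetical names.

FIELD_WEIGHTS = {"title": 100.0, "description": 1.0}


def stem(word):
    """Crude stand-in for an analyzer so "commuting" matches "commute"."""
    for suffix in ("ing", "e"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def per_field_score(doc, query):
    """Sum, over fields, of (field weight) * (matching terms in that field)."""
    target = stem(query)
    total = 0.0
    for field, value in doc.items():
        hits = sum(1 for token in value.split() if stem(token) == target)
        total += FIELD_WEIGHTS.get(field, 1.0) * hits
    return total


doc_a = {"description": "i like to commute", "title": "commuting thoughts"}
doc_b = {"description": "i like bread"}
doc_c = {"description": "i commute daily"}  # hypothetical: commute, not in title

for query in ("commute", "like"):
    print(query, [per_field_score(d, query) for d in (doc_a, doc_b, doc_c)])
```

In this sketch q=commute puts documentA far ahead of doc_c because of the
title hit, while q=like scores documentA and documentB the same.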

--tracey

I've tried some experiments, adjusting the boosts at index time and
running the std handler to see the ordering of the results change for
"fieldless queries" (eg: "q=tracey+pooh").  I have 33 fields using
<copyField dest="text" source="..."/> (where "text" is our default
field to query) to allow for checking across most of our std XML
fields.  I gather that a boost applied to "title" on indexing a
document must somehow "propagate" to the "text" field?


Background: for an indexed field name there is a single boost value
per document.  This is true even if the field is multi-valued... all
values for that document "share" the same boost.  This is a Lucene
restriction so we can't fix it in Solr in any way.

Solr *does* propagate the index-time boost when doing copyField, but
this just ends up being multiplied into all the other boosts for
values for that document.   Matches on the resulting text field will
*always* score higher, regardless of which "part" matched.  Does that
make sense?
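
A toy model of the background above may help. It assumes the boosts of all
values copied into "text" are simply multiplied into one number per
document, and that a match scores as (term count) * (that number). Real
Lucene scoring has more factors (idf, length normalization, and so on), and
build_text() and toy_score() are hypothetical names, not Solr or Lucene
calls.

```python
# Toy model only: one boost per field per document, as described above.
# Real Lucene scoring has more factors (idf, length normalization, ...).


def build_text(sources):
    """Mimic copyField into "text": join the tokens, multiply the boosts."""
    tokens = []
    boost = 1.0
    for value, value_boost in sources:
        tokens.extend(value.split())
        boost *= value_boost  # per-value boosts collapse into one number
    return tokens, boost


def toy_score(doc, term):
    """Term count in "text" times the document's single "text" boost."""
    tokens, boost = doc
    return tokens.count(term) * boost


doc_a = build_text([("i like to commute", 1.0),       # description
                    ("commuting thoughts", 100.0)])   # title, boosted
doc_b = build_text([("i like bread", 1.0)])           # description only

for term in ("commute", "like"):
    print(term, toy_score(doc_a, term), toy_score(doc_b, term))
```

Once the boosts are multiplied together, nothing records which value the
100 came from, so in this sketch "like" (which only occurs in the
description) is lifted exactly as much as a term from the title would be.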

--Tracey Jaquith - http://www.archive.org/~tracey --
