On 10/20/06, Walter Underwood <[EMAIL PROTECTED]> wrote:
> I'm trying to figure out how to set per-field boosts in Solr at index time.

The update XML syntax supports both document boosts and field boosts.
http://wiki.apache.org/solr/UpdateXmlMessages
A document boost is simply multiplied into the boost for each field
(this is standard lucene... nothing is done differently in Solr w.r.t.
boosts).
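
As a rough illustration of the syntax on that wiki page, here is a small
Python sketch that builds an update message carrying both kinds of boost.
The field names, text, and boost values are made up for the example, and
the effective_boosts helper is hypothetical: it only mirrors the
"document boost multiplied into each field boost" rule described above.

```python
import xml.etree.ElementTree as ET

def build_add_message(doc_boost, fields):
    """Build a Solr <add> update message as a string.

    fields is a list of (name, text, field_boost) tuples; a field_boost
    of None leaves the boost attribute off, which means a boost of 1.0.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc", boost=str(doc_boost))
    for name, text, field_boost in fields:
        attrs = {"name": name}
        if field_boost is not None:
            attrs["boost"] = str(field_boost)
        ET.SubElement(doc, "field", attrs).text = text
    return ET.tostring(add, encoding="unicode")

def effective_boosts(doc_boost, fields):
    """Hypothetical helper: the document boost multiplied into each field boost."""
    return {
        name: doc_boost * (1.0 if field_boost is None else field_boost)
        for name, _text, field_boost in fields
    }

# Example values only; "title" and "body" are assumed field names.
fields = [("title", "Solr boosts", 8.0), ("body", "Some body text", None)]
message = build_add_message(2.0, fields)
boosts = effective_boosts(2.0, fields)
```

With these example values the title ends up with an effective boost of
16 and the body with 2, because the document boost of 2 applies to both.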

> For example, if I want the title to be boosted by a factor of 8, I could
> do that in a query, or I could add the title text with a boost of 8 to the
> default text field along with the body text (with a boost of 1).

Ahhh, there's the problem.  Boosts in Lucene are per document per
*field*.  You can't boost some tokens over others in the same field,
and multi-valued fields in Lucene act as if they are catenated for
indexing purposes (position gaps aside).

The index-time boost is currently part of the "norms": each norm is an
eight-bit (single-byte) float that's the product of the length
normalization factor and the index-time boost.  For any given indexed
field, there is only one norm per document.  If you look at a lucene
index, these are the .f0, .f1, .f2 files (a norm array for each indexed
field).  Since they contain one byte per document, you can easily tell
how many documents are in each segment by a simple glance at the sizes
of these files.
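
To make the "one byte per document per field" point concrete, here is a
Python sketch of how a boost and a length normalization factor could be
folded into a single norm byte.  The bit layout is patterned after the
small-float scheme Lucene uses for norms, reproduced from memory, so
treat the exact encoding as an assumption.  The 1/sqrt(number of terms)
length factor and the sample documents are likewise just illustrative.

```python
import math
import struct

LOW = (63 - 15) << 3   # offset that maps the smallest encodable exponent to 0

def encode_norm(value):
    """Squeeze a float into one lossy byte (0..255)."""
    bits = struct.unpack(">i", struct.pack(">f", value))[0]
    small = bits >> 21               # keep sign, exponent, top two mantissa bits
    if small <= LOW:
        return 0 if bits <= 0 else 1 # zero/negative -> 0, underflow -> smallest value
    if small >= LOW + 0x100:
        return 255                   # overflow clamps to the largest value
    return small - LOW

def decode_norm(b):
    """Expand a norm byte back into the float it stands for."""
    if b == 0:
        return 0.0
    bits = (b << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]

def field_norm(boost, num_terms):
    """Index-time boost times an assumed 1/sqrt(length) normalization."""
    return boost * (1.0 / math.sqrt(num_terms))

# Hypothetical documents for ONE indexed field: (boost, number of terms).
docs = [(1.0, 4), (8.0, 4), (1.0, 100)]

# The norm array for that field: exactly one byte per document.
norms = bytearray(encode_norm(field_norm(b, n)) for b, n in docs)
decoded = [decode_norm(b) for b in norms]
```

The length of norms equals the number of documents, which is why the
file size gives away the document count.  The third document also shows
the precision loss: its norm of 0.1 does not survive the round trip
exactly.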

> For other engines I've worked with, this gives a lot more performance at
> the cost of some flexibility -- you need to reindex to change the
> weightings.

Index time boosting only makes sense when you boost the fields of some
documents over the same fields of other documents.  If you *always*
boost title in every document, it makes no sense to use an index-time
boost... it is no faster than a query time boost, and is less
flexible.
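
A toy Python sketch of that argument follows.  The scoring function is
deliberately simplistic and is not Lucene's real formula, and the
document names and term frequencies are invented.  It only shows that a
boost applied uniformly at index time ranks documents exactly like the
same boost applied at query time (for example a query such as
title:solr^8), while a boost applied to only some documents changes the
ranking.

```python
def toy_score(tf, index_boost=1.0, query_boost=1.0):
    """Toy score: term frequency times both boosts (not Lucene's formula)."""
    return tf * index_boost * query_boost

def ranking(scores):
    """Document ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical term frequencies of the query term in each title.
title_tf = {"doc1": 1, "doc2": 3, "doc3": 2}

# The same boost of 8 baked into every document at index time...
index_time = {d: toy_score(tf, index_boost=8.0) for d, tf in title_tf.items()}

# ...versus no index-time boost and a boost of 8 on the query clause.
query_time = {d: toy_score(tf, query_boost=8.0) for d, tf in title_tf.items()}

# Boosting only doc1 at index time: this is the case where it pays off.
selective = {
    d: toy_score(tf, index_boost=8.0 if d == "doc1" else 1.0)
    for d, tf in title_tf.items()
}
```

In this toy model index_time and query_time are identical, so the
uniform index-time boost buys nothing, whereas selective moves doc1 from
last place to first.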

> I don't see an obvious way to do this in a Solr schema, though it might
> make sense to add a boost attribute to copyField.

Given the current lucene restrictions, this wouldn't seem to be useful.

-Yonik
