On 10/20/06, Walter Underwood <[EMAIL PROTECTED]> wrote:
> I'm trying to figure out how to set per-field boosts in Solr at index time.
The update XML syntax supports both document boosts and field boosts:
http://wiki.apache.org/solr/UpdateXmlMessages

A document boost is simply multiplied into the boost for each field (this is standard Lucene... nothing is done differently in Solr w.r.t. boosts).
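As a rough illustration of that syntax, here is a minimal Python sketch that builds an update message carrying both kinds of boost. The `boost` attributes on `<doc>` and `<field>` are the ones described on the wiki page above; the helper name `build_add_message`, the field names, and the boost values are made up for the example.

```python
import xml.etree.ElementTree as ET

def build_add_message(doc_boost, fields):
    """Build a Solr <add> message for one document.

    fields is a list of (name, text, boost) tuples; boost may be None,
    in which case no boost attribute is written for that field.
    (Hypothetical helper, not part of Solr.)
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc", {"boost": str(doc_boost)})
    for name, text, boost in fields:
        attrs = {"name": name}
        if boost is not None:
            attrs["boost"] = str(boost)
        field = ET.SubElement(doc, "field", attrs)
        field.text = text
    return ET.tostring(add, encoding="unicode")

# Field names and values are assumptions for illustration only.
message = build_add_message(2.5, [
    ("id", "doc1", None),
    ("title", "Index-time boosts", 8.0),
    ("body", "Body text of the document", None),
])
print(message)
```

With these values the effective boost on the title field would be 2.5 * 8.0, since the document boost is multiplied into each field's boost.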
> For example, if I want the title to be boosted by a factor of 8, I could do that in a query, or I could add the title text with a boost of 8 to the default text field along with the body text (with a boost of 1).
Ahhh, there's the problem. Boosts in Lucene are per document per *field*. You can't boost some tokens over others in the same field, and multi-valued fields in Lucene act as if they are concatenated for indexing purposes (position gaps aside). The index-time boost is currently part of the "norms": each norm is an eight-bit (single-byte) float that's the product of the length normalization factor and the index-time boost. For any given indexed field, there is only one norm per document. If you look at a Lucene index, these are the .f0, .f1, .f2 files (a norm array for each indexed field). Since they contain one byte per document, you can easily tell how many documents are in each segment by a simple glance at these files.
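To make the "one norm per document per field" point concrete, here is a simplified Python sketch. It assumes the default length normalization of 1/sqrt(number of terms), assumes that the boosts of all values added under the same field name are multiplied together, counts terms by whitespace splitting, and ignores the lossy one-byte encoding. The function name `field_norm` and the sample text are invented for the example.

```python
import math

def field_norm(doc_boost, values):
    """Return the single norm for one field of one document.

    values is a list of (text, boost) pairs that were all added under the
    same field name. Simplified model: whitespace tokenization, no
    one-byte quantization, no position gaps.
    """
    num_terms = sum(len(text.split()) for text, _ in values)
    boost = doc_boost
    for _, value_boost in values:
        boost *= value_boost  # boosts collapse into one number
    return boost / math.sqrt(num_terms)

title = "solr boosts"
body = "body text about index time boosts in lucene"

# Title added with boost 8, body with boost 1, both into the same field.
norm_title_boosted = field_norm(1.0, [(title, 8.0), (body, 1.0)])
# Swapping which value carries the boost gives the same norm.
norm_body_boosted = field_norm(1.0, [(title, 1.0), (body, 8.0)])

print(norm_title_boosted, norm_body_boosted)
```

Both calls produce the same number, so under these assumptions the boost ends up applying to every token in the field, not just to the title tokens.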
> For other engines I've worked with, this gives a lot more performance at the cost of some flexibility -- you need to reindex to change the weightings.
Index-time boosting only makes sense when you boost the fields of some documents over the same fields of other documents. If you *always* boost title in every document, it makes no sense to use an index-time boost... it is no faster than a query-time boost, and is less flexible.
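A toy Python sketch of why a uniform index-time boost buys nothing: the scoring function below is a deliberately simplified stand-in (term frequency times norm times query boost), not Lucene's actual formula, and the document ids and statistics are invented.

```python
def score(tf, length_norm, index_boost=1.0, query_boost=1.0):
    # Toy model: the index-time boost is folded into the norm,
    # the query-time boost multiplies the clause's score.
    return tf * (length_norm * index_boost) * query_boost

# Hypothetical title-field statistics: doc id -> (term frequency, length norm)
title_stats = {"a": (3, 0.5), "b": (1, 0.25), "c": (2, 1.0)}

def rank(index_boosts, query_boost):
    """Return (doc ids sorted by descending score, score per doc)."""
    scores = {
        doc: score(tf, ln, index_boosts.get(doc, 1.0), query_boost)
        for doc, (tf, ln) in title_stats.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores

# The same boost of 8 on every document at index time...
index_rank, index_scores = rank({doc: 8.0 for doc in title_stats}, 1.0)
# ...versus no index-time boost and a query-time boost of 8.
query_rank, query_scores = rank({}, 8.0)
# Boosting only one document's title is what changes the ordering.
selective_rank, _ = rank({"b": 16.0}, 1.0)

print(index_rank, query_rank, selective_rank)
```

In this model the uniform index-time boost and the query-time boost give identical scores, while the selective boost moves document "b" to the top, which is the only case where reindexing with a boost pays off.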
> I don't see an obvious way to do this in a Solr schema, though it might make sense to add a boost attribute to copyField.
Given the current Lucene restrictions, this wouldn't seem to be useful.

-Yonik
