On 10/20/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
Index-time boosts can be set per-document or per-document-field. There is no facility for setting the boost of a part of the text added to a field, as you suggest above (which is really a shame, as such functionality would lend huge flexibility to index-time boosting!).
I wonder what the index-size cost would be though... Anyway, there has been discussion of flexible indexing on the Lucene list in the past few months, with one application being boost-per-position.
You must do this for every document. (Be careful with multi-valued fields--you should only set the boost for _one_ of the values input to the field.)
Good point... I believe they are all multiplied together in Lucene.
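To make the multi-valued caveat concrete, here is a rough sketch. It assumes, as I believe above, that the document boost and every per-value field boost simply multiply into one factor. The function and parameter names are made up for illustration and are not Lucene API.

```python
from math import prod


def effective_field_boost(doc_boost, value_boosts):
    """Combine a document boost with the boosts set on each value of one field.

    Assumption (not verified against Lucene source): all boosts multiply.
    """
    return doc_boost * prod(value_boosts)


# Boosting every value of a three-valued field by 2.0 compounds to 8x ...
compounded = effective_field_boost(1.0, [2.0, 2.0, 2.0])

# ... whereas boosting only one value gives the intended 2x.
intended = effective_field_boost(1.0, [2.0, 1.0, 1.0])
```

If the multiplication assumption holds, this is why the boost should be set on only one value of a multi-valued field.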
There are a few optimizations in Solr that only trigger when boosts are one, but I'm not sure exactly what those are.
There were optimizations that hoisted mandatory boolean clauses with a zero boost into a cached filter (I got that optimization from Doug/Nutch). That optimization is no longer in the normal code paths that return DocSets/DocLists, and it probably doesn't matter given that one can now explicitly specify filter queries via fq params. Is fq documented anywhere??? It's very useful for speeding up complex queries, since the filters are cached independently of the main query. Just yesterday I sped up some queries from an average latency of 0.550 seconds to 0.004 seconds by pulling some mandatory clauses that matched the majority of documents in the index out into an fq.
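For illustration, a minimal sketch of what pulling a clause out into fq looks like on the request side. The host, path, field names, and the helper function are all hypothetical; only the q and fq parameter names come from the discussion above.

```python
from urllib.parse import urlencode


def build_select_url(base_url, main_query, filter_queries):
    """Build a query URL with the main query in q and each filter in its own fq."""
    params = [("q", main_query)]
    params.extend(("fq", fq) for fq in filter_queries)
    return base_url + "?" + urlencode(params)


# Hypothetical example: instead of q = "+title:lucene +inStock:true",
# the clause that matches most of the index moves into fq.
url = build_select_url(
    "http://localhost:8983/solr/select",  # assumed local instance
    "title:lucene",
    ["inStock:true"],
)
```

The main query stays small, and the broad mandatory clause becomes a filter that can be cached and reused across different main queries.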
Finally, it can be much faster to search a single field rather than multiple fields. One hacky way of achieving this is to make a field which receives a single copy of contents and eight copies of title. This is imperfect, as it messes up length normalization and summarizing.
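A small sketch of that hack, assuming whitespace tokenization and made-up sample text. The helper name is hypothetical, and this only builds the field value; it does not talk to Solr.

```python
from collections import Counter


def build_combined_field(title, contents, title_copies=8):
    """Concatenate several copies of the title with one copy of the contents."""
    return " ".join([title] * title_copies + [contents])


combined = build_combined_field("solr boosts", "notes on boosts in solr and lucene")

# Title terms now dominate the term frequencies of the combined field.
term_counts = Counter(combined.split())

# The field is also much longer than the real text (2 + 7 tokens),
# which is what skews length normalization.
combined_length = len(combined.split())
```

The inflated token count shows why length normalization suffers: the field looks far longer than the document really is.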
And you can't make the title field count 8 times as much :-) I've seen people simply *add* the title field multiple times to the general search field in an attempt to boost it. I can't say how well it worked.

-Yonik