On 1/24/07, Andrew Nagy <[EMAIL PROTECTED]> wrote:
Yonik Seeley wrote:
> Ok, here is your query:
> <str name="rawquerystring">title:(gone with the wind) OR title2:(gone
> with the wind)</str>
> And here it is parsed:
> <str name="parsedquery">(title:gone title:wind) (title2:gone
> title2:wind)</str>
>
> First, notice how stopwords were removed, so "with" and "the" will not
> count in the results.
>
> You are querying across two different fields.
> Notice how the first two documents both have "wind" in both title and
> title2,
> while the third document "gone with the wind" has no title2 field (and
> hence can't match on it).
>
> In the first two documents, the scores for the matches on title and
> title2 both contribute to the score. The third document is penalized
> for not matching in both the title and title2 fields.
>
> You could look at the dismax handler... it helps construct queries
> whose components include DisjunctionMaxQueries (they don't add together
> scores from different fields, but just take the highest score from any
> matching field for a term).
>
> You could also see how changing or removing the stopword list affects
> relevance.
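
To make the quoted point about summed versus disjunction-max scoring concrete, here is a minimal Python sketch. The per-field scores are made-up numbers for a single term, not output from Lucene, and the function names are hypothetical. The `tie` parameter mirrors the tie-breaker idea in DisjunctionMaxQuery, where non-best fields can contribute a fraction of their score.

```python
# Toy per-field scores for ONE query term (hypothetical values, not real Lucene scores).
doc_both_fields = {"title": 0.4, "title2": 0.4}   # term matches in title and title2
doc_title_only = {"title": 0.7}                   # exact title, no title2 field

def summed_score(field_scores):
    """Plain boolean OR across fields: per-field scores add together."""
    return sum(field_scores.values())

def dismax_score(field_scores, tie=0.0):
    """Disjunction-max: keep the best field, add `tie` times the rest."""
    if not field_scores:
        return 0.0
    best = max(field_scores.values())
    return best + tie * (sum(field_scores.values()) - best)

# With summing, the two-field document outranks the title-only one.
print(summed_score(doc_both_fields) > summed_score(doc_title_only))
# With disjunction-max (tie=0), the title-only document ranks first.
print(dismax_score(doc_title_only) > dismax_score(doc_both_fields))
```

With these assumed numbers, summing rewards a document simply for repeating the term in a second field, while taking the maximum lets the best single-field match decide.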
Wow, thanks for the verbose response. This gives me a lot to go on!
What about term ranking? Could I rank phrases matched in title
higher than those matched in title2?
Absolutely... standard Lucene syntax for boosting will give you that
in the standard query handler:
title:(gone with the wind)^3.0 OR title2:(gone with the wind)
For dismax, you give the query separately from the fields, and you can
express different weights on the fields via qf=title^3.0 title2
-Yonik
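
As a rough illustration of the two approaches in the reply above, the following Python sketch builds the boosted standard-handler query string and the corresponding dismax parameters. The helper names are hypothetical, and selecting the handler with `qt=dismax` is an assumption about how the dismax handler is registered in a given solrconfig.xml; adjust it to match your setup.

```python
from urllib.parse import urlencode

def standard_query(terms, boosts):
    """Build 'field:(terms)^boost OR ...' using standard Lucene boost syntax."""
    clauses = []
    for field, boost in boosts.items():
        clause = f"{field}:({terms})"
        if boost != 1.0:          # a boost of 1.0 is the default, so omit it
            clause += f"^{boost}"
        clauses.append(clause)
    return " OR ".join(clauses)

def dismax_params(terms, boosts):
    """Keep the raw user query in q and the weighted field list in qf."""
    qf = " ".join(f if b == 1.0 else f"{f}^{b}" for f, b in boosts.items())
    # "qt": "dismax" assumes the handler is registered under that name.
    return {"qt": "dismax", "q": terms, "qf": qf}

boosts = {"title": 3.0, "title2": 1.0}
print(standard_query("gone with the wind", boosts))
print(urlencode(dismax_params("gone with the wind", boosts)))
```

The encoded parameter string from the last line can be appended to whatever select URL your Solr instance exposes.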