On 1/23/07, Andrew Nagy <[EMAIL PROTECTED]> wrote:
> I have 2 questions about the SOLR relevancy system.

As far as scoring, it's pretty much stock Lucene with a few additions
(like function queries).
http://lucene.apache.org/java/docs/scoring.html

> 1. Why is it that when I search for the exact title of a record, that
> record generally does not come up first in the results?

> ex: title:(gone with the wind), the record comes up 3rd.  A record with
> the term "wind" as the first word in the title comes up 1st.
> ex: title:"gone with the wind", the record comes up 1st.

Well, you could do an exact or sloppy phrase match:
title:"gone with the wind"
But I get your point: you also want to match records that contain just "wind".

> Is this because the word "wind" is the only noun?

Yes, this probably came about because of Lucene's length normalization
in the default similarity, which is 1/sqrt(num_terms_in_field).

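So a document with a title of "wind" has a "norm" of 1.0, while a
document with 4 terms has a "norm" of .5 (a 2-term title, such as
"gone with the wind" if the analyzer drops "with" and "the" as stop
words, gets about .7).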
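A quick check of that arithmetic, as a minimal Python sketch of the 1/sqrt formula only. It ignores the fact that Lucene stores each norm in a single byte, so the value actually used at search time is a rounded version of these raw numbers.

```python
import math

def length_norm(num_terms: int) -> float:
    """Raw length norm of the default similarity: 1/sqrt(num_terms).

    Lucene encodes this into one byte per field, so the stored value is
    approximate; this sketch skips that encoding step.
    """
    return 1.0 / math.sqrt(num_terms)

for n in (1, 2, 4):
    print(n, round(length_norm(n), 3))
```

This prints 1.0, 0.707 and 0.5 for fields of 1, 2 and 4 terms respectively.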
Still, it seems like the coord factor (the fraction of query terms
that match) should have been more than enough to overcome the length
normalization.  What were the exact titles?  I assume you were not
using any type of index-time boosting?
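To see why coord should usually win, here is a simplified Python sketch of the classic formula described on the scoring page: score = coord * sum(tf * idf^2 * norm) over the matching terms. It drops boosts and queryNorm (queryNorm is the same for every document within one query, so it does not change the ranking). The idf values and the stop-word-free term lists below are made-up inputs for illustration, not data from the index in question.

```python
import math

def simplified_score(query_terms, doc_terms, idf):
    """Simplified classic Lucene score for one query against one field.

    Assumptions: no boosts, queryNorm omitted, tf = sqrt(term frequency),
    norm = 1/sqrt(number of terms in the field), no byte encoding of norms.
    """
    matched = [t for t in query_terms if t in doc_terms]
    # coord: fraction of the query terms that this document matches
    coord = len(matched) / len(query_terms)
    norm = 1.0 / math.sqrt(len(doc_terms))
    total = 0.0
    for t in matched:
        tf = math.sqrt(doc_terms.count(t))
        total += tf * idf[t] ** 2 * norm
    return coord * total

# Hypothetical inputs: stop words already removed, equal idf for both terms.
query = ["gone", "wind"]
idf = {"gone": 2.0, "wind": 2.0}
print(simplified_score(query, ["wind"], idf))          # title "wind"
print(simplified_score(query, ["gone", "wind"], idf))  # title "gone with the wind"
```

With equal idf, the full title scores 4*sqrt(2) (about 5.66) against 2.0 for "wind" alone, so the reported ranking would need something else in play, such as very different idf values or a boost, which the explain output should reveal.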

Things you can try:
- post the debugging output (including the score explain) for the query
- try disabling length normalization for the title field
(omitNorms="true" on the field in schema.xml), then remove the entire
index and re-index.
- try the dismax handler, which can generate sloppy phrase queries to
boost results containing all terms.
- try a different similarity implementation
(org.apache.lucene.misc.SweetSpotSimilarity from Lucene)
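The first and third suggestions can be combined in one request. Below is a minimal Python sketch that builds such a select URL; it assumes Solr at localhost:8983 and a request handler registered under the name "dismax" (as in the example solrconfig.xml). Newer Solr versions select the dismax parser with defType=dismax instead of qt. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed Solr location; adjust host, port and path for your install.
SOLR_SELECT = "http://localhost:8983/solr/select"

def build_dismax_debug_url(user_query: str, field: str = "title", slop: int = 2) -> str:
    """Hypothetical helper: a dismax select URL that also asks for score explanations."""
    params = {
        "qt": "dismax",      # route to the request handler named "dismax"
        "q": user_query,     # plain user input, no field prefixes
        "qf": field,         # field(s) in which to match the individual terms
        "pf": field,         # field(s) for the boosting phrase query
        "ps": slop,          # slop of that phrase query (sloppy phrase match)
        "debugQuery": "on",  # include per-document score explanations
    }
    return SOLR_SELECT + "?" + urlencode(params)

print(build_dismax_debug_url("gone with the wind"))
```

Documents containing all of the terms close together get the extra phrase boost, while documents matching only "wind" still show up lower in the list.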


> 2. The "score" associated with each result is quite odd; what does it
> represent?  I generally get results with the top record scoring somewhere
> around 2.0 or 3.0, while most records are below 1.

Scores aren't really comparable across different queries; they are
only meant to rank documents with respect to a single query.

-Yonik
