On 1/23/07, Andrew Nagy <[EMAIL PROTECTED]> wrote:
I have 2 questions about the SOLR relevancy system.
As far as scoring goes, it's pretty much stock Lucene with some other stuff added on (like function query). http://lucene.apache.org/java/docs/scoring.html
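For reference, the classic Lucene scoring formula described on that page can be summarized as below. This follows DefaultSimilarity as best understood for Lucene of this era. Details such as the exact idf definition, how query boosts enter queryNorm, and the lossy one-byte storage of norms vary between versions, so treat it as a sketch rather than the authoritative definition.

```latex
\begin{aligned}
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot\sum_{t\in q}\mathrm{tf}(t,d)\,\mathrm{idf}(t)^{2}\,\mathrm{boost}(t)\,\mathrm{norm}(t,d)\\
\mathrm{tf}(t,d) &= \sqrt{\mathrm{freq}(t,d)}\\
\mathrm{idf}(t) &= 1+\ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t)+1}\\
\mathrm{coord}(q,d) &= \frac{\#\{t\in q : t\in d\}}{\#\{t\in q\}}\\
\mathrm{queryNorm}(q) &= \Bigl(\sum_{t\in q}\bigl(\mathrm{idf}(t)\,\mathrm{boost}(t)\bigr)^{2}\Bigr)^{-1/2}\\
\mathrm{norm}(t,d) &= \mathrm{lengthNorm}(d)\cdot(\text{index-time boosts}),\qquad
\mathrm{lengthNorm}(d) = \frac{1}{\sqrt{\text{num\_terms\_in\_field}}}
\end{aligned}
```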
1. Why is it that when I search for the exact title of one of my records, that record generally does not come up as the 1st result? For example, with title:(gone with the wind) the record comes up 3rd, and a record whose title starts with the word "wind" comes up 1st. With title:"gone with the wind", the record comes up 1st.
Well, you could do an exact or sloppy phrase match: title:"gone with the wind". But I get your point if you also want to match records with just "wind".
Is this because the word "wind" is the only noun?
Yes, this probably came about because of Lucene's length normalization in the default similarity. It's 1/sqrt(num_terms_in_field), so a document with a title of "wind" has a "norm" of 1.0, while a document with 4 terms has a norm of 0.5 (about 0.7 if the analyzer drops stop words like "with" and "the", leaving 2 terms).

Still, it seems like the coord factor (the fraction of query terms that match) should have been more than enough to overcome the length normalization. What were the exact titles? I assume you were not using any type of index-time boosting?

Things you can try:
- post the debugging output (including the score explain) for the query
- try disabling length normalization for the title field, then remove the entire index and re-index
- try the dismax handler, which can generate sloppy phrase queries to boost results containing all terms
- try a different similarity implementation (org.apache.lucene.misc.SweetSpotSimilarity from Lucene)
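To make the norm-versus-coord trade-off concrete, here is a minimal Java sketch that scores the two titles under heavy simplifications: every term gets tf = 1 and idf = 1, there are no boosts, queryNorm is left out (it is the same for every document matching a given query, so it does not change the ranking), and the lossy one-byte norm encoding Lucene uses is ignored. The class and method names are hypothetical, not Lucene API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NormCoordSketch {
    // Length normalization as described above: 1 / sqrt(terms in the field).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Simplified score = coord * sum over matching terms of norm.
    // Assumption: tf = 1, idf = 1 and boost = 1 for every term.
    static double simplifiedScore(List<String> query, List<String> field) {
        if (query.isEmpty() || field.isEmpty()) {
            return 0.0;
        }
        Set<String> fieldTerms = new HashSet<>(field);
        double norm = lengthNorm(field.size());
        int matched = 0;
        double sum = 0.0;
        for (String term : query) {
            if (fieldTerms.contains(term)) {
                matched++;
                sum += norm;
            }
        }
        // coord: fraction of query terms found in the field.
        double coord = (double) matched / query.size();
        return coord * sum;
    }

    public static void main(String[] args) {
        // All four words indexed.
        List<String> query = Arrays.asList("gone", "with", "the", "wind");
        System.out.println("exact title:  " + simplifiedScore(query, query));
        System.out.println("\"wind\" title: " + simplifiedScore(query, Arrays.asList("wind")));

        // Same comparison if stop words were removed at index and query time.
        List<String> noStop = Arrays.asList("gone", "wind");
        System.out.println("exact title (no stop words):  " + simplifiedScore(noStop, noStop));
        System.out.println("\"wind\" title (no stop words): " + simplifiedScore(noStop, Arrays.asList("wind")));
    }
}
```

Under these assumptions the full title scores 2.0 against 0.25 for the one-word "wind" title (about 1.41 against 0.5 without stop words), so coord alone should favor the exact title. If the real ranking differs, the explain output is where idf or boost differences would show up.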
2. The "score" that is associated with each value is quite odd; what does it represent? I generally get results with the top record scoring somewhere around 2.0 or 3.0, and most records scoring below 1.
Scores aren't really comparable across different queries; they are only meant to rank documents with respect to a single query.

-Yonik
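A minimal Java sketch of why raw scores do not carry over between queries, assuming the classic DefaultSimilarity definitions idf = 1 + ln(numDocs / (docFreq + 1)) and queryNorm = 1 / sqrt(sum of squared term weights). For a one-term query matching a one-term title once (tf = 1, norm = 1, no boosts) the score reduces to idf(t), so two equally "perfect" matches get different numbers depending only on how rare each term is. The index size and document frequencies are made-up values, and the class name is hypothetical.

```java
public class CrossQueryScoreSketch {
    // DefaultSimilarity-style idf (assumed): 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int numDocs, int docFreq) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // One-term query against a one-term field containing that term once:
    // coord = 1, tf = sqrt(1) = 1, norm = 1/sqrt(1) = 1, boost = 1,
    // queryNorm = 1 / sqrt(idf^2), so the score collapses to idf.
    static double singleTermScore(int numDocs, int docFreq) {
        double idf = idf(numDocs, docFreq);
        double queryNorm = 1.0 / Math.sqrt(idf * idf);
        double tf = Math.sqrt(1.0);
        double norm = 1.0 / Math.sqrt(1.0);
        double coord = 1.0;
        return coord * queryNorm * tf * idf * idf * norm;
    }

    public static void main(String[] args) {
        int numDocs = 1000; // hypothetical index size
        // Hypothetical document frequencies: "wind" is rarer than "gone".
        System.out.println("title:wind -> " + singleTermScore(numDocs, 9));
        System.out.println("title:gone -> " + singleTermScore(numDocs, 99));
    }
}
```

With these numbers both titles are exact matches, yet one scores about 5.6 and the other about 3.3, which is why a top score of 3.0 in one search and 2.0 in another says little on its own.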