Re: regarding ranking

Ahmet Arslan Tue, 16 Feb 2010 09:00:59 -0800

> Hello,
> Thanks. That clears my doubts. Coming to point two, can you
> please tell me which part of the Similarity takes care of the
> same? Is it possible to implement it in such a way that we give
> more preference to "number of found terms"?


The public float coord(int overlap, int maxOverlap) method of Similarity takes care of this:

"coord(q,d) is a score factor based on how many of the query terms are found in 
the specified document. Typically, a document that contains more of the query's 
terms will receive a higher score than another document with fewer query terms. 
This is a search time factor computed in coord(q,d) by the Similarity in effect 
at search time." 
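
To make the effect concrete, here is a minimal sketch that does not depend on Lucene. defaultCoord mirrors what DefaultSimilarity returns (overlap / maxOverlap); steeperCoord is a hypothetical variant, not part of Lucene, that squares the fraction so that documents missing query terms are penalized more. In a real setup the same body would go into an overridden coord() in your own DefaultSimilarity subclass, set on the searcher.

```java
class CoordSketch {
    /** Mirrors the default behaviour: fraction of query terms found in the document. */
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Hypothetical variant: squaring the fraction rewards matching more terms. */
    static float steeperCoord(int overlap, int maxOverlap) {
        float fraction = overlap / (float) maxOverlap;
        return fraction * fraction;
    }

    public static void main(String[] args) {
        // Document matching 1 of 2 query terms.
        System.out.println(defaultCoord(1, 2)); // 0.5
        System.out.println(steeperCoord(1, 2)); // 0.25
        // Document matching both terms is unaffected.
        System.out.println(steeperCoord(2, 2)); // 1.0
    }
}
```

The squared form is only one choice; any function that grows faster than linearly in overlap has a similar effect.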

> Also, in our case we need to give more importance to
> "length normalisation" than the default?

Do you want to penalize long documents *more*?
For example, you can directly return 1/numTerms or 1/(numTerms*numTerms) from
this method of DefaultSimilarity:

  /** Implemented as <code>1/sqrt(numTerms)</code>. */
  @Override
  public float lengthNorm(String fieldName, int numTerms) {
    return (float)(1.0 / Math.sqrt(numTerms));
  }
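
A small standalone sketch (no Lucene dependency) comparing the three formulas side by side. The method names are made up for illustration; in practice only the body of lengthNorm() in your DefaultSimilarity subclass would change. Keep in mind that lengthNorm is applied at index time, so the documents have to be reindexed after changing it.

```java
class LengthNormSketch {
    /** The default shown above: 1/sqrt(numTerms). */
    static float defaultNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Harsher alternative: 1/numTerms. */
    static float linearNorm(int numTerms) {
        return 1.0f / numTerms;
    }

    /** Even harsher alternative: 1/(numTerms*numTerms). */
    static float quadraticNorm(int numTerms) {
        // Cast first so the product is computed in float and cannot overflow int.
        return 1.0f / ((float) numTerms * numTerms);
    }

    public static void main(String[] args) {
        // A field with 4 terms.
        System.out.println(defaultNorm(4));   // 0.5
        System.out.println(linearNorm(4));    // 0.25
        System.out.println(quadraticNorm(4)); // 0.0625
    }
}
```

The longer the field, the wider the gap between the three variants becomes.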

There will be trade-offs, since lots of parameters interact.
If you have a two-word query, which is more important to you:
A short document containing one of the words?
A long document containing both words?
Or:
A long document containing one query term that is very rare (high idf)?
A short document containing one query term that is very common (low idf)?

Many combinations...
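
To play with these cases, here is a rough toy model, not Lucene's actual scorer. It assumes tf = 1 for every matched term, no boosts and no queryNorm, and computes coord * sum(idf^2 * lengthNorm) with the default 1/sqrt(numTerms). The document lengths and idf values are invented for illustration.

```java
class TradeOffSketch {
    /**
     * Toy score: coord * sum over matched terms of idf^2 * lengthNorm.
     * matchedIdfs holds the idf of each query term found in the document.
     */
    static double toyScore(double[] matchedIdfs, int queryTerms, int docLength) {
        double coord = matchedIdfs.length / (double) queryTerms;
        double lengthNorm = 1.0 / Math.sqrt(docLength);
        double sum = 0.0;
        for (double idf : matchedIdfs) {
            sum += idf * idf * lengthNorm;
        }
        return coord * sum;
    }

    public static void main(String[] args) {
        // Case 1: short document (10 terms) with one word
        // versus long document (1000 terms) with both words.
        double shortOneWord = toyScore(new double[] {1.0}, 2, 10);
        double longTwoWords = toyScore(new double[] {1.0, 1.0}, 2, 1000);
        System.out.println("short/one word wins: " + (shortOneWord > longTwoWords));

        // Case 2: long document with a rare term (idf 5)
        // versus short document with a common term (idf 1).
        double longRare = toyScore(new double[] {5.0}, 2, 1000);
        double shortCommon = toyScore(new double[] {1.0}, 2, 10);
        System.out.println("long/rare term wins: " + (longRare > shortCommon));
    }
}
```

With these made-up numbers the short document wins the first case and the rare term wins the second; changing coord or lengthNorm shifts both outcomes at once, which is the trade-off.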
