> Hello,
>
> Thanks. That clears my doubts. Coming to the point two, can you
> please tell me which part of the Similarity takes care of the same.
> Is it possible to implement in such a way that we give more
> preference to "number of found terms".

The public float coord(int overlap, int maxOverlap) method takes care of that:

"coord(q,d) is a score factor based on how many of the query terms are found in the specified document. Typically, a document that contains more of the query's terms will receive a higher score than another document with fewer query terms. This is a search time factor computed in coord(q,d) by the Similarity in effect at search time."

So overriding this method in your own Similarity is the place to give more preference to the number of found terms.

> Also, here in our case we need to give more importance to
> "length normalisation" than the default?

Do you want to punish long documents *more*? For example, you can return 1/numTerms or 1/(numTerms*numTerms) directly in this method of DefaultSimilarity:

    /** Implemented as <code>1/sqrt(numTerms)</code>. */
    @Override
    public float lengthNorm(String fieldName, int numTerms) {
      return (float)(1.0 / Math.sqrt(numTerms));
    }

There will be a trade-off, since there are lots of parameters. If you have a two-word query, which one is more important for you:

  A short document containing one word?
  A long document containing two words?

Or:

  A long document containing one query term which is very rare (high idf)?
  A short document containing one query term which is very common (low idf)?

Many combinations...
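
To get a feel for the trade-off, here is a small standalone sketch that only compares the coord and lengthNorm factors discussed above. It deliberately does not depend on Lucene: the class name SimilaritySketch, the steepCoord variant and the document sizes are all made up for illustration, and the product of the two factors is not the full score (tf, idf and boosts are ignored). In a real setup you would put the same formulas into a subclass of DefaultSimilarity.

```java
public class SimilaritySketch {

    /** Fraction of query terms found in the document (the usual overlap / maxOverlap). */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Hypothetical steeper variant: squaring the fraction rewards full matches more. */
    static float steepCoord(int overlap, int maxOverlap) {
        float fraction = overlap / (float) maxOverlap;
        return fraction * fraction;
    }

    /** Default-style length norm, 1/sqrt(numTerms). */
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Stronger penalty for long documents, 1/numTerms. */
    static float linearLengthNorm(int numTerms) {
        return 1.0f / numTerms;
    }

    /** Even stronger penalty, 1/(numTerms*numTerms). */
    static float squaredLengthNorm(int numTerms) {
        return 1.0f / ((float) numTerms * numTerms);
    }

    public static void main(String[] args) {
        // Assumed scenario: a two-word query, a short document (10 terms)
        // matching one word, and a long document (1000 terms) matching both.
        int shortLen = 10, longLen = 1000, queryTerms = 2;

        float shortDefault = coord(1, queryTerms) * defaultLengthNorm(shortLen);
        float longDefault = coord(2, queryTerms) * defaultLengthNorm(longLen);
        System.out.printf("default factors: short=%.4f long=%.4f%n", shortDefault, longDefault);

        // Harsher length norm: the long document falls further behind.
        float shortLinear = coord(1, queryTerms) * linearLengthNorm(shortLen);
        float longLinear = coord(2, queryTerms) * linearLengthNorm(longLen);
        System.out.printf("linear norm:     short=%.4f long=%.4f%n", shortLinear, longLinear);

        // Steeper coord: the one-word match is penalised more than before.
        float shortSteep = steepCoord(1, queryTerms) * defaultLengthNorm(shortLen);
        float longSteep = steepCoord(2, queryTerms) * defaultLengthNorm(longLen);
        System.out.printf("steep coord:     short=%.4f long=%.4f%n", shortSteep, longSteep);
    }
}
```

Note that the two changes pull in opposite directions: a harsher length norm favours the short document, while a steeper coord favours the document that matches more query terms. Also keep in mind that, unlike coord, the length norm is stored at index time, so as far as I know a changed lengthNorm only takes effect after re-indexing.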