Hello,

We already discussed this problem five years ago [1]. In short: for some terms, 
documents in foreign languages score higher than documents in the user's own 
language.

Back then it was solved by using docCount instead of maxDoc when calculating 
idf, and it worked really well! But, probably due to changes in the index, the 
problem is back for some terms, mostly proper nouns, just like five years ago.
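
A rough sketch of why the docCount switch helped, assuming BM25's idf 
ln(1 + (N - df + 0.5) / (df + 0.5)) and a hypothetical index with one field 
per language (the names `body_en` and `body_de` and all counts are made up for 
illustration). N is either maxDoc (all documents in the index) or the field's 
docCount (only documents that have a value in that field).

```python
import math


def bm25_idf(doc_freq: int, n: int) -> float:
    """BM25-style idf: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (n - doc_freq + 0.5) / (doc_freq + 0.5))


def idf_table(max_doc, doc_count, doc_freq):
    """Per field: (idf with maxDoc as N, idf with the field's docCount as N)."""
    return {
        field: (bm25_idf(df, max_doc), bm25_idf(df, doc_count[field]))
        for field, df in doc_freq.items()
    }


# Hypothetical index: 1,000,000 documents, one body field per language.
MAX_DOC = 1_000_000
DOC_COUNT = {"body_en": 950_000, "body_de": 50_000}
# Hypothetical term that occurs in 1% of the documents of each language.
DOC_FREQ = {"body_en": 9_500, "body_de": 500}

for field, (with_max_doc, with_doc_count) in idf_table(MAX_DOC, DOC_COUNT, DOC_FREQ).items():
    print(f"{field}: maxDoc idf={with_max_doc:.2f}, docCount idf={with_doc_count:.2f}")
```

With maxDoc the small German field gets an idf of about 7.6 versus about 4.66 
for English, although the term is equally common in both languages; with 
docCount both come out at roughly 4.6.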

We already deboost documents that are not in the user's preferred language by 
0.7, but in some cases that is not enough. I could keep lowering that boost, 
but I'd rather not.
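
A rough illustration of why a fixed 0.7 deboost can fall short: if a proper 
noun is common in the preferred-language field but rare in a foreign-language 
field, the idf gap can outweigh the deboost. This sketch again assumes BM25's 
idf, treats everything except idf (tf, length norms) as equal between the two 
matches, and uses made-up document counts.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25-style idf: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def break_even_deboost(pref_df, pref_count, foreign_df, foreign_count):
    """Deboost at which both matches tie; any lower value lets the
    preferred-language match win (assuming only idf differs)."""
    return bm25_idf(pref_df, pref_count) / bm25_idf(foreign_df, foreign_count)


def foreign_still_wins(pref_df, pref_count, foreign_df, foreign_count, deboost=0.7):
    """True if the deboosted foreign-language idf still exceeds the preferred one."""
    preferred = bm25_idf(pref_df, pref_count)
    foreign = deboost * bm25_idf(foreign_df, foreign_count)
    return foreign > preferred


# Hypothetical proper noun: in 5% of 950,000 English documents,
# but in only 25 of 50,000 German documents.
args = (47_500, 950_000, 25, 50_000)
print(foreign_still_wins(*args))            # True with the 0.7 deboost
print(round(break_even_deboost(*args), 3))  # 0.395
```

With these made-up numbers the deboost would have to drop below about 0.4 
before the English match wins, and another term could need yet another value.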

I'd like to know if there are additional tricks to solve the problem.

Many thanks!
Markus

[1] http://lucene.472066.n3.nabble.com/Skewed-IDF-in-multi-lingual-index-td4019095.html
