Hello,

We already discussed this problem five years ago [1]. In short: documents in foreign languages are scored higher for some terms.
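
To make the skew concrete, here is a minimal sketch, not taken from our actual setup. It assumes one text field per language, a BM25-style idf, and made-up counts; the class name IdfSketch and all numbers are hypothetical. It compares the idf of the same proper noun in a small foreign-language field and in a large English field, first with the whole index as the total (maxDoc) and then with only the documents that have the field (docCount).

```java
final class IdfSketch {
    // BM25-style idf: ln(1 + (N - df + 0.5) / (df + 0.5)).
    // N is either maxDoc (all documents in the index) or docCount
    // (documents that have at least one term in the field).
    static double idf(long docFreq, long totalDocs) {
        return Math.log(1 + (totalDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        // Hypothetical numbers, chosen only for illustration.
        long maxDoc = 1_000_000;     // all documents in the index
        long docCountEn = 950_000;   // documents with an English field
        long docCountNl = 50_000;    // documents with a Dutch field
        long dfEn = 5_000;           // proper noun in the English field
        long dfNl = 500;             // same proper noun in the Dutch field

        // With maxDoc the term looks much rarer in the small field.
        System.out.printf("maxDoc:   en=%.2f nl=%.2f%n",
                idf(dfEn, maxDoc), idf(dfNl, maxDoc));

        // With docCount each field is measured against its own size.
        System.out.printf("docCount: en=%.2f nl=%.2f%n",
                idf(dfEn, docCountEn), idf(dfNl, docCountNl));
    }
}
```

With these numbers the Dutch field gets the higher idf when maxDoc is the total, and the lower one when docCount is used.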
It was solved back then by using docCount instead of maxDoc when calculating idf, and that worked really well. But, probably due to changes in the index, the problem is back for some terms, mostly proper nouns, just like five years ago.

We already deboost documents that are not in the user's preferred language by 0.7, but in some cases that is not enough. I could reduce that boost further, but that is not what I would prefer. I'd like to know if there are additional tricks to solve the problem.

Many thanks!
Markus

[1] http://lucene.472066.n3.nabble.com/Skewed-IDF-in-multi-lingual-index-td4019095.html