Thanks Yonik and thanks Doug. I agree with Doug on adding a few generic test corpora that Jenkins automatically runs some metrics on, to check that Apache Lucene/Solr changes don't affect a golden truth too much. This can of course be very complex, but I think it is a direction the Apache Lucene/Solr community should work on.
Given that, I do believe that in this case, moving from maxDocs (field independent) to docCount (field dependent) was a good move (and this specific multi-language use case is an example). Actually, I also believe that theoretically docCount (field dependent) is still better than maxDocs (field independent). This is because docCount (field dependent) represents a state in time associated with the current index, while maxDocs represents a historical consideration. A corpus of documents can change over time, and how rare a term is can change drastically (let's pick a highly dynamic domain such as news).

Doug, were you able to generalise and abstract any consideration from what happened to your customers, and why they got regressions moving from maxDocs to docCount (field dependent)?

-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
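
P.S. As a rough illustration of the multi-language point above, here is a minimal Python sketch of how the choice of denominator changes IDF. It assumes the BM25-style formula log(1 + (N - df + 0.5) / (df + 0.5)); the index sizes and the function name `bm25_idf` are made up for the example and do not come from any real index or from the Lucene code base.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25-style IDF: log(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical multi-language index (all numbers are illustrative assumptions).
max_doc = 1_000_000        # every document in the index, whatever the field
field_doc_count = 10_000   # documents that actually have a value in this field
doc_freq = 5_000           # documents where the term occurs in this field

# Field independent: the term looks rare against the whole index.
idf_max_doc = bm25_idf(doc_freq, max_doc)

# Field dependent: the term occurs in half of the documents that have the field.
idf_doc_count = bm25_idf(doc_freq, field_doc_count)

print(f"IDF with maxDoc:   {idf_max_doc:.3f}")
print(f"IDF with docCount: {idf_doc_count:.3f}")
```

With these made-up numbers the term is scored as quite rare when the whole index is the denominator, and as common when only the documents that have the field are counted, which is the effect discussed above for sparsely populated per-language fields.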