Hi, I'm seeing some odd scoring when indexing multiple terms into a single field using a copyField. An example:
For document A:
  field:contents_1 is a multivalued field containing "cat", "dog" and "duck"
  field:contents_2 is a multivalued field containing "cat", "horse", and "flower"

For document B:
  field:contents_1 is a multivalued field containing "cat" and "fish"
  field:contents_2 is a multivalued field containing "bear" and "turkey"

I have a copyField in my schema:

  <copyField source="contents_*" dest="combined"/>

A query like

  contents_1:cat contents_2:cat

returns document A first, and then document B. I think that is the way it should work. But a query like

  combined:cat

returns document B first.

In my mind, when I am doing a copyField I am copying each of the terms in the multivalued fields of contents_1 and contents_2 into combined, so that combined internally has "cat", "dog", "duck", "cat", "horse", "flower" for document A.

An explain on the query says something like this (this is from a real query, not the fake one above):

  <lst name="explain">
    <str name="B">
      4.0687284 = (MATCH) fieldWeight(combined:cat in 1663089), product of:
        1.0 = tf(termFreq(combined:cat)=1)
        4.0687284 = idf(docFreq=135688, maxDocs=2919285)
        1.0 = fieldNorm(field=combined, doc=1663089)
    </str>
    <str name="A">
      0.8509077 = (MATCH) fieldWeight(combined:cat in 913171), product of:
        2.236068 = tf(termFreq(combined:cat)=5)
        4.0590663 = idf(docFreq=143689, maxDocs=3061697)
        0.09375 = fieldNorm(field=combined, doc=913171)
    </str>

If I am reading this right, it is finding the higher TF in A (5 in this case) but still scoring B higher. Shouldn't idf be exactly the same?
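To double-check my reading of the explain output, here is a minimal Python sketch that multiplies the three factors back together. The product itself comes straight from the "product of" lines above. The helper names (field_weight, classic_tf, classic_idf) are my own, and the tf and idf formulas are an assumption that the index uses Lucene's classic default similarity (tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1))); they do appear to reproduce the numbers shown.

```python
import math


def field_weight(tf, idf, field_norm):
    """fieldWeight as shown in the explain output: tf * idf * fieldNorm."""
    return tf * idf * field_norm


def classic_tf(term_freq):
    """Assumed classic Lucene tf: square root of the raw term frequency."""
    return math.sqrt(term_freq)


def classic_idf(doc_freq, max_docs):
    """Assumed classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Factors copied verbatim from the explain output above.
score_b = field_weight(tf=1.0, idf=4.0687284, field_norm=1.0)
score_a = field_weight(tf=2.236068, idf=4.0590663, field_norm=0.09375)

print("B:", score_b)
print("A:", score_a)
print("tf for termFreq=5:", classic_tf(5))
print("idf for B's stats:", classic_idf(135688, 2919285))
print("idf for A's stats:", classic_idf(143689, 3061697))
```

Going by these numbers, tf and idf are close between the two documents, and the gap in the final score comes almost entirely from fieldNorm (0.09375 for A versus 1.0 for B). The two idf values differ only because the docFreq and maxDocs reported for A and B are different.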
(Both fields are of type solr.TextField:

  <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    </analyzer>
  </fieldtype>
)

Another piece of perhaps relevant information is that this is a query over 16 shards using distributed Solr.