Hi! I have a SolrCloud 6.6 collection with 3 shards, and I need the TermVectors TF and DF values when querying.

I have configured the ExactStatsCache in solrconfig.xml:

    <statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>

When I query "detector works" in my collection, it returns different docfreq values depending on the shard the document comes from:

    "termVectors":[
      "27504103",[
        "uniqueKey","27504103",
        "kc",[
          "detector works",[
            "tf",1, "df",3, "tf-idf",0.3333333333333333]]],
      "27507925",[
        "uniqueKey","27507925",
        "kc",[
          "detector works",[
            "tf",1, "df",3, "tf-idf",0.3333333333333333]]],
      "27504105",[
        "uniqueKey","27504105",
        "kc",[
          "detector works",[
            "tf",1, "df",2, "tf-idf",0.5]]],
      "27507927",[
        "uniqueKey","27507927",
        "kc",[
          "detector works",[
            "tf",1, "df",2, "tf-idf",0.5]]],
      "27507929",[
        "uniqueKey","27507929",
        "kc",[
          "detector works",[
            "tf",1, "df",1, "tf-idf",1.0]]],
      "27504107",[
        "uniqueKey","27504107",
        "kc",[
          "detector works",[
            "tf",1, "df",3, "tf-idf",0.3333333333333333]]]]}

I expect the DF value to be 6 for every document, with TF-IDF adjusted accordingly. I can see in the debug logs that the cache was active.

I have found a pending bug (open since Solr 5.5: https://issues.apache.org/jira/browse/SOLR-8893) which explains that the ExactStatsCache is used to compute the correct TF-IDF for the query, but not for the TermVectors component.

Is there any way to get the correctly merged DF values (and TF-IDF) from multiple shards?

Is there a way to find out which shard a document comes from, so that I could compute the correct DF myself?

Thank you,
Patrick
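
P.S. To make the expected numbers concrete, here is a minimal Python sketch of the client-side merge I have in mind. It assumes I could somehow learn which shard each document came from. The shard names (shardA, shardB, shardC) are hypothetical, and I am guessing that documents reporting the same df live on the same shard. The tf and df values are copied from the response above, and tf-idf is taken to be tf/df, which matches the values returned.

```python
# (doc id, shard, tf, df reported by that shard) for the term "detector works".
# Shard names are hypothetical; the doc-to-shard grouping is an assumption.
rows = [
    ("27504103", "shardA", 1, 3),
    ("27507925", "shardA", 1, 3),
    ("27504107", "shardA", 1, 3),
    ("27504105", "shardB", 1, 2),
    ("27507927", "shardB", 1, 2),
    ("27507929", "shardC", 1, 1),
]


def merge_df(rows):
    """Sum the local df of each shard exactly once to get a collection-wide df."""
    per_shard = {}
    for _doc, shard, _tf, df in rows:
        per_shard[shard] = df  # every doc of a shard carries the same local df
    return sum(per_shard.values())


def merged_tf_idf(rows):
    """Recompute tf-idf as tf / global df for every document."""
    global_df = merge_df(rows)
    return {doc: tf / global_df for doc, _shard, tf, _df in rows}


if __name__ == "__main__":
    print("merged df:", merge_df(rows))
    for doc, score in merged_tf_idf(rows).items():
        print(doc, score)
```

With these inputs the merged df is 3 + 2 + 1 = 6, and every document ends up with tf-idf = 1/6, which is what I would like the TermVectors component to return directly.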