Hi all,
We are running some tests in Solr 6.6 with the JSON Facet API and we get
completely wrong counts for some combinations of facets.
Setting: We have a set of fields for the 376k documents matched by our
query (120M documents in total). We work with 2 shards. We first facet
over the first field alone and keep these counts. We then do a nested
faceting over both fields.
For each top-level bucket we add up the counts of its sub-facet buckets
and expect to get (approximately) the same number back. Sometimes we get
rounding errors of about 1% difference, but on other occasions the sums
are way off. For example (each line reads: single-facet count - sum of
sub-facet counts = difference, with the percentage relative to the
single-facet count):
Gender (3 values) Country (211 values)
16226 - 18424 = -2198 (-13.5461604832%)
282854 - 464387 = -181533 (-64.1790464338%)
40489 - 47902 = -7413 (-18.3086764306%)
36672 - 49749 = -13077 (-35.6593586387%)
Gender (3 values) Status (17 values)
16226 - 16273 = -47 (-0.289658572661%)
282854 - 435974 = -153120 (-54.1339348215%)
40489 - 49925 = -9436 (-23.305095211%)
36672 - 54019 = -17347 (-47.3031195462%)
...
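
For reference, the comparison behind these figures can be sketched
roughly as follows. This is only an illustration, not our exact script:
the helper names are made up, and the response layout (a "buckets" list
plus an optional "missing" bucket, each carrying a "count") is assumed
to be what the JSON Facet API returns for a terms facet.

```python
def sum_sub_counts(bucket, sub_facet):
    """Sum the sub-facet counts under one parent bucket.

    Assumes the parent bucket holds the sub-facet under its own name,
    with a "buckets" list and, when missing=true, a "missing" bucket.
    """
    sub = bucket.get(sub_facet, {})
    total = sum(b["count"] for b in sub.get("buckets", []))
    total += sub.get("missing", {}).get("count", 0)
    return total


def discrepancy(flat_count, nested_sum):
    """Return (difference, difference as a percentage of flat_count)."""
    diff = flat_count - nested_sum
    return diff, 100.0 * diff / flat_count


# First Gender/Country row from the figures above.
diff, pct = discrepancy(16226, 18424)
print(diff, round(pct, 2))  # -2198 -13.55
```

With consistent counts, discrepancy() should return a difference of 0
(or close to it) for every top-level bucket.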
These are the typical requests we submit. Note that we use refine and
an overrequest, but in the case of Gender vs Status we should be
retrieving all the buckets anyway.
{"wt":"json","rows":0,"json.facet":"{\"Status_sfhll\":\"hll(Status_sf)\",\"Status_sf\":{\"type\":\"terms\",\"field\":\"Status_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\"limit\":50,\"offset\":0}}","q":"*:*","fq":["type:\"something\""]}
{"wt":"json","rows":0,"json.facet":"{\"Gender_sf\":{\"type\":\"terms\",\"field\":\"Gender_sf\",\"missing\":true,\"refine\":true,\"overrequest\":10,\"limit\":10,\"offset\":0,\"facet\":{\"Status_sf\":{\"type\":\"terms\",\"field\":\"Status_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\"limit\":50,\"offset\":0}}},\"Gender_sfhll\":\"hll(Gender_sf)\"}","q":"*:*","fq":["type:\"something\""]}
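
A minimal sketch of how a nested request like the second one can be
generated. The function names are hypothetical; the parameter names and
values are the ones from the requests above, and overrequest is simply
set equal to limit as in those requests.

```python
import json


def terms_facet(field, limit, sub_facets=None):
    """Build one terms facet with the same options as in the requests above."""
    facet = {
        "type": "terms",
        "field": field,
        "missing": True,
        "refine": True,
        "overrequest": limit,  # same value as limit, as in our requests
        "limit": limit,
        "offset": 0,
    }
    if sub_facets:
        facet["facet"] = sub_facets
    return facet


def build_params(outer, outer_limit, inner, inner_limit):
    """Build the request parameters for a nested facet of inner under outer."""
    json_facet = {
        outer: terms_facet(outer, outer_limit,
                           {inner: terms_facet(inner, inner_limit)}),
        outer + "hll": "hll(%s)" % outer,
    }
    return {
        "wt": "json",
        "rows": 0,
        "json.facet": json.dumps(json_facet),
        "q": "*:*",
        "fq": ['type:"something"'],
    }


params = build_params("Gender_sf", 10, "Status_sf", 50)
print(params["json.facet"])
```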
Is this a known bug? Would switching to the old facet API resolve this?
Are there other parameters we are missing?
Thanks
kenny