Hi all,

We are running some tests on Solr 6.6 with the JSON Facet API, and we get completely wrong counts for some combinations of facets.

Setting: we have a set of fields for the 376k documents matched by our query (out of 120M documents in total). We work with 2 shards. We first facet on the first field alone and keep those counts; we then run a nested facet over both fields.

Then we add up the sub-facet counts and expect to get (approximately) the same numbers back. Sometimes we get rounding errors of about 1%, but on other occasions the numbers seem to be way off.

For example:

Gender (3 values) Country (211 values)
16226 - 18424 = -2198 (-13.5461604832%)
282854 - 464387 = -181533 (-64.1790464338%)
40489 - 47902 = -7413 (-18.3086764306%)
36672 - 49749 = -13077 (-35.6593586387%)

Gender (3 values) Status (17 values)
16226 - 16273 = -47 (-0.289658572661%)
282854 - 435974 = -153120 (-54.1339348215%)
40489 - 49925 = -9436 (-23.305095211%)
36672 - 54019 = -17347 (-47.3031195462%)

...
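
For clarity, the comparison behind these numbers can be sketched roughly as follows. This is a minimal sketch, not our exact script: it assumes each row above reads as "top-level count - sum of sub-facet counts", and that the response has the usual JSON Facet layout (a "buckets" list plus an optional "missing" entry per terms facet). The function name compare_nested_counts and the sample data are made up for illustration; only the totals 16226 and 16273 are taken from the rows above.

```python
def compare_nested_counts(facets, parent, child):
    """Compare each parent bucket count with the sum of its child bucket counts.

    Returns a list of (val, parent_count, child_sum, diff, pct) tuples,
    where pct is the difference relative to the parent count.
    """
    rows = []
    parent_facet = facets.get(parent, {})
    buckets = list(parent_facet.get("buckets", []))
    # With "missing": true the documents without a value are reported
    # next to the bucket list (assumed layout), so treat them as one more bucket.
    if "missing" in parent_facet:
        buckets.append({"val": None, **parent_facet["missing"]})
    for bucket in buckets:
        child_facet = bucket.get(child, {})
        child_sum = sum(b["count"] for b in child_facet.get("buckets", []))
        child_sum += child_facet.get("missing", {}).get("count", 0)
        diff = bucket["count"] - child_sum
        pct = 100.0 * diff / bucket["count"] if bucket["count"] else 0.0
        rows.append((bucket.get("val"), bucket["count"], child_sum, diff, pct))
    return rows


# Made-up response fragment: the bucket values and the split of the child
# counts are invented; only the totals 16226 and 16273 come from the post.
sample_facets = {
    "count": 16226,
    "Gender_sf": {
        "buckets": [
            {
                "val": "example",
                "count": 16226,
                "Status_sf": {
                    "buckets": [
                        {"val": "a", "count": 16000},
                        {"val": "b", "count": 200},
                    ],
                    "missing": {"count": 73},
                },
            }
        ],
        "missing": {"count": 0},
    },
}

for val, count, child_sum, diff, pct in compare_nested_counts(
    sample_facets, "Gender_sf", "Status_sf"
):
    print(f"{val}: {count} - {child_sum} = {diff} ({pct}%)")
```

The "missing" bucket is included on both levels so that documents without a value are not silently dropped from either side of the comparison.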

These are the typical requests we submit. Note that we use refine and an overrequest, but in the Gender vs Status case (3 and 17 values, with limits of 10 and 50) we should be retrieving all the buckets anyway.

{"wt":"json","rows":0,"json.facet":"{\"Status_sfhll\":\"hll(Status_sf)\",\"Status_sf\":{\"type\":\"terms\",\"field\":\"Status_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\"limit\":50,\"offset\":0}}","q":"*:*","fq":["type:\"something\""]}

{"wt":"json","rows":0,"json.facet":"{\"Gender_sf\":{\"type\":\"terms\",\"field\":\"Gender_sf\",\"missing\":true,\"refine\":true,\"overrequest\":10,\"limit\":10,\"offset\":0,\"facet\":{\"Status_sf\":{\"type\":\"terms\",\"field\":\"Status_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\"limit\":50,\"offset\":0}}},\"Gender_sfhll\":\"hll(Gender_sf)\"}","q":"*:*","fq":["type:\"something\""]}
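
Since the escaped json.facet string is hard to read, here is a rough sketch of how the second request could be assembled from a nested structure. The helper terms_facet is hypothetical and only mirrors the parameters shown above (overrequest equal to limit, missing and refine enabled); how the parameters are then sent to Solr is left out.

```python
import json


def terms_facet(field, limit, sub_facets=None):
    """Build a terms facet with the same options as the requests above.

    Hypothetical helper: overrequest is set equal to limit because that is
    what the example requests do, not because it is required.
    """
    facet = {
        "type": "terms",
        "field": field,
        "missing": True,
        "refine": True,
        "overrequest": limit,
        "limit": limit,
        "offset": 0,
    }
    if sub_facets:
        facet["facet"] = sub_facets
    return facet


# Nested facet: Gender_sf on top, Status_sf underneath, plus the hll() stat.
json_facet = {
    "Gender_sf": terms_facet(
        "Gender_sf", 10, {"Status_sf": terms_facet("Status_sf", 50)}
    ),
    "Gender_sfhll": "hll(Gender_sf)",
}

# json.facet is passed as a string parameter, so serialize it here instead
# of escaping the quotes by hand.
params = {
    "wt": "json",
    "rows": 0,
    "json.facet": json.dumps(json_facet),
    "q": "*:*",
    "fq": ['type:"something"'],
}

print(json.dumps(params))
```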

Is this a known bug? Would switching to the old facet API resolve this? Are there other parameters we are missing?


Thanks


kenny

