Kenny,

This is known behavior in multi-shard collections, where the documents
that contribute to a given facet bucket do not all reside on the same
shard. Yonik Seeley has improved the JSON Facet feature by introducing the
"overrequest" and "refine" parameters.
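
To illustrate why the counts drift, here is a minimal Python sketch of a
two-shard terms facet. It is a simplified model, not Solr's actual code:
the shard data, the function names, and the way refinement is modeled
(re-counting the merged top buckets on every shard) are all assumptions
made for the example.

```python
from collections import Counter

# Hypothetical per-shard term counts for one facet field.
SHARDS = [
    {"a": 10, "b": 6, "c": 5},
    {"c": 9, "d": 4, "a": 3},
]


def shard_top(counts, n):
    """Top-n (term, count) pairs that a single shard would return."""
    return dict(Counter(counts).most_common(n))


def merge_without_refine(shards, limit, overrequest):
    """Merge per-shard top lists; a term a shard did not return adds 0."""
    merged = Counter()
    for counts in shards:
        merged.update(shard_top(counts, limit + overrequest))
    return dict(merged.most_common(limit))


def merge_with_refine(shards, limit, overrequest):
    """Second pass: fetch exact counts for the merged top buckets."""
    first_pass = merge_without_refine(shards, limit, overrequest)
    return {
        term: sum(counts.get(term, 0) for counts in shards)
        for term in first_pass
    }


if __name__ == "__main__":
    print(merge_without_refine(SHARDS, limit=2, overrequest=0))
    print(merge_with_refine(SHARDS, limit=2, overrequest=0))
    print(merge_without_refine(SHARDS, limit=2, overrequest=1))
```

In this toy data, term "a" is outside the top 2 on the second shard and
"c" is outside the top 2 on the first, so the unrefined merge undercounts
both. Either asking each shard for one extra bucket or running the refine
pass recovers the exact totals.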

Kindly check out these Jira issues:
https://issues.apache.org/jira/browse/SOLR-7452
https://issues.apache.org/jira/browse/SOLR-9432

Relevant blog: https://medium.com/@abb67cbb46b/1acfa77cd90c
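
For reference, the nested request from the quoted message below can be
assembled from plain dicts instead of a hand-escaped string, which makes
it easier to see where "refine" and "overrequest" apply at each level.
This is only a sketch: the helper names terms_facet and nested_request are
made up for illustration, and the parameter values are copied from the
quoted request.

```python
import json


def terms_facet(field, limit, overrequest, sub_facets=None):
    """Build one terms facet with the options used in the quoted request."""
    facet = {
        "type": "terms",
        "field": field,
        "missing": True,
        "refine": True,
        "overrequest": overrequest,
        "limit": limit,
        "offset": 0,
    }
    if sub_facets:
        # Sub-facets are computed per bucket of the parent facet.
        facet["facet"] = sub_facets
    return facet


def nested_request():
    """Request parameters for Gender_sf with a nested Status_sf facet."""
    json_facet = {
        "Gender_sf": terms_facet(
            "Gender_sf",
            limit=10,
            overrequest=10,
            sub_facets={
                "Status_sf": terms_facet("Status_sf", limit=50, overrequest=50)
            },
        ),
        "Gender_sfhll": "hll(Gender_sf)",
    }
    return {
        "wt": "json",
        "rows": 0,
        "json.facet": json.dumps(json_facet),
        "q": "*:*",
        "fq": ['type:"something"'],
    }


if __name__ == "__main__":
    print(json.dumps(nested_request(), indent=2))
```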

On 10 Nov 2017 10:02 p.m., "kenny" <ke...@ontoforce.com> wrote:

> Hi all,
>
> We are doing some tests in Solr 6.6 with the JSON Facet API and we get
> completely wrong counts for some combinations of facets.
>
> Setting: We have a set of fields for 376k documents in our query (total
> 120M documents). We work with 2 shards. We first facet over the first
> field and keep those counts, then run a nested facet over both fields.
>
> Then we add up the sub-facet counts and expect to get (approximately)
> the same numbers back. Sometimes we get rounding errors of about 1%
> difference. But on other occasions it seems to be way off.
>
> For example:
>
> Gender (3 values) Country (211 values)
> 16226 - 18424 = -2198 (-13.5461604832%)
> 282854 - 464387 = -181533 (-64.1790464338%)
> 40489 - 47902 = -7413 (-18.3086764306%)
> 36672 - 49749 = -13077 (-35.6593586387%)
>
> Gender (3 values)  Status (17 Values)
> 16226 - 16273 = -47 (-0.289658572661%)
> 282854 - 435974 = -153120 (-54.1339348215%)
> 40489 - 49925 = -9436 (-23.305095211%)
> 36672 - 54019 = -17347 (-47.3031195462%)
>
> ...
>
> These are the typical requests we submit. Note that we use refine and
> an overrequest, but in the case of Gender vs Request we should query all
> the buckets anyway.
>
> {"wt":"json","rows":0,"json.facet":"{\"Status_sfhll\":\"hll(Status_sf)\",\"Status_sf\":{\"type\":\"terms\",\"field\":\"Status_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\"limit\":50,\"offset\":0}}","q":"*:*","fq":["type:\"something\""]}
>
> {"wt":"json","rows":0,"json.facet":"{\"Gender_sf\":{\"type\":\"terms\",\"field\":\"Gender_sf\",\"missing\":true,\"refine\":true,\"overrequest\":10,\"limit\":10,\"offset\":0,\"facet\":{\"Status_sf\":{\"type\":\"terms\",\"field\":\"Status_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\"limit\":50,\"offset\":0}}},\"Gender_sfhll\":\"hll(Gender_sf)\"}","q":"*:*","fq":["type:\"something\""]}
>
> Is this a known bug? Would switching to the old facet API resolve this?
> Are there other parameters we are missing?
>
>
> Thanks
>
>
> kenny
>
>
>
