[
https://issues.apache.org/jira/browse/SOLR-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296185#comment-17296185
]
Tim Owen commented on SOLR-15220:
---------------------------------
Example in a non-distributed search (request, followed by the response):
{noformat}
{ facet: { authors: { type:terms, field:author_s, sort: "count desc", limit:3,
  method:dvhash, facet: { "count": "min(followers_i)" } } } }

"facets":{
  "count":2,
  "authors":{
    "buckets":[{
        "val":"bob",
        "count":1,
        "count":50000},
      {
        "val":"tim",
        "count":1,
        "count":12}]}}}
{noformat}
With a distributed search, the values under the duplicate key are merged
(along with the results from the other shards):
{noformat}
"facets":{
  "count":3,
  "authors":{
    "buckets":[{
        "val":"bob",
        "count":50001},
      {
        "val":"tim",
        "count":27}]}}}
{noformat}
If I change the stat name from {{count}} to something else, it works correctly:
{noformat}
"facets":{
  "count":3,
  "authors":{
    "buckets":[{
        "val":"tim",
        "count":2,
        "mycount":12},
      {
        "val":"bob",
        "count":1,
        "mycount":50000}]}}}
{noformat}
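
To make the arithmetic in the outputs above easier to follow, here is a rough Python model of what appears to be happening. It is not Solr's merger code, just a sketch under assumptions: each bucket is a list of (key, value) pairs standing in for a NamedList, and everything named {{count}} is summed. The second shard's contents are hypothetical, picked only so that the totals match the distributed responses shown above.

```python
# Shard A mirrors the non-distributed response: each bucket carries the real
# count and the min(followers_i) stat, both under the key "count".
shard_a = {
    "bob": [("count", 1), ("count", 50000)],
    "tim": [("count", 1), ("count", 12)],
}
# Hypothetical second shard (an assumption, not taken from the report).
shard_b = {
    "tim": [("count", 1), ("count", 13)],
}


def rename_stat(shard, new_key="mycount"):
    """Give the stat (second pair of each bucket) its own key."""
    return {val: [pairs[0], (new_key, pairs[1][1])] for val, pairs in shard.items()}


def merge(shards):
    """Sum every "count" entry; treat any other key as a min() stat."""
    merged = {}
    for shard in shards:
        for val, pairs in shard.items():
            bucket = merged.setdefault(val, {})
            for key, value in pairs:
                if key == "count":
                    bucket["count"] = bucket.get("count", 0) + value
                else:
                    bucket[key] = min(bucket.get(key, value), value)
    return merged


# Duplicate key: count and stat are added together (bob 50001, tim 27).
print(merge([shard_a, shard_b]))
# Distinct key: counts are summed and the stat keeps its min() meaning.
print(merge([rename_stat(shard_a), rename_stat(shard_b)]))
```

With the duplicate key, the model gives 50001 for bob and 27 for tim, as in the distributed response. With the stat renamed, it gives the same counts and {{mycount}} values as the last response.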
> Json faceting allows val and count as stat/subfacet names
> ---------------------------------------------------------
>
> Key: SOLR-15220
> URL: https://issues.apache.org/jira/browse/SOLR-15220
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Facet Module, JSON Request API
> Affects Versions: 7.7.3, master (9.0), 8.8.1
> Reporter: Tim Owen
> Priority: Minor
>
> The JSON faceting API allows you to name your stats or subfacets with the
> names {{val}} or {{count}}, which leads to confusing results or failed
> requests, because these names are effectively reserved by the code that
> builds the bucket responses.
> We noticed this by accident, when some new client code used the name
> {{count}} for a stat and we were getting unexpected results. What seems to be
> happening is that the NamedList from each shard contains *both* the true
> count and our stat value under the same key. Both NamedList and JSON/XML
> allow duplicates so there was no failure at this point. Then in distributed
> mode, the facet merger combines the values from both keys, and we ended up
> with the overall response having an inflated number for our stat.
> I think we could just validate against those two names being used for stats or
> subfacets, in the facet parser {{parseSubs}} method, to avoid this situation.
> I would rather know it's asking for trouble than allow it and get weird
> results or an exception. There may be other reserved names, depending on the
> facet type used. Alternatively, we could throw an exception if a duplicate key
> is used when building the NamedList response, although there isn't a central
> place to check that.
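
The validation idea in the description could look roughly like the following. This is a Python sketch of the check only, not a patch to {{parseSubs}}: the function names and the error message are hypothetical, and the reserved set holds just the two names mentioned in the issue (other facet types may reserve more).

```python
# Names the bucket-building code already uses, per the issue description.
RESERVED_BUCKET_KEYS = frozenset({"val", "count"})


def check_subs(subs, path):
    """Reject reserved names among the stats/subfacets of one facet, recursively."""
    for name, spec in subs.items():
        if name in RESERVED_BUCKET_KEYS:
            raise ValueError(
                f"'{name}' is reserved in bucket responses (under facet '{path}')"
            )
        # A nested facet may carry its own stats/subfacets under "facet".
        if isinstance(spec, dict) and isinstance(spec.get("facet"), dict):
            check_subs(spec["facet"], f"{path}.{name}")


def validate_facet_request(request):
    """Check the stats/subfacets of every top-level facet in a JSON facet request."""
    for name, spec in request.get("facet", {}).items():
        if isinstance(spec, dict) and isinstance(spec.get("facet"), dict):
            check_subs(spec["facet"], name)


bad = {"facet": {"authors": {"type": "terms", "field": "author_s",
                             "facet": {"count": "min(followers_i)"}}}}
good = {"facet": {"authors": {"type": "terms", "field": "author_s",
                              "facet": {"mycount": "min(followers_i)"}}}}

validate_facet_request(good)  # passes silently
try:
    validate_facet_request(bad)
except ValueError as err:
    print(err)
```

Failing early like this keeps the duplicate key from ever reaching the response, so neither the single-shard output nor the merged output needs special handling.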