I opened https://issues.apache.org/jira/browse/SOLR-11664 to track this.
I should be able to look into this shortly if no one else does.

-Yonik


On Tue, Nov 21, 2017 at 6:02 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> Thanks for the complete info that allowed me to easily reproduce this!
> The bug seems to extend beyond hll/unique... I tried min(string_s) and
> got wonky results as well.
>
> -Yonik
>
>
> On Tue, Nov 21, 2017 at 7:47 AM, Volodymyr Rudniev <vmrudn...@gmail.com> wrote:
>> Hello,
>>
>> I've encountered 2 issues while trying to apply the unique()/hll() functions
>> to a string field inside a range facet:
>>
>> 1. Results are incorrect for a single-valued string field.
>> 2. I’m getting an ArrayIndexOutOfBoundsException for a multi-valued string
>>    field.
>>
>>
>> How to reproduce:
>>
>> 1. Create a core based on the default configSet.
>> 2. Add several simple documents to the core, like these:
>>
>> [
>>   {
>>     "id": "14790",
>>     "int_i": 2010,
>>     "date_dt": "2010-01-01T00:00:00Z",
>>     "string_s": "a",
>>     "string_ss": ["a", "b"]
>>   },
>>   {
>>     "id": "12254",
>>     "int_i": 2014,
>>     "date_dt": "2014-01-01T00:00:00Z",
>>     "string_s": "e",
>>     "string_ss": ["b", "c"]
>>   },
>>   {
>>     "id": "12937",
>>     "int_i": 2008,
>>     "date_dt": "2008-01-01T00:00:00Z",
>>     "string_s": "c",
>>     "string_ss": ["c", "d"]
>>   },
>>   {
>>     "id": "10575",
>>     "int_i": 2008,
>>     "date_dt": "2008-01-01T00:00:00Z",
>>     "string_s": "b",
>>     "string_ss": ["d", "e"]
>>   },
>>   {
>>     "id": "13644",
>>     "int_i": 2014,
>>     "date_dt": "2014-01-01T00:00:00Z",
>>     "string_s": "e",
>>     "string_ss": ["e", "a"]
>>   },
>>   {
>>     "id": "8405",
>>     "int_i": 2014,
>>     "date_dt": "2014-01-01T00:00:00Z",
>>     "string_s": "d",
>>     "string_ss": ["a", "b"]
>>   },
>>   {
>>     "id": "6128",
>>     "int_i": 2008,
>>     "date_dt": "2008-01-01T00:00:00Z",
>>     "string_s": "a",
>>     "string_ss": ["b", "c"]
>>   },
>>   {
>>     "id": "5220",
>>     "int_i": 2015,
>>     "date_dt": "2015-01-01T00:00:00Z",
>>     "string_s": "d",
>>     "string_ss": ["c", "d"]
>>   },
>>   {
>>     "id": "6850",
>>     "int_i": 2012,
>>     "date_dt": "2012-01-01T00:00:00Z",
>>     "string_s": "b",
>>     "string_ss": ["d", "e"]
>>   },
>>   {
>>     "id": "5748",
>>     "int_i": 2014,
>>     "date_dt": "2014-01-01T00:00:00Z",
>>     "string_s": "e",
>>     "string_ss": ["e", "a"]
>>   }
>> ]
>>
>> 3. Try queries like the following for a single-valued string field:
>>
>> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"}}}}
>>
>> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"}}}}
>>
>> The distinct counts returned are generally incorrect. For example, for the
>> set of documents above, the response will contain:
>>
>> {
>>     "val": 2010,
>>     "count": 1,
>>     "distinct_count": 0
>> }
>>
>> and
>>
>> "between": {
>>     "count": 10,
>>     "distinct_count": 1
>> }
>>
>> (there should be 5 distinct values).
>>
>> Note that the result depends on the order in which the documents are added.
>>
>> 4. Try queries like the following for a multi-valued string field:
>>
>> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}
>>
>> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}
>>
>> I’m getting an ArrayIndexOutOfBoundsException for such queries.
>>
>> Note that everything looks OK for other field types (I tried single- and
>> multi-valued ints, doubles and dates), and also when the enclosing facet is
>> a terms facet or there is no enclosing facet at all.
>>
>> I can reproduce these issues on both Solr 7.0.1 and 7.1.0. Solr 6.x and
>> 5.x do not seem to have these issues.
>>
>> Is it a bug? Or maybe I’ve missed something?
>>
>> Thanks,
>>
>> Volodymyr
>>
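
For reference, the distinct counts that the queries above should return can
be worked out without Solr. The following is a minimal sketch in plain Python,
not Solr code: the helper name expected_unique and its parameters are
hypothetical, and it assumes lower-inclusive, upper-exclusive buckets over
int_i (no sample document sits exactly on the end value 2016, so the "edge"
setting does not change the numbers here).

```python
# Sample documents from the report, reduced to the fields the facets use.
DOCS = [
    {"id": "14790", "int_i": 2010, "string_s": "a", "string_ss": ["a", "b"]},
    {"id": "12254", "int_i": 2014, "string_s": "e", "string_ss": ["b", "c"]},
    {"id": "12937", "int_i": 2008, "string_s": "c", "string_ss": ["c", "d"]},
    {"id": "10575", "int_i": 2008, "string_s": "b", "string_ss": ["d", "e"]},
    {"id": "13644", "int_i": 2014, "string_s": "e", "string_ss": ["e", "a"]},
    {"id": "8405", "int_i": 2014, "string_s": "d", "string_ss": ["a", "b"]},
    {"id": "6128", "int_i": 2008, "string_s": "a", "string_ss": ["b", "c"]},
    {"id": "5220", "int_i": 2015, "string_s": "d", "string_ss": ["c", "d"]},
    {"id": "6850", "int_i": 2012, "string_s": "b", "string_ss": ["d", "e"]},
    {"id": "5748", "int_i": 2014, "string_s": "e", "string_ss": ["e", "a"]},
]


def expected_unique(docs, field, start=2008, end=2016, gap=1):
    """Hypothetical helper: expected (count, distinct_count) per bucket.

    Returns a pair: a dict mapping each bucket's lower bound to
    (count, distinct_count), and the (count, distinct_count) totals for
    everything between start and end.
    """
    # One list of per-document value lists for every bucket, empty or not.
    rows_by_bucket = {lower: [] for lower in range(start, end, gap)}
    for doc in docs:
        key = doc["int_i"]
        if start <= key < end:
            lower = start + ((key - start) // gap) * gap
            raw = doc[field]
            # Treat single-valued fields as one-element lists.
            rows_by_bucket[lower].append(raw if isinstance(raw, list) else [raw])

    buckets = {
        lower: (len(rows), len({v for row in rows for v in row}))
        for lower, rows in rows_by_bucket.items()
    }
    all_rows = [row for rows in rows_by_bucket.values() for row in rows]
    between = (len(all_rows), len({v for row in all_rows for v in row}))
    return buckets, between


if __name__ == "__main__":
    for field in ("string_s", "string_ss"):
        buckets, between = expected_unique(DOCS, field)
        print(field, "buckets:", buckets)
        print(field, "between:", between)
```

For string_s this gives a distinct count of 1 for the 2010 bucket and 5 for
"between", versus the 0 and 1 shown in the response above.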
