Thanks for the complete info that allowed me to easily reproduce this!
The bug seems to extend beyond hll/unique... I tried min(string_s) and
got wonky results as well.

-Yonik


On Tue, Nov 21, 2017 at 7:47 AM, Volodymyr Rudniev <vmrudn...@gmail.com> wrote:
> Hello,
>
> I've encountered 2 issues while trying to apply the unique()/hll() functions
> to a string field inside a range facet:
>
> 1. Results are incorrect for a single-valued string field.
> 2. I’m getting an ArrayIndexOutOfBoundsException for a multi-valued string field.
>
>
> How to reproduce:
>
> 1. Create a core based on the default configSet.
> 2. Add several simple documents to the core, like these:
>
> [
>   {
>     "id": "14790",
>     "int_i": 2010,
>     "date_dt": "2010-01-01T00:00:00Z",
>     "string_s": "a",
>     "string_ss": ["a", "b"]
>   },
>   {
>     "id": "12254",
>     "int_i": 2014,
>     "date_dt": "2014-01-01T00:00:00Z",
>     "string_s": "e",
>     "string_ss": ["b", "c"]
>   },
>   {
>     "id": "12937",
>     "int_i": 2008,
>     "date_dt": "2008-01-01T00:00:00Z",
>     "string_s": "c",
>     "string_ss": ["c", "d"]
>   },
>   {
>     "id": "10575",
>     "int_i": 2008,
>     "date_dt": "2008-01-01T00:00:00Z",
>     "string_s": "b",
>     "string_ss": ["d", "e"]
>   },
>   {
>     "id": "13644",
>     "int_i": 2014,
>     "date_dt": "2014-01-01T00:00:00Z",
>     "string_s": "e",
>     "string_ss": ["e", "a"]
>   },
>   {
>     "id": "8405",
>     "int_i": 2014,
>     "date_dt": "2014-01-01T00:00:00Z",
>     "string_s": "d",
>     "string_ss": ["a", "b"]
>   },
>   {
>     "id": "6128",
>     "int_i": 2008,
>     "date_dt": "2008-01-01T00:00:00Z",
>     "string_s": "a",
>     "string_ss": ["b", "c"]
>   },
>   {
>     "id": "5220",
>     "int_i": 2015,
>     "date_dt": "2015-01-01T00:00:00Z",
>     "string_s": "d",
>     "string_ss": ["c", "d"]
>   },
>   {
>     "id": "6850",
>     "int_i": 2012,
>     "date_dt": "2012-01-01T00:00:00Z",
>     "string_s": "b",
>     "string_ss": ["d", "e"]
>   },
>   {
>     "id": "5748",
>     "int_i": 2014,
>     "date_dt": "2014-01-01T00:00:00Z",
>     "string_s": "e",
>     "string_ss": ["e", "a"]
>   }
> ]
>
> 3. Try queries like the following for a single-valued string field:
>
> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"}}}}
>
> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"}}}}
>
> Distinct counts returned are incorrect in general. For example, for the set
> of documents above, the response will contain:
>
> {
>     "val": 2010,
>     "count": 1,
>     "distinct_count": 0
> }
>
> and
>
> "between": {
>     "count": 10,
>     "distinct_count": 1
> }
>
> (there should be 5 distinct values).
>
> Note that the result depends on the order in which the documents are added.
>
> 4. Try queries like the following for a multi-valued string field:
>
> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}
>
> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}
>
> I’m getting an ArrayIndexOutOfBoundsException for such queries.
>
> Note that everything looks OK for other field types (I tried single- and
> multi-valued ints, doubles and dates), when the enclosing facet is a terms
> facet, or when there is no enclosing facet at all.
>
> I can reproduce these issues on both Solr 7.0.1 and 7.1.0. Solr 6.x and
> 5.x do not seem to have them.
>
> Is this a bug, or have I missed something?
>
> Thanks,
>
> Volodymyr
>
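
For anyone checking a fix against the sample documents quoted above, here is a
minimal sketch that works out by hand what unique(string_s) and
unique(string_ss) should return per int_i bucket. It does not talk to Solr at
all. The names DOCS and expected_distinct are made up for illustration, and it
assumes each bucket covers [lower, lower + gap), which matches
"include":"lower,edge" here only because no sample document sits exactly on the
end value 2016.

```python
from collections import defaultdict

# (int_i, string_s, string_ss) for the ten sample documents, in posted order.
DOCS = [
    (2010, "a", ["a", "b"]),
    (2014, "e", ["b", "c"]),
    (2008, "c", ["c", "d"]),
    (2008, "b", ["d", "e"]),
    (2014, "e", ["e", "a"]),
    (2014, "d", ["a", "b"]),
    (2008, "a", ["b", "c"]),
    (2015, "d", ["c", "d"]),
    (2012, "b", ["d", "e"]),
    (2014, "e", ["e", "a"]),
]


def expected_distinct(docs, multi_valued=False, start=2008, end=2016, gap=1):
    """Return ({bucket_lower_bound: distinct_count}, distinct_count_between).

    Hypothetical helper, not part of Solr. Buckets with no documents are
    simply left out of the returned dict.
    """
    buckets = defaultdict(set)
    between = set()
    for year, single, multi in docs:
        if not (start <= year < end):
            continue  # outside the faceted range
        values = multi if multi_valued else [single]
        lower = start + ((year - start) // gap) * gap
        buckets[lower].update(values)
        between.update(values)
    per_bucket = {lower: len(vals) for lower, vals in sorted(buckets.items())}
    return per_bucket, len(between)


if __name__ == "__main__":
    print("string_s :", expected_distinct(DOCS))
    print("string_ss:", expected_distinct(DOCS, multi_valued=True))
```

Under those assumptions the 2010 bucket should report a distinct count of 1
for string_s (not 0), and "between" should report 5 for both fields.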
