Thanks for the complete info that allowed me to easily reproduce this! The bug seems to extend beyond hll/unique... I tried min(string_s) and got wonky results as well.
-Yonik

On Tue, Nov 21, 2017 at 7:47 AM, Volodymyr Rudniev <vmrudn...@gmail.com> wrote:
> Hello,
>
> I've encountered 2 issues while trying to apply the unique()/hll() functions
> to a string field inside a range facet:
>
> 1. Results are incorrect for a single-valued string field.
> 2. I'm getting ArrayIndexOutOfBoundsException for a multi-valued string field.
>
> How to reproduce:
>
> 1. Create a core based on the default configSet.
> 2. Add several simple documents to the core, like these:
>
> [
>   {
>     "id": "14790",
>     "int_i": 2010,
>     "date_dt": "2010-01-01T00:00:00Z",
>     "string_s": "a",
>     "string_ss": ["a", "b"]
>   },
>   {
>     "id": "12254",
>     "int_i": 2014,
>     "date_dt": "2014-01-01T00:00:00Z",
>     "string_s": "e",
>     "string_ss": ["b", "c"]
>   },
>   {
>     "id": "12937",
>     "int_i": 2008,
>     "date_dt": "2008-01-01T00:00:00Z",
>     "string_s": "c",
>     "string_ss": ["c", "d"]
>   },
>   {
>     "id": "10575",
>     "int_i": 2008,
>     "date_dt": "2008-01-01T00:00:00Z",
>     "string_s": "b",
>     "string_ss": ["d", "e"]
>   },
>   {
>     "id": "13644",
>     "int_i": 2014,
>     "date_dt": "2014-01-01T00:00:00Z",
>     "string_s": "e",
>     "string_ss": ["e", "a"]
>   },
>   {
>     "id": "8405",
>     "int_i": 2014,
>     "date_dt": "2014-01-01T00:00:00Z",
>     "string_s": "d",
>     "string_ss": ["a", "b"]
>   },
>   {
>     "id": "6128",
>     "int_i": 2008,
>     "date_dt": "2008-01-01T00:00:00Z",
>     "string_s": "a",
>     "string_ss": ["b", "c"]
>   },
>   {
>     "id": "5220",
>     "int_i": 2015,
>     "date_dt": "2015-01-01T00:00:00Z",
>     "string_s": "d",
>     "string_ss": ["c", "d"]
>   },
>   {
>     "id": "6850",
>     "int_i": 2012,
>     "date_dt": "2012-01-01T00:00:00Z",
>     "string_s": "b",
>     "string_ss": ["d", "e"]
>   },
>   {
>     "id": "5748",
>     "int_i": 2014,
>     "date_dt": "2014-01-01T00:00:00Z",
>     "string_s": "e",
>     "string_ss": ["e", "a"]
>   }
> ]
>
> 3. Try queries like the following for a single-valued string field:
>
> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"}}}}
>
> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"}}}}
>
> The distinct counts returned are generally incorrect. For example, for the
> set of documents above, the response will contain:
>
> {
>   "val": 2010,
>   "count": 1,
>   "distinct_count": 0
> }
>
> and
>
> "between": {
>   "count": 10,
>   "distinct_count": 1
> }
>
> (there should be 5 distinct values).
>
> Note that the result depends on the order in which the documents are added.
>
> 4. Try queries like the following for a multi-valued string field:
>
> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}
>
> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}
>
> I'm getting ArrayIndexOutOfBoundsException for such queries.
>
> Note that everything looks OK for other field types (I tried single- and
> multi-valued ints, doubles and dates), when the enclosing facet is a terms
> facet, or when there is no enclosing facet at all.
>
> I can reproduce these issues on both Solr 7.0.1 and 7.1.0. Solr 6.x and
> 5.x do not seem to have such issues.
>
> Is it a bug? Or maybe I've missed something?
>
> Thanks,
>
> Volodymyr
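For anyone checking a fix against this report, the expected distinct counts for the sample documents can be worked out client-side, without Solr. The following is only a minimal sketch: the helper name `expected_unique` and the trimmed `DOCS` list are illustrative and not part of Solr or of the original report. It groups the ten documents by `int_i` (equivalent to the one-year `date_dt` buckets here) and counts distinct string values per bucket and overall.

```python
from collections import defaultdict

# The ten sample documents from the report, reduced to the fields the facets use.
DOCS = [
    {"id": "14790", "int_i": 2010, "string_s": "a", "string_ss": ["a", "b"]},
    {"id": "12254", "int_i": 2014, "string_s": "e", "string_ss": ["b", "c"]},
    {"id": "12937", "int_i": 2008, "string_s": "c", "string_ss": ["c", "d"]},
    {"id": "10575", "int_i": 2008, "string_s": "b", "string_ss": ["d", "e"]},
    {"id": "13644", "int_i": 2014, "string_s": "e", "string_ss": ["e", "a"]},
    {"id": "8405", "int_i": 2014, "string_s": "d", "string_ss": ["a", "b"]},
    {"id": "6128", "int_i": 2008, "string_s": "a", "string_ss": ["b", "c"]},
    {"id": "5220", "int_i": 2015, "string_s": "d", "string_ss": ["c", "d"]},
    {"id": "6850", "int_i": 2012, "string_s": "b", "string_ss": ["d", "e"]},
    {"id": "5748", "int_i": 2014, "string_s": "e", "string_ss": ["e", "a"]},
]


def expected_unique(docs, bucket_field, value_field):
    """Hypothetical helper: distinct-value counts per bucket and over all docs."""
    per_bucket = defaultdict(set)
    for doc in docs:
        values = doc[value_field]
        if isinstance(values, str):  # treat a single-valued field as a 1-element list
            values = [values]
        per_bucket[doc[bucket_field]].update(values)
    overall = set().union(*per_bucket.values())
    counts = {bucket: len(vals) for bucket, vals in sorted(per_bucket.items())}
    return counts, len(overall)


if __name__ == "__main__":
    for field in ("string_s", "string_ss"):
        buckets, total = expected_unique(DOCS, "int_i", field)
        print(field, buckets, "between:", total)
```

For `string_s` this gives 3, 1, 1, 2 and 1 distinct values for 2008, 2010, 2012, 2014 and 2015, and 5 for the whole 2008 to 2016 range, matching the "5 distinct values" expected in the report. Years with no documents (2009, 2011, 2013) are simply absent from the result and would be 0 in a range facet.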