I opened https://issues.apache.org/jira/browse/SOLR-11664 to track this. I should be able to look into this shortly if no one else does.
-Yonik

On Tue, Nov 21, 2017 at 6:02 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> Thanks for the complete info that allowed me to easily reproduce this!
> The bug seems to extend beyond hll/unique... I tried min(string_s) and
> got wonky results as well.
>
> -Yonik
>
>
> On Tue, Nov 21, 2017 at 7:47 AM, Volodymyr Rudniev <vmrudn...@gmail.com>
> wrote:
>> Hello,
>>
>> I've encountered 2 issues while trying to apply the unique()/hll() functions
>> to a string field inside a range facet:
>>
>> 1. Results are incorrect for a single-valued string field.
>> 2. I’m getting ArrayIndexOutOfBoundsException for a multi-valued string field.
>>
>>
>> How to reproduce:
>>
>> 1. Create a core based on the default configSet.
>> 2. Add several simple documents to the core, like these:
>>
>> [
>>   {
>>     "id": "14790",
>>     "int_i": 2010,
>>     "date_dt": "2010-01-01T00:00:00Z",
>>     "string_s": "a",
>>     "string_ss": ["a", "b"]
>>   },
>>   {
>>     "id": "12254",
>>     "int_i": 2014,
>>     "date_dt": "2014-01-01T00:00:00Z",
>>     "string_s": "e",
>>     "string_ss": ["b", "c"]
>>   },
>>   {
>>     "id": "12937",
>>     "int_i": 2008,
>>     "date_dt": "2008-01-01T00:00:00Z",
>>     "string_s": "c",
>>     "string_ss": ["c", "d"]
>>   },
>>   {
>>     "id": "10575",
>>     "int_i": 2008,
>>     "date_dt": "2008-01-01T00:00:00Z",
>>     "string_s": "b",
>>     "string_ss": ["d", "e"]
>>   },
>>   {
>>     "id": "13644",
>>     "int_i": 2014,
>>     "date_dt": "2014-01-01T00:00:00Z",
>>     "string_s": "e",
>>     "string_ss": ["e", "a"]
>>   },
>>   {
>>     "id": "8405",
>>     "int_i": 2014,
>>     "date_dt": "2014-01-01T00:00:00Z",
>>     "string_s": "d",
>>     "string_ss": ["a", "b"]
>>   },
>>   {
>>     "id": "6128",
>>     "int_i": 2008,
>>     "date_dt": "2008-01-01T00:00:00Z",
>>     "string_s": "a",
>>     "string_ss": ["b", "c"]
>>   },
>>   {
>>     "id": "5220",
>>     "int_i": 2015,
>>     "date_dt": "2015-01-01T00:00:00Z",
>>     "string_s": "d",
>>     "string_ss": ["c", "d"]
>>   },
>>   {
>>     "id": "6850",
>>     "int_i": 2012,
>>     "date_dt": "2012-01-01T00:00:00Z",
>>     "string_s": "b",
>>     "string_ss": ["d", "e"]
>>   },
>>   {
>>     "id": "5748",
>>     "int_i": 2014,
>>     "date_dt": "2014-01-01T00:00:00Z",
>>     "string_s": "e",
>>     "string_ss": ["e", "a"]
>>   }
>> ]
>>
>> 3. Try queries like the following for a single-valued string field:
>>
>> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"}}}}
>>
>> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"}}}}
>>
>> The distinct counts returned are generally incorrect. For example, for the
>> set of documents above, the response will contain:
>>
>> {
>>   "val": 2010,
>>   "count": 1,
>>   "distinct_count": 0
>> }
>>
>> and
>>
>> "between": {
>>   "count": 10,
>>   "distinct_count": 1
>> }
>>
>> (there should be 5 distinct values).
>>
>> Note that the result depends on the order in which the documents are added.
>>
>> 4. Try queries like the following for a multi-valued string field:
>>
>> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}
>>
>> q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}
>>
>> I’m getting ArrayIndexOutOfBoundsException for such queries.
>>
>> Note that everything looks OK for other field types (I tried single- and
>> multi-valued ints, doubles and dates), or when the enclosing facet is a
>> terms facet, or when there is no enclosing facet at all.
>>
>> I can reproduce these issues on both Solr 7.0.1 and 7.1.0. Solr 6.x and
>> 5.x do not seem to have these issues.
>>
>> Is it a bug? Or maybe I’ve missed something?
>>
>> Thanks,
>>
>> Volodymyr
>>
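
For anyone checking a fix against the sample data, the expected per-bucket values can be worked out client-side. The following is only a rough sketch in Python: the helper name `expected_unique` and the abbreviated tuple layout are made up for illustration and are not part of Solr, and it assumes each `gap: 1` bucket on `int_i` holds exactly the documents with that `int_i` value.

```python
from collections import defaultdict

# Sample documents from the report, reduced to the fields used here:
# (id, int_i, string_s, string_ss). The tuple layout is an illustration only.
DOCS = [
    ("14790", 2010, "a", ["a", "b"]),
    ("12254", 2014, "e", ["b", "c"]),
    ("12937", 2008, "c", ["c", "d"]),
    ("10575", 2008, "b", ["d", "e"]),
    ("13644", 2014, "e", ["e", "a"]),
    ("8405", 2014, "d", ["a", "b"]),
    ("6128", 2008, "a", ["b", "c"]),
    ("5220", 2015, "d", ["c", "d"]),
    ("6850", 2012, "b", ["d", "e"]),
    ("5748", 2014, "e", ["e", "a"]),
]


def expected_unique(docs, multi_valued=False):
    """Return ({int_i: (count, distinct)}, (total_count, total_distinct)).

    Hypothetical helper: groups documents by int_i (one bucket per value,
    mirroring gap=1) and counts distinct string values in each bucket.
    """
    counts = defaultdict(int)
    values = defaultdict(set)
    overall = set()
    for _doc_id, year, single, multi in docs:
        terms = multi if multi_valued else [single]
        counts[year] += 1
        values[year].update(terms)
        overall.update(terms)
    buckets = {y: (counts[y], len(values[y])) for y in sorted(counts)}
    # All sample documents fall inside [2008, 2016), so "between" covers them all.
    return buckets, (len(docs), len(overall))


if __name__ == "__main__":
    for field, multi in (("string_s", False), ("string_ss", True)):
        buckets, between = expected_unique(DOCS, multi_valued=multi)
        print(field, buckets, "between:", between)
```

For `string_s` this gives 1 distinct value in the 2010 bucket and 5 distinct values across all 10 documents, versus the 0 and 1 shown in the response quoted above.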