>>You could add a "level2_comment_id" field to the level 2 comments and
>>its children, and then use unique() on that.
OK, I see, I missed the children... Thank you for pointing that out.

I have introduced that "unique sub-branch identifying" field and propagated it down the sub-branch (the data is here: https://github.com/alisa-ipn/solr_nesting/blob/master/data/example-data-solr-for-faceting.json). I have also changed the corresponding part of the post. And it actually works.

Yet it takes a lot of effort to make the JSON Facet API handle faceting by intermediate levels. Making those "unique sub-branch identifying" fields appear dynamically, the same way the "_root_" field does, would make Solr friendlier for nested data such as email chains and social media data...

Thanks,
Alisa

>Friday, 22 April 2016, 13:47 -04:00 from Yonik Seeley <ysee...@gmail.com>:
>
>On Fri, Apr 22, 2016 at 12:26 PM, Alisa Z. < prol...@mail.ru > wrote:
>> Hi Yonik,
>>
>> Thanks a lot for your response.
>>
>> I have discussed this with Mikhail Khludnev already and tried this
>> suggestion. Here's what I've got:
>>
>>
>>
>> sentiment: positive
>> author: Bob
>> text: Great post about Solr
>> 2.blog-posts.comments-id: 10735-23004 //this is a new field, field name is different on each level for each type, values are unique
>> date: 2015-04-10T11:30:00Z
>> path: 2.blog-posts.comments
>> id: 10735-23004
>> Query:
>> curl http://localhost:8985/solr/solr_nesting_unique/query -d
>> 'q=path:2.blog-posts.comments&rows=0&
>> json.facet={
>>   filter_by_child_type :{
>>     type:query,
>>     q:"path:*comments*keywords",
>>     domain: { blockChildren : "path:2.blog-posts.comments" },
>>     facet:{
>>       top_entity_text : {
>>         type: terms,
>>         field: text,
>>         limit: 10,
>>         sort: "counts_by_comments desc",
>>         facet: {
>>           counts_by_comments: "unique (2.blog-posts.comments-id )" // changed
>>         }}}}}'
>
>
>Something is wrong if you are getting 0 counts.
>Let's try taking it piece-by-piece:
>
>Step 1: q=path:2.blog-posts.comments
>This finds level 2 documents
>
>Step 2: domain: { blockChildren : "path:2.blog-posts.comments" }
>This first maps to all of the children (level 3 and level 4)
>
>Step 3: q:"path:*comments*keywords"
>This selects a subset of level 3 and level 4 documents with keywords
>(Note, in the future this should be doable as an additional filter in
>the domain spec, w/o an additional sub-facet level)
>
>Step 4:
>Facet on the text field of those level 3 and level 4 keyword docs. For
>each bucket, also find the number of unique values in the
>"2.blog-posts.comments-id" field on those documents.
>
>Without seeing what you indexed, my guess is that the issue is that
>the "2.blog-posts.comments-id" field does not actually exist on those
>level 3 and level 4 docs being faceted. The JSON Facet API doesn't
>propagate field values up/down the nested stack yet. That's what
>https://issues.apache.org/jira/browse/SOLR-8998 is mostly about.
>
>-Yonik
>
>
>>
>> Response:
>>
>> "response":{"numFound":3,"start":0,"docs":[]
>> },
>> "facets":{
>>   "count":3,
>>   "filter_by_child_type":{
>>     "count":9,
>>     "top_entity_text":{
>>       "buckets":[{
>>           "val":"Elasticsearch",
>>           "count":2,
>>           "counts_by_comments":0},
>>         {
>>           "val":"Solr",
>>           "count":5,
>>           "counts_by_comments":0},
>>         {
>>           "val":"Solr 5.5",
>>           "count":1,
>>           "counts_by_comments":0},
>>         {
>>           "val":"feature",
>>           "count":1,
>>           "counts_by_comments":0}]}}}}
>>
>> So unless I messed something up... or the field name does not look
>> "canonical" (but it was fast to generate and it is accepted in a normal
>> query
>> http://localhost:8985/solr/solr_nesting_unique/query?q=2.blog-posts.body-id:* )
>>
>> So I think that it's just a JSON Facet API limitation...
>>
>> Best,
>> --Alisa
>>
>>
>>>Friday, 22 April 2016, 9:55 -04:00 from Yonik Seeley < ysee...@gmail.com >:
>>>
>>>Hi Alisa,
>>>This was a bit too hard for me to grok on a first pass...
>>>then I saw your related blog post, which includes the actual sample data and makes
>>>it more clear.
>>>
>>> More comments inline:
>>>
>>>On Wed, Apr 20, 2016 at 2:29 PM, Alisa Z. < prol...@mail.ru > wrote:
>>>> Hi all,
>>>>
>>>> I have been stretching some of Solr's capabilities for nested document
>>>> handling and I've come up with the following issue...
>>>>
>>>> Let's say I have the following structure:
>>>>
>>>> {
>>>>   "blog-posts":{ //level 1
>>>>     "leaf-fields":[
>>>>       "date",
>>>>       "author"],
>>>>     "title":{ //level 2
>>>>       "leaf-fields":[ "text"],
>>>>       "keywords":{ //level 3
>>>>         "leaf-fields":[
>>>>           "text",
>>>>           "type"]
>>>>       }
>>>>     },
>>>>     "body":{ //level 2
>>>>       "leaf-fields":[ "text"],
>>>>       "keywords":{ //level 3
>>>>         "leaf-fields":[
>>>>           "text",
>>>>           "type"]
>>>>       }
>>>>     },
>>>>     "comments":{ //level 2
>>>>       "leaf-fields":[
>>>>         "date",
>>>>         "author",
>>>>         "text",
>>>>         "sentiment"
>>>>       ],
>>>>       "keywords":{ //level 3
>>>>         "leaf-fields":[
>>>>           "text",
>>>>           "type"]
>>>>       },
>>>>       "replies":{ //level 3
>>>>         "leaf-fields":[
>>>>           "date",
>>>>           "author",
>>>>           "text",
>>>>           "sentiment"],
>>>>         "keywords":{ //level 4
>>>>           "leaf-fields":[
>>>>             "text",
>>>>             "type"]
>>>>         }}}}}
>>>>
>>>>
>>>> And I want to know the distribution of all readers' keywords (levels 3 and
>>>> 4) by comments (level 2).
>>>> In the JSON Facet API I tried this:
>>>>
>>>> curl http://localhost:8983/solr/my_index/query -d
>>>> 'q=path:2.blog-posts.comments&rows=0&
>>>> json.facet={
>>>>   filter_by_child_type :{
>>>>     type:query,
>>>>     q:"path:*comments*keywords",
>>>>     domain: { blockChildren : "path:2.blog-posts.comments" },
>>>>     facet:{
>>>>       top_keywords : {
>>>>         type: terms,
>>>>         field: text,
>>>>         sort: "counts_by_comments desc",
>>>>         facet: {
>>>>           counts_by_comments: "unique(_root_)" // I suspect it should be a different field, not _root_, but what would it be for an intermediate document?
>>>>         }}}}}'
>>>>
>>>> Which gives me the wrong results: it aggregates by posts, not by comments
>>>> (it's a toy data set, so I know that the correct answer for "Solr" is 3
>>>> when faceted by comments)
>>>
>>>
>>>Yeah, this type of thing isn't currently directly supported, but
>>>SOLR-8998 should address that.
>>>You can currently hack around it (for simple counts) using unique(),
>>>as you've discovered, but you need a unique ID at the right level to
>>>get the right count.
>>>
>>>_root_ is unique for blog posts, which is why you get numbers of
>>>posts (as opposed to numbers of level-2 comments).
>>>You could add a "level2_comment_id" field to the level 2 comments and
>>>its children, and then use unique() on that.
>>>
>>>-Yonik
>>>
>>>
>>>> {
>>>>   "response":{"numFound":3,"start":0,"docs":[]
>>>>   },
>>>>   "facets":{
>>>>     "count":3,
>>>>     "filter_by_child_type":{
>>>>       "count":9,
>>>>       "top_keywords":{
>>>>         "buckets":[{
>>>>             "val":"Elasticsearch",
>>>>             "count":2,
>>>>             "counts_by_comments":2},
>>>>           {
>>>>             "val":"Solr",
>>>>             "count":5,
>>>>             "counts_by_comments":2}, //here the count by "comments" should be 3
>>>>           {
>>>>             "val":"Solr 5.5",
>>>>             "count":1,
>>>>             "counts_by_comments":1},
>>>>           {
>>>>             "val":"feature",
>>>>             "count":1,
>>>>             "counts_by_comments":1}]}}}}
>>>>
>>>>
>>>> Am I writing the query wrong?
>>>>
>>>>
>>>> By the way, Block Join Faceting works fine for this:
>>>> bjqfacet?q={!parent%20which=path:2.blog-posts.comments}path:*.comments*keywords&rows=0&facet=true&child.facet.field=text&wt=json&indent=true
>>>>
>>>> {
>>>>   "response":{"numFound":3,"start":0,"docs":[]
>>>>   },
>>>>   "facet_counts":{
>>>>     "facet_queries":{},
>>>>     "facet_fields":{
>>>>       "text":[
>>>>         "Elasticsearch",2,
>>>>         "Solr",3, //correct result
>>>>         "Solr 5.5",1,
>>>>         "feature",1]},
>>>>     "facet_dates":{},
>>>>     "facet_ranges":{},
>>>>     "facet_intervals":{},
>>>>     "facet_heatmaps":{}}}
>>>>
>>>> But we've already discussed that it returns too much stuff: no way to put
>>>> limits or order by counts :( That's why I want to see whether it's
>>>> possible to do this properly with the JSON Facet API.
>>>>
>>>> Thank you in advance!
>>>>
>>>> --
>>>> Alisa Zhila
>>
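
For readers following the workaround discussed in this thread (give every level-2 comment a unique id and copy it onto all of its descendants, so that unique() has a field to count at index time), here is a minimal Python sketch of the propagation step. It is only an illustration under assumptions: the helper name `propagate_branch_id`, the sample documents, and the level 3 and level 4 `path` values are hypothetical, and it assumes children are nested under Solr's `_childDocuments_` key; the actual data in the linked repository may be laid out differently.

```python
from typing import Any, Dict, Optional

# Key used for nested children in Solr's JSON update format.
CHILD_KEY = "_childDocuments_"


def propagate_branch_id(
    doc: Dict[str, Any],
    branch_path: str,
    field_name: str,
    inherited: Optional[str] = None,
) -> Dict[str, Any]:
    """Copy the id of each document at `branch_path` onto itself and its descendants.

    Documents above the branch level (e.g. the blog post) are left untouched.
    The document tree is modified in place and also returned.
    """
    # A document at the target level starts a new sub-branch.
    if doc.get("path") == branch_path:
        inherited = doc["id"]
    if inherited is not None:
        doc[field_name] = inherited
    for child in doc.get(CHILD_KEY, []):
        propagate_branch_id(child, branch_path, field_name, inherited)
    return doc


# Hypothetical toy post: one comment with a keyword and a reply with a keyword.
post = {
    "id": "10735",
    "path": "1.blog-posts",
    CHILD_KEY: [
        {
            "id": "10735-23004",
            "path": "2.blog-posts.comments",
            "text": "Great post about Solr",
            CHILD_KEY: [
                {"id": "k1", "path": "3.blog-posts.comments.keywords", "text": "Solr"},
                {
                    "id": "r1",
                    "path": "3.blog-posts.comments.replies",
                    "text": "Agreed",
                    CHILD_KEY: [
                        {
                            "id": "k2",
                            "path": "4.blog-posts.comments.replies.keywords",
                            "text": "Solr",
                        }
                    ],
                },
            ],
        }
    ],
}

# Field name taken from the thread; any name the schema accepts would do.
FIELD = "2.blog-posts.comments-id"
propagate_branch_id(post, "2.blog-posts.comments", FIELD)
```

After this step both keyword documents carry the same comment id, so a unique() aggregation on that field counts the comment once even though the keyword "Solr" appears under it twice.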