Re: Re[2]: Block Join faceting on intermediate levels with JSON Facet API (might be related to block join rollups & SOLR-8998)

Yonik Seeley Fri, 22 Apr 2016 10:47:31 -0700

On Fri, Apr 22, 2016 at 12:26 PM, Alisa Z. <[email protected]> wrote:
>  Hi Yonik,
>
> Thanks a lot for your response.
>
> I have discussed this with Mikhail Khludnev already and tried this 
> suggestion. Here's what I've got:
>
>
>
> sentiment: positive
> author: Bob
> text: Great post about Solr
> 2.blog-posts.comments-id: 10735-23004    // this is a new field; the field name is different on each level for each type; values are unique
> date: 2015-04-10T11:30:00Z
> path: 2.blog-posts.comments
> id: 10735-23004
> Query:
> curl http://localhost:8985/solr/solr_nesting_unique/query -d 
> 'q=path:2.blog-posts.comments&rows=0&
> json.facet={
>   filter_by_child_type :{
>     type:query,
>     q:"path:*comments*keywords",
>     domain: { blockChildren : "path:2.blog-posts.comments" },
>     facet:{
>       top_entity_text : {
>         type: terms,
>         field: text,
>         limit: 10,
>         sort: "counts_by_comments desc",
>         facet: {
>            counts_by_comments: "unique (2.blog-posts.comments-id )"    // changed
>          }}}}}'



Something is wrong if you are getting 0 counts.
Let's try taking it piece by piece:

Step 1:  q=path:2.blog-posts.comments
This finds level 2 documents

Step 2:  domain: { blockChildren : "path:2.blog-posts.comments" }
This first maps to all of the children (level 3 and level 4)

Step 3:  q:"path:*comments*keywords"
This selects a subset of level3 and level4 documents with keywords
(Note, in the future this should be doable as an additional filter in
the domain spec, w/o an additional sub-facet level)

Step 4:
Facet on the text field of those level3 and level4 keyword docs. For
each bucket, also find the unique number of values in the
"2.blog-posts.comments-id" field on those documents.

Without seeing what you indexed, my guess is that the
"2.blog-posts.comments-id" field does not actually exist on those
level 3 and level 4 docs being faceted.  The JSON Facet API doesn't
propagate field values up/down the nested stack yet.  That's what
https://issues.apache.org/jira/browse/SOLR-8998 is mostly about.
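
To make the guess above concrete, here is a rough Python sketch (not Solr
code) that emulates the terms facet plus unique() on a tiny in-memory block.
The toy documents, the level 3/4 path values, and the helper names
(stamp_comment_id, unique_per_term) are all hypothetical and only for
illustration; "_childDocuments_" is the key Solr's JSON update format uses
for nested children. It shows unique() coming back as 0 while the id field
lives only on the level 2 comment, and the expected per-comment counts once
the id is copied onto every descendant before indexing.

```python
from collections import defaultdict

ID_FIELD = "2.blog-posts.comments-id"  # field name taken from the thread


def make_comments():
    """Hypothetical toy data: two level-2 comments with nested keyword docs."""
    return [
        {"id": "c1", "path": "2.blog-posts.comments", ID_FIELD: "c1",
         "_childDocuments_": [
             {"id": "c1-k1", "path": "3.blog-posts.comments.keywords", "text": "Solr"},
             {"id": "c1-r1", "path": "3.blog-posts.comments.replies", "text": "I agree",
              "_childDocuments_": [
                  {"id": "c1-r1-k1", "path": "4.blog-posts.comments.replies.keywords",
                   "text": "Solr"}]}]},
        {"id": "c2", "path": "2.blog-posts.comments", ID_FIELD: "c2",
         "_childDocuments_": [
             {"id": "c2-k1", "path": "3.blog-posts.comments.keywords", "text": "Solr"},
             {"id": "c2-k2", "path": "3.blog-posts.comments.keywords",
              "text": "Elasticsearch"}]},
    ]


def flatten(doc):
    """Yield a document followed by all of its descendants."""
    yield doc
    for child in doc.get("_childDocuments_", []):
        yield from flatten(child)


def stamp_comment_id(comment, id_field=ID_FIELD):
    """Copy the level-2 comment's id onto the comment and every descendant."""
    for doc in flatten(comment):
        doc[id_field] = comment["id"]
    return comment


def unique_per_term(comments, id_field=ID_FIELD):
    """Emulate: terms facet on `text` over keyword docs, with unique(id_field)."""
    seen = defaultdict(set)
    for comment in comments:
        for doc in flatten(comment):
            if doc is comment or not doc["path"].endswith("keywords"):
                continue  # only descendant keyword docs are faceted
            bucket = seen[doc["text"]]  # the bucket exists even with no ids
            if id_field in doc:
                bucket.add(doc[id_field])
    return {term: len(ids) for term, ids in seen.items()}


# Id field only on the level-2 comment: every bucket reports 0 unique values.
before = unique_per_term(make_comments())
# Id copied down to the children: buckets count distinct comments.
after = unique_per_term([stamp_comment_id(c) for c in make_comments()])
print(before)
print(after)
```

If the real index behaves like this sketch, then re-indexing with the id
copied onto the keyword docs should be enough for unique() to return
per-comment counts.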

-Yonik


>
> Response:
>
> "response":{"numFound":3,"start":0,"docs":[]
>   },
>   "facets":{
>     "count":3,
>     "filter_by_child_type":{
>       "count":9,
>       "top_entity_text":{
>         "buckets":[{
>             "val":"Elasticsearch",
>             "count":2,
>             "counts_by_comments":0},
>           {
>             "val":"Solr",
>             "count":5,
>             "counts_by_comments":0},
>           {
>             "val":"Solr 5.5",
>             "count":1,
>             "counts_by_comments":0},
>           {
>             "val":"feature",
>             "count":1,
>             "counts_by_comments":0}]}}}}
>
> So unless I messed something up... or the field name does not look
> "canonical" (but it was fast to generate and it is accepted in a normal query:
> http://localhost:8985/solr/solr_nesting_unique/query?q=2.blog-posts.body-id:* )
>
> So I think that it's just a JSON facet API limitation...
>
> Best,
> --Alisa
>
>
>>Friday, 22 April 2016, 9:55 -04:00 from Yonik Seeley <[email protected]>:
>>
>>Hi Alisa,
>>This was a bit too hard for me to grok on a first pass... then I saw
>>your related blog post which includes the actual sample data and makes
>>it more clear.
>>
>> More comments inline:
>>
>>On Wed, Apr 20, 2016 at 2:29 PM, Alisa Z. < [email protected] > wrote:
>>>  Hi all,
>>>
>>> I have been stretching some of Solr's capabilities for handling nested
>>> documents and I've come up with the following issue...
>>>
>>> Let's say I have the following structure:
>>>
>>> {
>>> "blog-posts":{                      //level 1
>>>     "leaf-fields":[
>>>         "date",
>>>         "author"],
>>>     "title":{                       //level 2
>>>         "leaf-fields":[ "text"],
>>>         "keywords":{                //level 3
>>>             "leaf-fields":[
>>>                 "text",
>>>                 "type"]
>>>             }
>>>         },
>>>     "body":{                        //level 2
>>>         "leaf-fields":[ "text"],
>>>         "keywords":{                //level 3
>>>             "leaf-fields":[
>>>                 "text",
>>>                 "type"]
>>>             }
>>>         },
>>>     "comments":{                    //level 2
>>>         "leaf-fields":[
>>>             "date",
>>>             "author",
>>>             "text",
>>>             "sentiment"
>>>             ],
>>>         "keywords":{                //level 3
>>>             "leaf-fields":[
>>>                 "text",
>>>                 "type"]
>>>             },
>>>         "replies":{                 //level 3
>>>             "leaf-fields":[
>>>                 "date",
>>>                 "author",
>>>                 "text",
>>>                 "sentiment"],
>>>             "keywords":{            //level 4
>>>                 "leaf-fields":[
>>>                     "text",
>>>                     "type"]
>>>                 }}}}}
>>>
>>>
>>> And I want to know the distribution of all readers' keywords (levels 3 and 
>>> 4) by comments (level 2).
>>> In JSON Facet API I tried this:
>>>
>>> curl http://localhost:8983/solr/my_index/query -d 
>>> 'q=path:2.blog-posts.comments&rows=0&
>>> json.facet={
>>>   filter_by_child_type :{
>>>     type:query,
>>>     q:"path:*comments*keywords",
>>>     domain: { blockChildren : "path:2.blog-posts.comments" },
>>>     facet:{
>>>       top_keywords : {
>>>         type: terms,
>>>         field: text,
>>>         sort: "counts_by_comments desc",
>>>         facet: {
>>>            counts_by_comments: "unique(_root_)"    // I suspect it should be a different field, not _root_, but what would it be for an intermediate document?
>>>          }}}}}'
>>>
>>> This gives me the wrong results: it aggregates by posts, not by comments
>>> (it's a toy data set, so I know that the correct answer for "Solr" is 3
>>> when faceted by comments)
>>
>>
>>Yeah, this type of thing isn't currently directly supported, but
>>SOLR-8998 should address that.
>>You can currently hack around it (for simple counts) using unique(),
>>as you've discovered, but you need a unique ID at the right level to
>>get the right count.
>>
>>_root_ is unique for blog posts, which is why you get numbers of
>>posts (as opposed to numbers of level-2 comments).
>>You could add a "level2_comment_id" field to the level 2 comments and
>>their children, and then use unique() on that.
>>
>>-Yonik
>>
>>
>>> {
>>> "response":{"numFound":3,"start":0,"docs":[]
>>>   },
>>>   "facets":{
>>>     "count":3,
>>>     "filter_by_child_type":{
>>>       "count":9,
>>>       "top_keywords":{
>>>         "buckets":[{
>>>             "val":"Elasticsearch",
>>>             "count":2,
>>>             "counts_by_comments":2},
>>>           {
>>>             "val":"Solr",
>>>             "count":5,
>>>             "counts_by_comments":2},               //here the count by "comments" should be 3
>>>           {
>>>             "val":"Solr 5.5",
>>>             "count":1,
>>>             "counts_by_comments":1},
>>>           {
>>>             "val":"feature",
>>>             "count":1,
>>>             "counts_by_comments":1}]}}}}
>>>
>>>
>>> Am I writing the query wrong?
>>>
>>>
>>> By the way, Block Join Faceting works fine for this:
>>> bjqfacet?q={!parent%20which=path:2.blog-posts.comments}path:*.comments*keywords&rows=0&facet=true&child.facet.field=text&wt=json&indent=true
>>>
>>> {
>>>   "response":{"numFound":3,"start":0,"docs":[]
>>>   },
>>>   "facet_counts":{
>>>     "facet_queries":{},
>>>     "facet_fields":{
>>>       "text":[
>>>         "Elasticsearch",2,
>>>         "Solr",3,                                  //correct result
>>>         "Solr 5.5",1,
>>>         "feature",1]},
>>>     "facet_dates":{},
>>>     "facet_ranges":{},
>>>     "facet_intervals":{},
>>>     "facet_heatmaps":{}}}
>>>
>>> But we've already discussed that it returns too much stuff: there is no way
>>> to set limits or order by counts :(  That's why I want to see whether it's
>>> possible to do this directly with the JSON Facet API.
>>>
>>> Thank you in advance!
>>>
>>> --
>>> Alisa Zhila
>
