Hi all, 

I have been stretching some of Solr's capabilities for handling nested 
documents and I've run into the following issue...

Let's say I have the following structure:

{
"blog-posts":{                      //level 1
    "leaf-fields":[
        "date",
        "author"],
    "title":{                       //level 2
        "leaf-fields":[ "text"],
        "keywords":{                //level 3
            "leaf-fields":[
                "text",
                "type"]
            }
        },
    "body":{                        //level 2
        "leaf-fields":[ "text"],
        "keywords":{                //level 3
            "leaf-fields":[
                "text",
                "type"]
            }
        },
    "comments":{                    //level 2
        "leaf-fields":[
            "date",
            "author",
            "text",
            "sentiment"
            ],
        "keywords":{                //level 3
            "leaf-fields":[
                "text",
                "type"]
            },
        "replies":{                 //level 3
            "leaf-fields":[
                "date",
                "author",
                "text",
                "sentiment"],
            "keywords":{            //level 4
                "leaf-fields":[
                    "text",
                    "type"]
                }}}}}
  
  
I want to know the distribution of all readers' keywords (levels 3 and 4) 
by comments (level 2).  
With the JSON Facet API I tried this: 

curl http://localhost:8983/solr/my_index/query -d \
'q=path:2.blog-posts.comments&rows=0&json.facet={
  filter_by_child_type :{
    type:query,
    q:"path:*comments*keywords",
    domain: { blockChildren : "path:2.blog-posts.comments" },
    facet:{
      top_keywords : {
        type: terms,
        field: text,
        sort: "counts_by_comments desc",
        facet: {
           counts_by_comments: "unique(_root_)"    // I suspect it should be a different field, not _root_, but what would it be for an intermediate document?
         }}}}}'

This gives me the wrong results: it aggregates by posts, not by comments. 
(It's a toy data set, so I know that the correct answer for "Solr" is 3 when 
faceted by comments.)

{
"response":{"numFound":3,"start":0,"docs":[]
  },
  "facets":{
    "count":3,
    "filter_by_child_type":{
      "count":9,
      "top_keywords":{
        "buckets":[{
            "val":"Elasticsearch",
            "count":2,
            "counts_by_comments":2},
          {
            "val":"Solr",
            "count":5,
            "counts_by_comments":2},               // here the count by "comments" should be 3
          {
            "val":"Solr 5.5",
            "count":1,
            "counts_by_comments":1},
          {
            "val":"feature",
            "count":1,
            "counts_by_comments":1}]}}}}


Am I writing the query wrong? 
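
To make the expected number concrete, here is a minimal sketch in Python of the two ways of counting. The documents, the ids, and the "comment" key are hypothetical (this is not my real toy index, and no such field exists in it); they are only chosen so that "Solr" has 5 keyword documents spread over 3 comments in 2 posts. Counting distinct roots is what unique(_root_) appears to do, while counting distinct comments is what I am after.

```python
from collections import defaultdict

# Hypothetical keyword documents (NOT the real index): each one carries the
# id of its root post and the id of the comment it ultimately belongs to.
KEYWORD_DOCS = [
    {"text": "Solr", "root": "post-1", "comment": "c1"},
    {"text": "Solr", "root": "post-1", "comment": "c1"},  # e.g. keyword of a reply under c1
    {"text": "Solr", "root": "post-1", "comment": "c2"},
    {"text": "Solr", "root": "post-2", "comment": "c3"},
    {"text": "Solr", "root": "post-2", "comment": "c3"},
    {"text": "Elasticsearch", "root": "post-1", "comment": "c1"},
    {"text": "Elasticsearch", "root": "post-2", "comment": "c3"},
]


def unique_per_term(docs, key):
    """For every keyword text, count the distinct values of `key`."""
    seen = defaultdict(set)
    for doc in docs:
        seen[doc["text"]].add(doc[key])
    return {term: len(values) for term, values in seen.items()}


by_post = unique_per_term(KEYWORD_DOCS, "root")        # aggregation by post
by_comment = unique_per_term(KEYWORD_DOCS, "comment")  # aggregation by comment

print("Solr by post:", by_post["Solr"])
print("Solr by comment:", by_comment["Solr"])
```

In this made-up data the per-post count for "Solr" is 2 and the per-comment count is 3, which is the same kind of gap I see between the JSON Facet result and the expected one.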


By the way, Block Join Faceting works fine for this: 
bjqfacet?q={!parent%20which=path:2.blog-posts.comments}path:*.comments*keywords&rows=0&facet=true&child.facet.field=text&wt=json&indent=true

{
  "response":{"numFound":3,"start":0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "text":[
        "Elasticsearch",2,
        "Solr",3,                                  //correct result 
        "Solr 5.5",1,
        "feature",1]},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}}}

But we've already discussed that it returns too much stuff: there is no way to 
put limits or order by counts :(  That's why I want to see whether it's 
possible to get this right with the JSON Facet API. 
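
As a stopgap, the ordering and the limit could be applied on the client side. The following is only a sketch in Python: top_terms is a hypothetical helper name, and it assumes the facet comes back as the flat [term, count, term, count, ...] list shown in the bjqfacet response above. It does not reduce what Solr sends over the wire, so it is not a real substitute for doing this in the JSON Facet API.

```python
def top_terms(flat_counts, limit):
    """Turn a flat [term, count, term, count, ...] list into the `limit`
    highest-count (term, count) pairs; ties are broken by term."""
    pairs = list(zip(flat_counts[0::2], flat_counts[1::2]))
    pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    return pairs[:limit]


# The "text" list copied from the Block Join Faceting response above.
text_counts = ["Elasticsearch", 2, "Solr", 3, "Solr 5.5", 1, "feature", 1]

top = top_terms(text_counts, 2)
print(top)
```

With the response above and a limit of 2 this keeps "Solr" (3) and "Elasticsearch" (2).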

Thank you in advance!

-- 
Alisa Zhila
