Re[2]: [nesting] JSON Facet API vs. BlockJoin Faceting: need help on queries (Facet API facets by wrong doc level VS. BlockJoin Faceting does not return top 10 most frequent)

Alisa Z. Tue, 29 Mar 2016 08:28:39 -0700

So the first issue was eventually solved by adding facet: {top_terms_by_doc: 
"unique(_root_)"} to the terms facet AND sorting its buckets by that sub-facet:


curl http://localhost:8985/solr/enron_path_w_ts/query -d 
'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
json.facet={
  filter_by_child_type :{
    type:query,
    q:"type_s:doc.enriched.text.keywords",
    domain: { blockChildren : "type_s:doc" },
    facet:{
      top_keywords_text : {
        type: terms,
        field: text_t,
        limit: 10,
        sort: "top_terms_by_doc desc",
        facet: {
          top_terms_by_doc: "unique(_root_)"
        }
      }
    }
  }
}'
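
For anyone wondering why "count" and unique(_root_) rank the terms differently: as far as I understand it, in the child domain "count" is the number of matching child documents, while unique(_root_) is the number of distinct parent blocks they belong to. Here is a toy model in Python. The ids and terms are made up for illustration and are not taken from my index, and the function name is hypothetical:

```python
from collections import defaultdict

# Hypothetical child documents: (_root_ id of the parent block, keyword term).
children = [
    ("doc1", "enron"), ("doc1", "enron"), ("doc1", "enron"), ("doc1", "enron"),
    ("doc1", "california"),
    ("doc2", "california"),
    ("doc3", "california"),
]

def facet_stats(children):
    """Per term: number of child docs ("count") and number of distinct parents."""
    count = defaultdict(int)
    roots = defaultdict(set)
    for root, term in children:
        count[term] += 1
        roots[term].add(root)
    return {t: {"count": count[t], "unique_roots": len(roots[t])} for t in count}

stats = facet_stats(children)

# Default terms-facet order: by child-document count.
by_count = sorted(stats, key=lambda t: stats[t]["count"], reverse=True)
# Order requested with sort: "top_terms_by_doc desc": by distinct parents.
by_roots = sorted(stats, key=lambda t: stats[t]["unique_roots"], reverse=True)

print(by_count)
print(by_roots)
```

A term repeated many times inside one parent wins on "count" but loses on distinct parents, which is why the sort has to name the sub-facet.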


The BlockJoin Faceting part is still open: I've tried all the conventional 
faceting parameters (facet.limit, child.facet.limit, f.text_t.facet.limit, ...) 
and nothing worked :( 
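
A possible stopgap until the server-side limit is sorted out is to trim the list on the client. This is only a sketch, assuming the response shape shown in the quoted message below (a flat [term, count, term, count, ...] list under facet_fields); the function name and its defaults are my own and are not Solr parameters:

```python
def top_terms(flat, limit=10, mincount=1):
    """Return the `limit` most frequent (term, count) pairs from a flat
    [term, count, term, count, ...] list, dropping counts below `mincount`."""
    pairs = list(zip(flat[0::2], flat[1::2]))
    kept = [(term, count) for term, count in pairs if count >= mincount]
    # Highest count first; ties are broken alphabetically for a stable order.
    kept.sort(key=lambda tc: (-tc[1], tc[0]))
    return kept[:limit]

# A few of the values from the bjqfacet response quoted below.
sample = ["128x", 1, "2", 2, "ab", 2, "access", 5, "california", 13, "enron", 9]

top = top_terms(sample, limit=3, mincount=5)
print(top)
```

This still transfers the whole term list over the wire, so it is a workaround rather than a replacement for a working child.facet.limit.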


>Monday, 28 March 2016, 17:20 -04:00 from Alisa Z. <prol...@mail.ru>:
>
>Ok, so for the 1st question, I think I'm getting closer: adding facet: 
>{top_terms_by_doc: "unique(_root_)"} (as indicated in 
>http://blog.griddynamics.com/search/label/~Mikhail%20Khludnev ) returns correct 
>counts. However, the buckets are still sorted by the outer facet's "count", not by 
>unique(_root_):
>
>
>curl http://localhost:8985/solr/my_collection/query -d 
>'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
>json.facet={
>  filter_by_child_type :{
>    type:query,
>    q:"type_s:doc.enriched.text.keywords",
>    domain: { blockChildren : "type_s:doc" },
>    facet:{
>      top_keywords_text : {
>        type: terms,
>        field: text_t,
>        limit: 10,
>        facet: {
>           top_terms_by_doc: "unique(_root_)"
>         }
>      }
>    }
>  }
>}'
>
>RETURNS 
>
>{
>  "responseHeader":{
>    "status":0,
>    "QTime":25,
>    "params":{
>      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData 
>+Subject_t:california",
>      "json.facet":"{\n  filter_by_child_type :{\n    type:query,\n    
>q:\"type_s:doc.enriched.text.keywords\",\n    domain: { blockChildren : 
>\"type_s:doc\" },\n    facet:{\n      top_keywords_text : {\n        type: 
>terms,\n        field: text_t,\n        limit: 10,\n        facet: {\n         
>  top_terms_by_doc: \"unique(_root_)\"\n         }\n      }\n    }\n  }\n}",
>      "rows":"0"}},
>  "response":{"numFound":19,"start":0,"docs":[]
>  },
>  "facets":{
>    "count":19,
>    "filter_by_child_type":{
>      "count":686,
>      "top_keywords_text":{
>        "buckets":[{
>            "val":"enron",
>            "count":57,
>            "top_terms_by_doc":9},
>          {
>            "val":"california",
>            "count":22,
>            "top_terms_by_doc":13},
>          {
>            "val":"power",
>            "count":21,
>            "top_terms_by_doc":7},
>          {
>            "val":"rate",
>            "count":15,
>            "top_terms_by_doc":5},
>          {
>            "val":"plan",
>            "count":13,
>            "top_terms_by_doc":3},
>          {
>            "val":"hou",
>            "count":12,
>            "top_terms_by_doc":5},
>          {
>            "val":"energy",
>            "count":11,
>            "top_terms_by_doc":5},
>          {
>            "val":"na",
>            "count":11,
>            "top_terms_by_doc":5},
>          {
>            "val":"mckinsey",
>            "count":10,
>            "top_terms_by_doc":1},
>          {
>            "val":"socal",
>            "count":10,
>            "top_terms_by_doc":4}]}}}}
>
>Nice, but I want them to be ordered by "top_terms_by_doc" frequencies,  not by 
>the "count" frequencies. 
>Any suggestions?
>
>Thanks,
>Alisa 
>
>
>
>
>
>>Monday, 28 March 2016, 15:39 -04:00 from Alisa Z. < prol...@mail.ru >:
>>
>>Hi all, 
>>
>>I am trying to facet parent docs by nested (child) document fields. 
>>I've tried the 2 approaches named in the subject, yet with the first the results 
>>are not quite correct and with the 2nd I cannot get the query right. So I need 
>>help on either of them, and any explanation, documentation, or blog posts on 
>>this behavior would be much appreciated. 
>>
>>In words, the query is: "Find the top 10 keywords for all documents 
>>with "california" in the email subject line".
>>
>>Here is the query with responses: 
>>
>>==== JSON Facet API ==== 
>>
>>curl http://localhost:8985/solr/my_collection/query -d 
>>'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
>>json.facet={
>>  filter_by_child_type :{
>>    type:query,
>>    q:"type_s:doc.enriched.text.keywords",
>>    domain: { blockChildren : "type_s:doc" },
>>    facet:{
>>      top_keywords_text : {
>>        type: terms,
>>        field: text_t,
>>        limit: 10
>>      }
>>    }
>>  }
>>}'
>>
>>RETURNS:  
>>
>>{
>>  "responseHeader":{
>>    "status":0,
>>    "QTime":134,
>>    "params":{
>>      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData 
>>+Subject_t:california",
>>      "json.facet":"{\n  filter_by_child_type :{\n    type:query,\n    
>>q:\"type_s:doc.enriched.text.keywords\",\n    domain: { blockChildren : 
>>\"type_s:doc\" },\n    facet:{\n      top_keywords_text : {\n        type: 
>>terms,\n        field: text_t,\n        limit: 10\n      }\n    }\n  }\n}",
>>      "rows":"0"}},
>>  "response":{"numFound":19,"start":0,"docs":[]
>>  },
>>  "facets":{
>>    "count":19,
>>    "filter_by_child_type":{
>>      "count":686,
>>      "top_keywords_text":{
>>        "buckets":[{
>>            "val":"enron",
>>            "count":57},
>>          {
>>            "val":"california",
>>            "count":22},
>>          {
>>            "val":"power",
>>            "count":21},
>>          {
>>            "val":"rate",
>>            "count":15},
>>          {
>>            "val":"plan",
>>            "count":13},
>>          {
>>            "val":"hou",
>>            "count":12},
>>          {
>>            "val":"energy",
>>            "count":11},
>>          {
>>            "val":"na",
>>            "count":11},
>>          {
>>            "val":"mckinsey",
>>            "count":10},
>>          {
>>            "val":"socal",
>>            "count":10}]}}}}
>>
>>
>>QUESTION: where do the counts greater than 19 (the total number of 
>>top-level documents returned by the query) come from? How can I adjust the 
>>query to facet only on the top-level documents (so that no count 
>>is greater than 19)? 
>>
>>
>>==== BlockJoin Faceting ==== 
>>Following the example on  
>>https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting , I've 
>>tried this:  
>>
>>/bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&child.facet.field=text_t&child.facet.limit=10&child.facet.mincount=5&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true
>>
>>RETURNS: 
>>
>>{
>>  "responseHeader":{
>>    "status":0,
>>    "QTime":1},
>>  "response":{"numFound":19,"start":0,"docs":[]
>>  },
>>  "facet_counts":[
>>    "facet_fields",[
>>      "text_t",[
>>        "128x",1,
>>        "18xx",1,
>>        "1x",1,
>>        "2",2,
>>        "30",1,
>>        "60",1,
>>        "78xx",1,
>>        "82xx",1,
>>        "ab",2,
>>        "access",5,
>>        "account",1,
>>        "accounts",1,
>>...
>>"california",13,
>>...
>>"enron",9,
>>...
>>]]]}
>>
>>QUESTION: This looks very close to what I want, yet why are 
>>child.facet.limit=10&child.facet.mincount=5 ignored? How can I get the top 10 
>>most frequent terms? 
>>
>>
>>Thank you for your help in advance! 
>>
>>-- 
>>Alisa Zhila
