Hi all, 

I am trying to facet parent documents by fields of their nested (child)
documents. I've tried the two approaches named in the subject: with the first,
the results are not quite correct, and with the second, I cannot get the query
right. I need help with either of them, and any explanation, documentation, or
blog posts on the behavior would be much appreciated.

In words, the query is: "Find the top 10 keywords for all documents with
"california" in the email subject line".

Here are the queries with their responses:

==== JSON Facet API ====

curl http://localhost:8985/solr/my_collection/query -d 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&json.facet={
  filter_by_child_type :{
    type:query,
    q:"type_s:doc.enriched.text.keywords",
    domain: { blockChildren : "type_s:doc" },
    facet:{
      top_keywords_text : {
        type: terms,
        field: text_t,
        limit: 10
      }
    }
  }
}'

RETURNS:  

{
  "responseHeader":{
    "status":0,
    "QTime":134,
    "params":{
      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData +Subject_t:california",
      "json.facet":"{\n  filter_by_child_type :{\n    type:query,\n    q:\"type_s:doc.enriched.text.keywords\",\n    domain: { blockChildren : \"type_s:doc\" },\n    facet:{\n      top_keywords_text : {\n        type: terms,\n        field: text_t,\n        limit: 10\n      }\n    }\n  }\n}",
      "rows":"0"}},
  "response":{"numFound":19,"start":0,"docs":[]
  },
  "facets":{
    "count":19,
    "filter_by_child_type":{
      "count":686,
      "top_keywords_text":{
        "buckets":[{
            "val":"enron",
            "count":57},
          {
            "val":"california",
            "count":22},
          {
            "val":"power",
            "count":21},
          {
            "val":"rate",
            "count":15},
          {
            "val":"plan",
            "count":13},
          {
            "val":"hou",
            "count":12},
          {
            "val":"energy",
            "count":11},
          {
            "val":"na",
            "count":11},
          {
            "val":"mckinsey",
            "count":10},
          {
            "val":"socal",
            "count":10}]}}}}


QUESTION: Where do the counts greater than 19 (the total number of top-level
documents returned by the query) come from? How can I adjust the query so that
it counts only top-level documents (so that no count is greater than 19)?
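
One possible reading, stated as an assumption rather than a confirmed answer: with domain blockChildren the buckets count child documents (686 here), not parents, so a term that occurs in several keyword children of the same parent is counted several times. The sketch below illustrates that arithmetic on invented toy data, then builds a variant of the facet that ranks terms by distinct parents. The toy documents and the name parent_count are hypothetical; the variant assumes the _root_ field is indexed and that the unique() aggregation is available in the Solr version in use, and it has not been run against this collection.

```python
import json
from collections import Counter

# Hypothetical toy index: parent id -> tokenized text_t of each keyword child.
# Invented for illustration; not taken from the collection in this post.
parents = {
    "doc1": [["enron", "north", "america"], ["enron", "corp"]],
    "doc2": [["california", "power"], ["enron"]],
}

child_counts = Counter()   # +1 per child document containing the term
parent_counts = Counter()  # +1 per parent with at least one such child

for parent_id, children in parents.items():
    terms_in_parent = set()
    for tokens in children:
        unique_tokens = set(tokens)
        child_counts.update(unique_tokens)
        terms_in_parent |= unique_tokens
    parent_counts.update(terms_in_parent)

# Child-level count can exceed the number of parents; parent-level cannot.
print(child_counts["enron"], parent_counts["enron"])  # prints: 3 2

# Assumed variant of the facet above: same structure, but each bucket also
# reports the number of distinct parents and buckets are sorted by it.
facet = {
    "filter_by_child_type": {
        "type": "query",
        "q": "type_s:doc.enriched.text.keywords",
        "domain": {"blockChildren": "type_s:doc"},
        "facet": {
            "top_keywords_text": {
                "type": "terms",
                "field": "text_t",
                "limit": 10,
                "sort": "parent_count desc",  # hypothetical stat name
                "facet": {"parent_count": "unique(_root_)"},
            }
        },
    }
}

json_facet_param = json.dumps(facet)  # value for the json.facet parameter
```

If this reading is right, each bucket's parent_count would be at most 19 for the query above, while its count would still be the child-level number.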


===== BlockJoin Faceting ====== 
Following the example on  
https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting , I've 
tried this:  

/bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&child.facet.field=text_t&child.facet.limit=10&child.facet.mincount=5&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true

RETURNS: 

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "response":{"numFound":19,"start":0,"docs":[]
  },
  "facet_counts":[
    "facet_fields",[
      "text_t",[
        "128x",1,
        "18xx",1,
        "1x",1,
        "2",2,
        "30",1,
        "60",1,
        "78xx",1,
        "82xx",1,
        "ab",2,
        "access",5,
        "account",1,
        "accounts",1,
...
"california",13,
...
"enron",9,
...
]]]}

QUESTION: This looks very close to what I want, yet why are
child.facet.limit=10&child.facet.mincount=5 ignored? How can I get the top 10
most frequent terms?
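
Until the behavior of those two parameters is explained, the limit and mincount could be applied on the client. The sketch below is a workaround under that assumption, not a fix for the component: it pairs up the flat term/count list from the response, drops terms below the minimum count, and keeps the most frequent ones. The function name top_terms is hypothetical, and the sample list is only the excerpt visible above (the elided entries are left out).

```python
def top_terms(flat, limit=10, mincount=5):
    """Return the top (term, count) pairs from a flat [term, count, ...] list."""
    pairs = list(zip(flat[0::2], flat[1::2]))
    kept = [(term, count) for term, count in pairs if count >= mincount]
    # Highest count first; ties broken alphabetically for a stable order.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:limit]


# Excerpt of the text_t list from the response above.
text_t = ["128x", 1, "2", 2, "ab", 2, "access", 5, "california", 13, "enron", 9]

print(top_terms(text_t))
# prints: [('california', 13), ('enron', 9), ('access', 5)]
```

This only helps when the full facet list is small enough to transfer; it does not reduce the work done on the server.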


Thank you for your help in advance! 

-- 
Alisa Zhila
