So the first issue was eventually solved by adding facet: {top_terms_by_doc: "unique(_root_)"} AND sorting the outer facet buckets by this nested facet:
curl http://localhost:8985/solr/enron_path_w_ts/query -d 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
json.facet={
  filter_by_child_type :{
    type:query,
    q:"type_s:doc.enriched.text.keywords",
    domain: { blockChildren : "type_s:doc" },
    facet:{
      top_keywords_text : {
        type: terms,
        field: text_t,
        limit: 10,
        sort: "top_terms_by_doc desc",
        facet: {
          top_terms_by_doc: "unique(_root_)"
        }
      }
    }
  }
}'

The BlockJoin Faceting part is still open: I've tried all the conventional faceting parameters (facet.limit, child.facet.limit, f.text_t.facet.limit ...), and nothing worked :(

>Monday, 28 March 2016, 17:20 -04:00 from Alisa Z. <prol...@mail.ru>:
>
>Ok, so for the 1st question, I think I'm getting closer: adding facet:
>{top_terms_by_doc: "unique(_root_)"} as indicated in
>http://blog.griddynamics.com/search/label/~Mikhail%20Khludnev returns correct
>counts. However, the buckets are still sorted by the "count" of the upper
>faceting, not by unique(_root_):
>
>
>curl http://localhost:8985/solr/my_collection/query -d
>'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
>json.facet={
>  filter_by_child_type :{
>    type:query,
>    q:"type_s:doc.enriched.text.keywords",
>    domain: { blockChildren : "type_s:doc" },
>    facet:{
>      top_keywords_text : {
>        type: terms,
>        field: text_t,
>        limit: 10,
>        facet: {
>          top_terms_by_doc: "unique(_root_)"
>        }
>      }
>    }
>  }
>}'
>
>RETURNS
>
>{
>  "responseHeader":{
>    "status":0,
>    "QTime":25,
>    "params":{
>      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData +Subject_t:california",
>      "json.facet":"{\n filter_by_child_type :{\n type:query,\n q:\"type_s:doc.enriched.text.keywords\",\n domain: { blockChildren : \"type_s:doc\" },\n facet:{\n top_keywords_text : {\n type: terms,\n field: text_t,\n limit: 10,\n facet: {\n top_terms_by_doc: \"unique(_root_)\"\n }\n }\n }\n }\n}",
>      "rows":"0"}},
>  "response":{"numFound":19,"start":0,"docs":[]
>  },
>  "facets":{
>    "count":19,
>    "filter_by_child_type":{
>      "count":686,
"top_keywords_text":{ > "buckets":[{ > "val":"enron", > "count":57, > "top_terms_by_doc":9}, > { > "val":"california", > "count":22, > "top_terms_by_doc":13}, > { > "val":"power", > "count":21, > "top_terms_by_doc":7}, > { > "val":"rate", > "count":15, > "top_terms_by_doc":5}, > { > "val":"plan", > "count":13, > "top_terms_by_doc":3}, > { > "val":"hou", > "count":12, > "top_terms_by_doc":5}, > { > "val":"energy", > "count":11, > "top_terms_by_doc":5}, > { > "val":"na", > "count":11, > "top_terms_by_doc":5}, > { > "val":"mckinsey", > "count":10, > "top_terms_by_doc":1}, > { > "val":"socal", > "count":10, > "top_terms_by_doc":4}]}}}} > >Nice, but I want them to be ordered by "top_terms_by_doc" frequencies, not by >the "count" frequencies. >Any suggestions? > >Thanks, >Alisa > > > > > >>Понедельник, 28 марта 2016, 15:39 -04:00 от Alisa Z. < prol...@mail.ru >: >> >>Hi all, >> >>I am trying to perform faceting of parent docs by nested document fields. >>I've tried 2 approaches as in subject, yet in first the results are not quite >>correct and in the 2nd I cannot get the query right. So I need help on either >>of them and any explication or documentation or blogs on the behavior is much >>appreciated. 
>>
>>Verbally, the query is as follows: "Find the top 10 keywords for all
>>documents with "california" in the email subject line"
>>
>>Here is the query with responses:
>>
>>==== Json Facet API ====
>>
>>curl http://localhost:8985/solr/my_collection/query -d
>>'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
>>json.facet={
>>  filter_by_child_type :{
>>    type:query,
>>    q:"type_s:doc.enriched.text.keywords",
>>    domain: { blockChildren : "type_s:doc" },
>>    facet:{
>>      top_keywords_text : {
>>        type: terms,
>>        field: text_t,
>>        limit: 10
>>      }
>>    }
>>  }
>>}'
>>
>>RETURNS:
>>
>>{
>>  "responseHeader":{
>>    "status":0,
>>    "QTime":134,
>>    "params":{
>>      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData +Subject_t:california",
>>      "json.facet":"{\n filter_by_child_type :{\n type:query,\n q:\"type_s:doc.enriched.text.keywords\",\n domain: { blockChildren : \"type_s:doc\" },\n facet:{\n top_keywords_text : {\n type: terms,\n field: text_t,\n limit: 10\n }\n }\n }\n}",
>>      "rows":"0"}},
>>  "response":{"numFound":19,"start":0,"docs":[]
>>  },
>>  "facets":{
>>    "count":19,
>>    "filter_by_child_type":{
>>      "count":686,
>>      "top_keywords_text":{
>>        "buckets":[{
>>            "val":"enron",
>>            "count":57},
>>          {
>>            "val":"california",
>>            "count":22},
>>          {
>>            "val":"power",
>>            "count":21},
>>          {
>>            "val":"rate",
>>            "count":15},
>>          {
>>            "val":"plan",
>>            "count":13},
>>          {
>>            "val":"hou",
>>            "count":12},
>>          {
>>            "val":"energy",
>>            "count":11},
>>          {
>>            "val":"na",
>>            "count":11},
>>          {
>>            "val":"mckinsey",
>>            "count":10},
>>          {
>>            "val":"socal",
>>            "count":10}]}}}}
>>
>>
>>QUESTION: where do the counts greater than 19 (the total number of top-level
>>documents returned by the query) come from? How can I adjust the query to
>>facet only on the top-level documents (so that no count is greater than 19)?
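Why the counts can exceed 19: as the top of the thread works out, a plain terms facet in the blockChildren domain counts matching child keyword documents, not parents, so one email with several keyword children containing "enron" adds several to "count", while unique(_root_) counts distinct top-level documents (every doc in a block carries its root's id in _root_). The Python below is a minimal offline simulation of that difference, not Solr code: the toy child docs and the facet_terms helper are hypothetical. Passing sort_by="top_terms_by_doc" mimics the sort: "top_terms_by_doc desc" setting that fixed the ordering.

```python
from collections import defaultdict

# Hypothetical toy data (not from the real index): each child "keywords" doc
# carries _root_ (the id of its top-level email) and its text_t tokens.
children = [
    {"_root_": "email1", "text_t": ["enron", "corp"]},
    {"_root_": "email1", "text_t": ["enron", "north", "america"]},
    {"_root_": "email1", "text_t": ["enron", "power"]},
    {"_root_": "email1", "text_t": ["california"]},
    {"_root_": "email2", "text_t": ["california", "power"]},
    {"_root_": "email2", "text_t": ["enron"]},
    {"_root_": "email3", "text_t": ["california", "rate"]},
]


def facet_terms(children, limit=10, sort_by="count"):
    """Simulate a terms facet over child docs, reporting both the plain
    bucket count and the number of distinct parents (like unique(_root_))."""
    count = defaultdict(int)  # child docs containing the term
    roots = defaultdict(set)  # distinct _root_ values per term
    for child in children:
        for term in set(child["text_t"]):  # count each child once per term
            count[term] += 1
            roots[term].add(child["_root_"])
    buckets = [
        {"val": t, "count": count[t], "top_terms_by_doc": len(roots[t])}
        for t in count
    ]
    # Descending by the chosen metric, ties broken alphabetically.
    buckets.sort(key=lambda b: (-b[sort_by], b["val"]))
    return buckets[:limit]


print([(b["val"], b["count"]) for b in facet_terms(children, limit=3)])
# [('enron', 4), ('california', 3), ('power', 2)]
print([(b["val"], b["top_terms_by_doc"])
       for b in facet_terms(children, limit=3, sort_by="top_terms_by_doc")])
# [('california', 3), ('enron', 2), ('power', 2)]
```

In this toy index "enron" gets a count of 4 although there are only 3 parent emails (and only 2 of them mention it), the same pattern as "count":57 versus "top_terms_by_doc":9 in the real response; the distinct-parent number can never exceed the number of parents.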
>>
>>
>>===== BlockJoin Faceting ======
>>Following the example on
>>https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting , I've
>>tried this:
>>
>>/bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&child.facet.field=text_t&child.facet.limit=10&child.facet.mincount=5&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true
>>
>>RETURNS:
>>
>>{
>>  "responseHeader":{
>>    "status":0,
>>    "QTime":1},
>>  "response":{"numFound":19,"start":0,"docs":[]
>>  },
>>  "facet_counts":[
>>    "facet_fields",[
>>      "text_t",[
>>        "128x",1,
>>        "18xx",1,
>>        "1x",1,
>>        "2",2,
>>        "30",1,
>>        "60",1,
>>        "78xx",1,
>>        "82xx",1,
>>        "ab",2,
>>        "access",5,
>>        "account",1,
>>        "accounts",1,
>>...
>>        "california",13,
>>...
>>        "enron",9,
>>...
>>]]]}
>>
>>QUESTION: This looks very close to what I want, yet why are
>>child.facet.limit=10&child.facet.mincount=5 ignored? How do I get the top 10
>>most frequent terms?
>>
>>
>>Thank you for your help in advance!
>>
>>--
>>Alisa Zhila
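For the BlockJoin Faceting question: judging by the response above, the component returned what looks like every text_t term, sorted by term rather than by count, with child.facet.limit and child.facet.mincount apparently not applied. Until that is sorted out on the Solr side, one possible client-side workaround is to post-process the flat [term, count, term, count, ...] list. The Python below is a sketch that assumes the "text_t" list has already been parsed out of the JSON response; top_child_facets is a hypothetical helper, not part of Solr.

```python
def top_child_facets(flat, limit=10, mincount=1):
    """Hypothetical helper: turn the flat [term, count, term, count, ...]
    list from "facet_fields" into at most `limit` (term, count) pairs with
    count >= mincount, most frequent first (ties broken by term)."""
    pairs = zip(flat[0::2], flat[1::2])  # pair each term with its count
    kept = [(term, n) for term, n in pairs if n >= mincount]
    kept.sort(key=lambda p: (-p[1], p[0]))
    return kept[:limit]


# A few of the text_t entries quoted in the response above (the rest were elided).
text_t = ["128x", 1, "18xx", 1, "2", 2, "ab", 2, "access", 5,
          "account", 1, "california", 13, "enron", 9]

print(top_child_facets(text_t, limit=10, mincount=5))
# [('california', 13), ('enron', 9), ('access', 5)]
```

This only trims what Solr has already sent, so the full term list still travels over the wire; the JSON Facet API query at the top of the thread applies the limit and the sort on the server instead.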