Alright, based on https://issues.apache.org/jira/browse/SOLR-5743, I assume that limit and mincount support for the BlockJoin Faceting part will remain an open issue for some time. Therefore, the answer is no, as of Solr 5.5.0.
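Since the handler ignores these parameters, one possible workaround is to trim the response on the client side. Below is a minimal Python sketch, assuming the facet values arrive as the flat [term, count, term, count, ...] list shown in the /bjqfacet response quoted in this thread. The function name `top_terms` and its `limit` and `mincount` arguments are hypothetical and not part of Solr; the sample values are copied from that quoted response, which is itself truncated.

```python
def top_terms(flat_counts, limit=10, mincount=1):
    """Emulate limit/mincount on the client (hypothetical helper, not a Solr API).

    flat_counts is the flat [term, count, term, count, ...] list found under
    "facet_fields" -> "text_t" in the BlockJoin facet response.
    """
    # Pair every term (even index) with the count that follows it (odd index).
    pairs = list(zip(flat_counts[0::2], flat_counts[1::2]))
    # Drop terms below the minimum count.
    kept = [(term, count) for term, count in pairs if count >= mincount]
    # Sort by count descending, then by term so that ties have a fixed order.
    kept.sort(key=lambda tc: (-tc[1], tc[0]))
    return kept[:limit]


# A few values copied from the response quoted in this thread.
sample = ["128x", 1, "2", 2, "ab", 2, "access", 5,
          "account", 1, "california", 13, "enron", 9]

print(top_terms(sample, limit=10, mincount=5))
```

This only post-processes what the server already returned, so the full term list still travels over the wire; it does not reduce the work done by Solr.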
Thanks to Mikhail Khludnev for working on the subject.

>Tuesday, 29 March 2016, 14:38 -04:00, from Alisa Z. <prol...@mail.ru>:
>
>Mikhail,
>
>I totally see the point: the corresponding wiki page
>( https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting )
>does not mention it and says it's an experimental feature.
>
>Is it correct that no additional options (limit, mincount, etc.) can be set
>at all?
>
>Or, more specifically, is there any workaround to control the output of the
>query at hand (maybe anything beyond faceting options):
>
>/bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&child.facet.field=text_t&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true
>>
>>RETURNS:
>>
>>{
>>  "responseHeader":{
>>    "status":0,
>>    "QTime":1},
>>  "response":{"numFound":19,"start":0,"docs":[]
>>  },
>>  "facet_counts":[
>>    "facet_fields",[
>>      "text_t",[
>>        "128x",1,
>>        "18xx",1,
>>        "1x",1,
>>        "2",2,
>>        "30",1,
>>        "60",1,
>>        "78xx",1,
>>        "82xx",1,
>>        "ab",2,
>>        "access",5,
>>        "account",1,
>>        "accounts",1,
>>...
>>"california",13,
>>...
>>"enron",9,
>>...
>>]]]}
>>
>
>>Tuesday, 29 March 2016, 13:40 -04:00, from Mikhail Khludnev <mkhlud...@griddynamics.com>:
>>
>>Alisa,
>>
>>There is no such thing as child.facet.limit, etc.
>>
>>On Tue, Mar 29, 2016 at 6:27 PM, Alisa Z. <prol...@mail.ru> wrote:
>>
>>> So the first issue was eventually solved by adding
>>> facet: {top_terms_by_doc: "unique(_root_)"} AND sorting the outer facet
>>> buckets by this facet:
>>>
>>> curl http://localhost:8985/solr/enron_path_w_ts/query -d
>>> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
>>> json.facet={
>>>   filter_by_child_type :{
>>>     type:query,
>>>     q:"type_s:doc.enriched.text.keywords",
>>>     domain: { blockChildren : "type_s:doc" },
>>>     facet:{
>>>       top_keywords_text : {
>>>         type: terms,
>>>         field: text_t,
>>>         limit: 10,
>>>         sort: "top_terms_by_doc desc",
>>>         facet: {
>>>           top_terms_by_doc: "unique(_root_)"
>>>         }
>>>       }
>>>     }
>>>   }
>>> }'
>>>
>>> The BlockJoin Faceting part is still open: I've tried all the conventional
>>> faceting parameters (facet.limit, child.facet.limit, f.text_t.facet.limit,
>>> ...) and nothing worked :(
>>>
>>> >Monday, 28 March 2016, 17:20 -04:00, from Alisa Z. <prol...@mail.ru>:
>>> >
>>> >Ok, so for the 1st question, I think I'm getting closer: adding
>>> >facet: {top_terms_by_doc: "unique(_root_)"} as indicated in
>>> >http://blog.griddynamics.com/search/label/~Mikhail%20Khludnev returns
>>> >correct counts. However, the buckets are sorted by the outer facet count,
>>> >not by unique(_root_):
>>> >
>>> >curl http://localhost:8985/solr/my_collection/query -d
>>> >'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
>>> >json.facet={
>>> >  filter_by_child_type :{
>>> >    type:query,
>>> >    q:"type_s:doc.enriched.text.keywords",
>>> >    domain: { blockChildren : "type_s:doc" },
>>> >    facet:{
>>> >      top_keywords_text : {
>>> >        type: terms,
>>> >        field: text_t,
>>> >        limit: 10,
>>> >        facet: {
>>> >          top_terms_by_doc: "unique(_root_)"
>>> >        }
>>> >      }
>>> >    }
>>> >  }
>>> >}'
>>> >
>>> >RETURNS
>>> >
>>> >{
>>> >  "responseHeader":{
>>> >    "status":0,
>>> >    "QTime":25,
>>> >    "params":{
>>> >      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData +Subject_t:california",
>>> >      "json.facet":"{\n filter_by_child_type :{\n type:query,\n q:\"type_s:doc.enriched.text.keywords\",\n domain: { blockChildren : \"type_s:doc\" },\n facet:{\n top_keywords_text : {\n type: terms,\n field: text_t,\n limit: 10,\n facet: {\n top_terms_by_doc: \"unique(_root_)\"\n }\n }\n }\n }\n}",
>>> >      "rows":"0"}},
>>> >  "response":{"numFound":19,"start":0,"docs":[]
>>> >  },
>>> >  "facets":{
>>> >    "count":19,
>>> >    "filter_by_child_type":{
>>> >      "count":686,
>>> >      "top_keywords_text":{
>>> >        "buckets":[{
>>> >            "val":"enron",
>>> >            "count":57,
>>> >            "top_terms_by_doc":9},
>>> >          {
>>> >            "val":"california",
>>> >            "count":22,
>>> >            "top_terms_by_doc":13},
>>> >          {
>>> >            "val":"power",
>>> >            "count":21,
>>> >            "top_terms_by_doc":7},
>>> >          {
>>> >            "val":"rate",
>>> >            "count":15,
>>> >            "top_terms_by_doc":5},
>>> >          {
>>> >            "val":"plan",
>>> >            "count":13,
>>> >            "top_terms_by_doc":3},
>>> >          {
>>> >            "val":"hou",
>>> >            "count":12,
>>> >            "top_terms_by_doc":5},
>>> >          {
>>> >            "val":"energy",
>>> >            "count":11,
>>> >            "top_terms_by_doc":5},
>>> >          {
>>> >            "val":"na",
>>> >            "count":11,
>>> >            "top_terms_by_doc":5},
>>> >          {
>>> >            "val":"mckinsey",
>>> >            "count":10,
>>> >            "top_terms_by_doc":1},
>>> >          {
>>> >            "val":"socal",
>>> >            "count":10,
>>> >            "top_terms_by_doc":4}]}}}}
>>> >
>>> >Nice, but I want the buckets to be ordered by the "top_terms_by_doc"
>>> >frequencies, not by the "count" frequencies.
>>> >Any suggestions?
>>> >
>>> >Thanks,
>>> >Alisa
>>> >
>>> >>Monday, 28 March 2016, 15:39 -04:00, from Alisa Z. <prol...@mail.ru>:
>>> >>
>>> >>Hi all,
>>> >>
>>> >>I am trying to facet parent docs by nested document fields. I've tried
>>> >>the 2 approaches named in the subject, yet with the first the results
>>> >>are not quite correct, and with the 2nd I cannot get the query right.
>>> >>So I need help with either of them, and any explanation, documentation,
>>> >>or blog posts on the behavior would be much appreciated.
>>> >>
>>> >>In words, the query is as follows: "Find the top 10 keywords for all
>>> >>documents with "california" in the email subject line"
>>> >>
>>> >>Here are the queries with their responses:
>>> >>
>>> >>==== Json Facet API ====
>>> >>
>>> >>curl http://localhost:8985/solr/my_collection/query -d
>>> >>'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
>>> >>json.facet={
>>> >>  filter_by_child_type :{
>>> >>    type:query,
>>> >>    q:"type_s:doc.enriched.text.keywords",
>>> >>    domain: { blockChildren : "type_s:doc" },
>>> >>    facet:{
>>> >>      top_keywords_text : {
>>> >>        type: terms,
>>> >>        field: text_t,
>>> >>        limit: 10
>>> >>      }
>>> >>    }
>>> >>  }
>>> >>}'
>>> >>
>>> >>RETURNS:
>>> >>
>>> >>{
>>> >>  "responseHeader":{
>>> >>    "status":0,
>>> >>    "QTime":134,
>>> >>    "params":{
>>> >>      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData +Subject_t:california",
>>> >>      "json.facet":"{\n filter_by_child_type :{\n type:query,\n q:\"type_s:doc.enriched.text.keywords\",\n domain: { blockChildren : \"type_s:doc\" },\n facet:{\n top_keywords_text : {\n type: terms,\n field: text_t,\n limit: 10\n }\n }\n }\n}",
>>> >>      "rows":"0"}},
>>> >>  "response":{"numFound":19,"start":0,"docs":[]
>>> >>  },
>>> >>  "facets":{
>>> >>    "count":19,
>>> >>    "filter_by_child_type":{
>>> >>      "count":686,
>>> >>      "top_keywords_text":{
>>> >>        "buckets":[{
>>> >>            "val":"enron",
>>> >>            "count":57},
>>> >>          {
>>> >>            "val":"california",
>>> >>            "count":22},
>>> >>          {
>>> >>            "val":"power",
>>> >>            "count":21},
>>> >>          {
>>> >>            "val":"rate",
>>> >>            "count":15},
>>> >>          {
>>> >>            "val":"plan",
>>> >>            "count":13},
>>> >>          {
>>> >>            "val":"hou",
>>> >>            "count":12},
>>> >>          {
>>> >>            "val":"energy",
>>> >>            "count":11},
>>> >>          {
>>> >>            "val":"na",
>>> >>            "count":11},
>>> >>          {
>>> >>            "val":"mckinsey",
>>> >>            "count":10},
>>> >>          {
>>> >>            "val":"socal",
>>> >>            "count":10}]}}}}
>>> >>
>>> >>QUESTION: where do the counts greater than 19 (the total number of
>>> >>top-level documents returned by the query) come from? How can I adjust
>>> >>the query to facet only on the top-level documents (so that no count is
>>> >>greater than 19)?
>>> >>
>>> >>===== BlockJoin Faceting ======
>>> >>Following the example on
>>> >>https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting ,
>>> >>I've tried this:
>>> >>
>>> >>/bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&child.facet.field=text_t&child.facet.limit=10&child.facet.mincount=5&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true
>>> >>
>>> >>RETURNS:
>>> >>
>>> >>{
>>> >>  "responseHeader":{
>>> >>    "status":0,
>>> >>    "QTime":1},
>>> >>  "response":{"numFound":19,"start":0,"docs":[]
>>> >>  },
>>> >>  "facet_counts":[
>>> >>    "facet_fields",[
>>> >>      "text_t",[
>>> >>        "128x",1,
>>> >>        "18xx",1,
>>> >>        "1x",1,
>>> >>        "2",2,
>>> >>        "30",1,
>>> >>        "60",1,
>>> >>        "78xx",1,
>>> >>        "82xx",1,
>>> >>        "ab",2,
>>> >>        "access",5,
>>> >>        "account",1,
>>> >>        "accounts",1,
>>> >>...
>>> >>"california",13,
>>> >>...
>>> >>"enron",9,
>>> >>...
>>> >>]]]}
>>> >>
>>> >>QUESTION: This looks very close to what I want, yet why are
>>> >>child.facet.limit=10&child.facet.mincount=5 ignored? How do I get the
>>> >>top 10 most frequent terms?
>>> >>
>>> >>Thank you for your help in advance!
>>> >>
>>> >>--
>>> >>Alisa Zhila
>>
>>--
>>Sincerely yours
>>Mikhail Khludnev
>>Principal Engineer,
>>Grid Dynamics
>>
>><http://www.griddynamics.com>
>><mkhlud...@griddynamics.com>