Alisa,

There is no such thing as child.facet.limit, etc.
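Since the /bjqfacet request quoted below returns every term and ignores the limit and mincount parameters, one possible workaround is to trim the list on the client. Here is a minimal sketch in Python, assuming the response keeps the shape shown in the quoted output (facet_counts as a flat list of alternating names and values). The helper name top_child_facets and the sample excerpt are made up for illustration and are not part of Solr:

```python
import json


def top_child_facets(response, field, limit=10, mincount=1):
    """Return up to `limit` (term, count) pairs for `field`, most frequent first."""
    # Assumed shape: ["facet_fields", [field, [term, count, term, count, ...]]]
    facet_counts = response["facet_counts"]
    sections = dict(zip(facet_counts[0::2], facet_counts[1::2]))
    fields_flat = sections["facet_fields"]
    fields = dict(zip(fields_flat[0::2], fields_flat[1::2]))
    flat = fields[field]
    # Pair up the alternating term/count entries and drop rare terms.
    pairs = zip(flat[0::2], flat[1::2])
    kept = [(term, count) for term, count in pairs if count >= mincount]
    # Sort by count descending, then by term for a stable order on ties.
    kept.sort(key=lambda tc: (-tc[1], tc[0]))
    return kept[:limit]


# Hypothetical excerpt built from a few of the counts quoted in this thread.
sample = json.loads(
    '{"facet_counts": ["facet_fields", ["text_t", '
    '["2", 2, "ab", 2, "access", 5, "california", 13, "enron", 9]]]}'
)
top = top_child_facets(sample, "text_t", limit=10, mincount=5)
print(top)
```

The JSON Facet API query quoted below already accepts limit and sort on the server side, so this client-side trimming would only matter if you stay with the BlockJoin faceting handler.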
On Tue, Mar 29, 2016 at 6:27 PM, Alisa Z. <prol...@mail.ru> wrote:

> So the first issue was eventually solved by adding
> facet: {top_terms_by_doc: "unique(_root_)"} AND sorting the outer facet
> buckets by this facet:
>
> curl http://localhost:8985/solr/enron_path_w_ts/query -d
> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
> json.facet={
>   filter_by_child_type :{
>     type:query,
>     q:"type_s:doc.enriched.text.keywords",
>     domain: { blockChildren : "type_s:doc" },
>     facet:{
>       top_keywords_text : {
>         type: terms,
>         field: text_t,
>         limit: 10,
>         sort: "top_terms_by_doc desc",
>         facet: {
>           top_terms_by_doc: "unique(_root_)"
>         }
>       }
>     }
>   }
> }'
>
> The BlockJoin Faceting part is still open: I've tried all the conventional
> faceting parameters: facet.limit, child.facet.limit, f.text_t.facet.limit
> ... nothing worked :(
>
> > Monday, 28 March 2016, 17:20 -04:00 from Alisa Z. <prol...@mail.ru>:
> >
> > Ok, so for the 1st question, I think I'm getting closer: adding
> > facet: {top_terms_by_doc: "unique(_root_)"} as indicated in
> > http://blog.griddynamics.com/search/label/~Mikhail%20Khludnev returns
> > correct counts. However, sorting is done by the upper faceting, not by
> > unique(_root_):
> >
> > curl http://localhost:8985/solr/my_collection/query -d
> > 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
> > json.facet={
> >   filter_by_child_type :{
> >     type:query,
> >     q:"type_s:doc.enriched.text.keywords",
> >     domain: { blockChildren : "type_s:doc" },
> >     facet:{
> >       top_keywords_text : {
> >         type: terms,
> >         field: text_t,
> >         limit: 10,
> >         facet: {
> >           top_terms_by_doc: "unique(_root_)"
> >         }
> >       }
> >     }
> >   }
> > }'
> >
> > RETURNS
> >
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":25,
> >     "params":{
> >       "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData +Subject_t:california",
> >       "json.facet":"{\n filter_by_child_type :{\n type:query,\n q:\"type_s:doc.enriched.text.keywords\",\n domain: { blockChildren : \"type_s:doc\" },\n facet:{\n top_keywords_text : {\n type: terms,\n field: text_t,\n limit: 10,\n facet: {\n top_terms_by_doc: \"unique(_root_)\"\n }\n }\n }\n }\n}",
> >       "rows":"0"}},
> >   "response":{"numFound":19,"start":0,"docs":[]
> >   },
> >   "facets":{
> >     "count":19,
> >     "filter_by_child_type":{
> >       "count":686,
> >       "top_keywords_text":{
> >         "buckets":[
> >           { "val":"enron", "count":57, "top_terms_by_doc":9 },
> >           { "val":"california", "count":22, "top_terms_by_doc":13 },
> >           { "val":"power", "count":21, "top_terms_by_doc":7 },
> >           { "val":"rate", "count":15, "top_terms_by_doc":5 },
> >           { "val":"plan", "count":13, "top_terms_by_doc":3 },
> >           { "val":"hou", "count":12, "top_terms_by_doc":5 },
> >           { "val":"energy", "count":11, "top_terms_by_doc":5 },
> >           { "val":"na", "count":11, "top_terms_by_doc":5 },
> >           { "val":"mckinsey", "count":10, "top_terms_by_doc":1 },
> >           { "val":"socal", "count":10, "top_terms_by_doc":4 }]}}}}
> >
> > Nice, but I want them to be ordered by the "top_terms_by_doc"
> > frequencies, not by the "count" frequencies.
> > Any suggestions?
> >
> > Thanks,
> > Alisa
> >
> >> Monday, 28 March 2016, 15:39 -04:00 from Alisa Z. <prol...@mail.ru>:
> >>
> >> Hi all,
> >>
> >> I am trying to facet parent docs by nested document fields. I've tried
> >> the 2 approaches named in the subject, yet with the first the results
> >> are not quite correct, and with the 2nd I cannot get the query right.
> >> So I need help with either of them, and any explanation, documentation
> >> or blog posts on the behavior would be much appreciated.
> >>
> >> Verbally the query is as follows: "Find the top 10 keywords for all
> >> documents with "california" in the email subject line"
> >>
> >> Here are the queries with responses:
> >>
> >> ==== Json Facet API ====
> >>
> >> curl http://localhost:8985/solr/my_collection/query -d
> >> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
> >> json.facet={
> >>   filter_by_child_type :{
> >>     type:query,
> >>     q:"type_s:doc.enriched.text.keywords",
> >>     domain: { blockChildren : "type_s:doc" },
> >>     facet:{
> >>       top_keywords_text : {
> >>         type: terms,
> >>         field: text_t,
> >>         limit: 10
> >>       }
> >>     }
> >>   }
> >> }'
> >>
> >> RETURNS:
> >>
> >> {
> >>   "responseHeader":{
> >>     "status":0,
> >>     "QTime":134,
> >>     "params":{
> >>       "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData +Subject_t:california",
> >>       "json.facet":"{\n filter_by_child_type :{\n type:query,\n q:\"type_s:doc.enriched.text.keywords\",\n domain: { blockChildren : \"type_s:doc\" },\n facet:{\n top_keywords_text : {\n type: terms,\n field: text_t,\n limit: 10\n }\n }\n }\n}",
> >>       "rows":"0"}},
> >>   "response":{"numFound":19,"start":0,"docs":[]
> >>   },
> >>   "facets":{
> >>     "count":19,
> >>     "filter_by_child_type":{
> >>       "count":686,
> >>       "top_keywords_text":{
> >>         "buckets":[
> >>           { "val":"enron", "count":57 },
> >>           { "val":"california", "count":22 },
> >>           { "val":"power", "count":21 },
> >>           { "val":"rate", "count":15 },
> >>           { "val":"plan", "count":13 },
> >>           { "val":"hou", "count":12 },
> >>           { "val":"energy", "count":11 },
> >>           { "val":"na", "count":11 },
> >>           { "val":"mckinsey", "count":10 },
> >>           { "val":"socal", "count":10 }]}}}}
> >>
> >> QUESTION: where do the counts greater than 19 (the total number of
> >> top-level documents returned by the query) come from? How can I adjust
> >> the query to facet only on the top-level documents (so that no count
> >> is greater than 19)?
> >>
> >> ===== BlockJoin Faceting ======
> >>
> >> Following the example on
> >> https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting ,
> >> I've tried this:
> >>
> >> /bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&child.facet.field=text_t&child.facet.limit=10&child.facet.mincount=5&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true
> >>
> >> RETURNS:
> >>
> >> {
> >>   "responseHeader":{
> >>     "status":0,
> >>     "QTime":1},
> >>   "response":{"numFound":19,"start":0,"docs":[]
> >>   },
> >>   "facet_counts":[
> >>     "facet_fields",[
> >>       "text_t",[
> >>         "128x",1,
> >>         "18xx",1,
> >>         "1x",1,
> >>         "2",2,
> >>         "30",1,
> >>         "60",1,
> >>         "78xx",1,
> >>         "82xx",1,
> >>         "ab",2,
> >>         "access",5,
> >>         "account",1,
> >>         "accounts",1,
> >>         ...
> >>         "california",13,
> >>         ...
> >>         "enron",9,
> >>         ...
> >>       ]]]}
> >>
> >> QUESTION: This looks very close to what I want, yet why are
> >> child.facet.limit=10&child.facet.mincount=5 ignored? How do I get the
> >> top 10 most frequent terms?
> >>
> >> Thank you for your help in advance!
> >>
> >> --
> >> Alisa Zhila

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>