Re[4]: [nesting] JSON Facet API vs. BlockJoin Faceting: need help on queries (Facet API facets by wrong doc level VS. BlockJoin Faceting does not return top 10 most frequent)

Alisa Z . Tue, 29 Mar 2016 11:39:47 -0700

 Mikhail, 

I totally see the point: the corresponding wiki page ( 
https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting ) does not 
mention it and says it's an experimental feature.


Is it correct that no additional options (limit, mincount, etc.) can be set 
at all?

Or, more specifically, is there any workaround to control the output of the 
query at hand (possibly something beyond the faceting options):

/bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&child.facet.field=text_t&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true

RETURNS:

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "response":{"numFound":19,"start":0,"docs":[]
  },
  "facet_counts":[
    "facet_fields",[
      "text_t",[
        "128x",1,
        "18xx",1,
        "1x",1,
        "2",2,
        "30",1,
        "60",1,
        "78xx",1,
        "82xx",1,
        "ab",2,
        "access",5,
        "account",1,
        "accounts",1,
...
"california",13,
...
"enron",9,
...
]]]}

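
Since the handler seems to return every term in index order, one possible workaround is to apply the limit and mincount on the client side. Below is a minimal Python sketch of that idea, not something BlockJoin Faceting provides: the helper names top_facets and extract_child_facet are hypothetical, the response dict is a hand-copied fragment of the output above, and it assumes the response has already been parsed from JSON.

```python
def extract_child_facet(response, field):
    """Pull the flat [term, count, ...] list for one field out of the response.

    Assumes the shape shown above:
    "facet_counts": ["facet_fields", [field, [term, count, ...], ...]]
    """
    fields = response["facet_counts"][1]
    by_name = dict(zip(fields[0::2], fields[1::2]))
    return by_name[field]


def top_facets(flat, limit=10, mincount=1):
    """Emulate limit/mincount on a flat [term, count, term, count, ...] list."""
    pairs = zip(flat[0::2], flat[1::2])
    kept = [(term, count) for term, count in pairs if count >= mincount]
    # Most frequent first; ties are broken alphabetically by term.
    kept.sort(key=lambda tc: (-tc[1], tc[0]))
    return kept[:limit]


# Hand-copied fragment of the response above (elided terms are left out).
response = {
    "facet_counts": [
        "facet_fields", [
            "text_t", ["128x", 1, "2", 2, "ab", 2, "access", 5,
                       "california", 13, "enron", 9],
        ],
    ],
}

flat = extract_child_facet(response, "text_t")
top = top_facets(flat, limit=10, mincount=5)
print(top)
```

This only trims the output after the fact; the server still computes and ships the full term list, so it may not be suitable for fields with very many distinct terms.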


>Tuesday, 29 March 2016, 13:40 -04:00 from Mikhail Khludnev 
><mkhlud...@griddynamics.com>:
>
>Alisa,
>
>There is no such thing as child.facet.limit, etc.
>
>On Tue, Mar 29, 2016 at 6:27 PM, Alisa Z. < prol...@mail.ru > wrote:
>
>> So the first issue was eventually solved by adding facet: {top_terms_by_doc:
>> "unique(_root_)"} and sorting the outer facet buckets by this sub-facet:
>>
>> curl http://localhost:8985/solr/enron_path_w_ts/query -d
>> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
>> json.facet={
>>   filter_by_child_type :{
>>     type:query,
>>     q:"type_s:doc.enriched.text.keywords",
>>     domain: { blockChildren : "type_s:doc" },
>>     facet:{
>>       top_keywords_text : {
>>         type: terms,
>>         field: text_t,
>>         limit: 10,
>>         sort: "top_terms_by_doc desc",
>>          facet: {
>>            top_terms_by_doc: "unique(_root_)"
>>          }
>>       }
>>     }
>>   }
>> }'
>>
>>
>> The BlockJoin Faceting part is still open: I've tried all the conventional
>> faceting parameters (facet.limit, child.facet.limit, f.text_t.facet.limit,
>> ...) and nothing worked :(
>>
>>
>> >Monday, 28 March 2016, 17:20 -04:00 from Alisa Z. < prol...@mail.ru >:
>> >
>> >Ok, so for the 1st question, I think I'm getting closer: adding facet:
>> {top_terms_by_doc: "unique(_root_)"}, as indicated in
>>  http://blog.griddynamics.com/search/label/~Mikhail%20Khludnev , returns
>> correct counts. However, the buckets are still sorted by the outer facet's
>> "count", not by unique(_root_):
>> >
>> >
>> >curl http://localhost:8985/solr/my_collection/query -d
>> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
>> >json.facet={
>> >  filter_by_child_type :{
>> >    type:query,
>> >    q:"type_s:doc.enriched.text.keywords",
>> >    domain: { blockChildren : "type_s:doc" },
>> >    facet:{
>> >      top_keywords_text : {
>> >        type: terms,
>> >        field: text_t,
>> >        limit: 10,
>> >        facet: {
>> >           top_terms_by_doc: "unique(_root_)"
>> >         }
>> >      }
>> >    }
>> >  }
>> >}'
>> >
>> >RETURNS
>> >
>> >{
>> >  "responseHeader":{
>> >    "status":0,
>> >    "QTime":25,
>> >    "params":{
>> >      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData
>> +Subject_t:california",
>> >      "json.facet":"{\n  filter_by_child_type :{\n    type:query,\n
>> q:\"type_s:doc.enriched.text.keywords\",\n    domain: { blockChildren :
>> \"type_s:doc\" },\n    facet:{\n      top_keywords_text : {\n        type:
>> terms,\n        field: text_t,\n        limit: 10,\n        facet:
>> {\n           top_terms_by_doc: \"unique(_root_)\"\n         }\n
>> }\n    }\n  }\n}",
>> >      "rows":"0"}},
>> >  "response":{"numFound":19,"start":0,"docs":[]
>> >  },
>> >  "facets":{
>> >    "count":19,
>> >    "filter_by_child_type":{
>> >      "count":686,
>> >      "top_keywords_text":{
>> >        "buckets":[{
>> >            "val":"enron",
>> >            "count":57,
>> >            "top_terms_by_doc":9},
>> >          {
>> >            "val":"california",
>> >            "count":22,
>> >            "top_terms_by_doc":13},
>> >          {
>> >            "val":"power",
>> >            "count":21,
>> >            "top_terms_by_doc":7},
>> >          {
>> >            "val":"rate",
>> >            "count":15,
>> >            "top_terms_by_doc":5},
>> >          {
>> >            "val":"plan",
>> >            "count":13,
>> >            "top_terms_by_doc":3},
>> >          {
>> >            "val":"hou",
>> >            "count":12,
>> >            "top_terms_by_doc":5},
>> >          {
>> >            "val":"energy",
>> >            "count":11,
>> >            "top_terms_by_doc":5},
>> >          {
>> >            "val":"na",
>> >            "count":11,
>> >            "top_terms_by_doc":5},
>> >          {
>> >            "val":"mckinsey",
>> >            "count":10,
>> >            "top_terms_by_doc":1},
>> >          {
>> >            "val":"socal",
>> >            "count":10,
>> >            "top_terms_by_doc":4}]}}}}
>> >
>> >Nice, but I want them to be ordered by "top_terms_by_doc" frequencies,
>> not by the "count" frequencies.
>> >Any suggestions?
>> >
>> >Thanks,
>> >Alisa
>> >
>> >
>> >
>> >
>> >
>> >>Monday, 28 March 2016, 15:39 -04:00 from Alisa Z. < prol...@mail.ru >:
>> >>
>> >>Hi all,
>> >>
>> >>I am trying to facet parent docs by nested document fields. I've tried
>> the 2 approaches named in the subject, yet with the first the results are
>> not quite correct, and with the second I cannot get the query right. So I
>> need help with either of them, and any explanation, documentation, or blog
>> posts on the behavior would be much appreciated.
>> >>
>> >>In words, the query is as follows: "Find the top 10 keywords for all
>> documents with "california" in the email subject line"
>> >>
>> >>Here are the queries with their responses:
>> >>
>> >>==== Json Facet API ====
>> >>
>> >>curl http://localhost:8985/solr/my_collection/query -d
>> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
>> >>json.facet={
>> >>  filter_by_child_type :{
>> >>    type:query,
>> >>    q:"type_s:doc.enriched.text.keywords",
>> >>    domain: { blockChildren : "type_s:doc" },
>> >>    facet:{
>> >>      top_keywords_text : {
>> >>        type: terms,
>> >>        field: text_t,
>> >>        limit: 10
>> >>      }
>> >>    }
>> >>  }
>> >>}'
>> >>
>> >>RETURNS:
>> >>
>> >>{
>> >>  "responseHeader":{
>> >>    "status":0,
>> >>    "QTime":134,
>> >>    "params":{
>> >>      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData
>> +Subject_t:california",
>> >>      "json.facet":"{\n  filter_by_child_type :{\n    type:query,\n
>> q:\"type_s:doc.enriched.text.keywords\",\n    domain: { blockChildren :
>> \"type_s:doc\" },\n    facet:{\n      top_keywords_text : {\n        type:
>> terms,\n        field: text_t,\n        limit: 10\n      }\n    }\n  }\n}",
>> >>      "rows":"0"}},
>> >>  "response":{"numFound":19,"start":0,"docs":[]
>> >>  },
>> >>  "facets":{
>> >>    "count":19,
>> >>    "filter_by_child_type":{
>> >>      "count":686,
>> >>      "top_keywords_text":{
>> >>        "buckets":[{
>> >>            "val":"enron",
>> >>            "count":57},
>> >>          {
>> >>            "val":"california",
>> >>            "count":22},
>> >>          {
>> >>            "val":"power",
>> >>            "count":21},
>> >>          {
>> >>            "val":"rate",
>> >>            "count":15},
>> >>          {
>> >>            "val":"plan",
>> >>            "count":13},
>> >>          {
>> >>            "val":"hou",
>> >>            "count":12},
>> >>          {
>> >>            "val":"energy",
>> >>            "count":11},
>> >>          {
>> >>            "val":"na",
>> >>            "count":11},
>> >>          {
>> >>            "val":"mckinsey",
>> >>            "count":10},
>> >>          {
>> >>            "val":"socal",
>> >>            "count":10}]}}}}
>> >>
>> >>
>> >>QUESTION: where do the counts greater than 19 (the total number of
>> top-level documents returned by the query) come from? How can I adjust the
>> query to facet only on the top-level documents (so that no count is
>> greater than 19)?
>> >>
>> >>
>> >>===== BlockJoin Faceting ======
>> >>Following the example on
>>  https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting ,
>> I've tried this:
>> >>
>>
>> >>/bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&child.facet.field=text_t&child.facet.limit=10&child.facet.mincount=5&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true
>> >>
>> >>RETURNS:
>> >>
>> >>{
>> >>  "responseHeader":{
>> >>    "status":0,
>> >>    "QTime":1},
>> >>  "response":{"numFound":19,"start":0,"docs":[]
>> >>  },
>> >>  "facet_counts":[
>> >>    "facet_fields",[
>> >>      "text_t",[
>> >>        "128x",1,
>> >>        "18xx",1,
>> >>        "1x",1,
>> >>        "2",2,
>> >>        "30",1,
>> >>        "60",1,
>> >>        "78xx",1,
>> >>        "82xx",1,
>> >>        "ab",2,
>> >>        "access",5,
>> >>        "account",1,
>> >>        "accounts",1,
>> >>...
>> >>"california",13,
>> >>...
>> >>"enron",9,
>> >>...
>> >>]]]}
>> >>
>> >>QUESTION: This looks very close to what I want, yet why are
>> child.facet.limit=10&child.facet.mincount=5 ignored? How can I get the
>> top 10 most frequent?
>> >>
>> >>
>> >>Thank you for your help in advance!
>> >>
>> >>--
>> >>Alisa Zhila
>>
>>
>
>
>-- 
>Sincerely yours
>Mikhail Khludnev
>Principal Engineer,
>Grid Dynamics
>
>< http://www.griddynamics.com >
>< mkhlud...@griddynamics.com >
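
For anyone reading the quoted exchange later: the counts above 19 in the first JSON Facet response are counts of child documents, whereas unique(_root_) counts distinct parent blocks. Here is a small Python sketch of that difference on made-up data. The (root id, keyword) pairs are purely illustrative and are not taken from the Enron collection.

```python
from collections import Counter, defaultdict

# Hypothetical child documents as (root id of the parent block, keyword).
# One parent can have several keyword children carrying the same term.
children = [
    ("doc1", "enron"), ("doc1", "enron"), ("doc1", "california"),
    ("doc2", "enron"), ("doc2", "california"),
    ("doc3", "california"),
]

# What a plain terms facet over the child domain counts: child documents.
child_counts = Counter(term for _, term in children)

# What unique(_root_) counts per bucket: distinct parent blocks.
roots_per_term = defaultdict(set)
for root, term in children:
    roots_per_term[term].add(root)
parent_counts = {term: len(roots) for term, roots in roots_per_term.items()}

# Ranking by the per-parent number, as sort: "top_terms_by_doc desc" does.
ranked = sorted(parent_counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(child_counts)
print(ranked)
```

In this toy data "enron" has 3 child hits but only 2 distinct parents, which is the same effect as "enron" showing count 57 but top_terms_by_doc 9 in the quoted response.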
