On Fri, Apr 22, 2016 at 12:26 PM, Alisa Z. <[email protected]> wrote:
> Hi Yonik,
>
> Thanks a lot for your response.
>
> I have discussed this with Mikhail Khludnev already and tried this
> suggestion. Here's what I've got:
>
>
>
> sentiment: positive
> author: Bob
> text: Great post about Solr
> 2.blog-posts.comments-id: 10735-23004 //this is a
> new field, field name is different on each level for each type, values are
> unique
> date: 2015-04-10T11:30:00Z
> path: 2.blog-posts.comments
> id: 10735-23004
> Query:
> curl http://localhost:8985/solr/solr_nesting_unique/query -d
> 'q=path:2.blog-posts.comments&rows=0&
> json.facet={
> filter_by_child_type :{
> type:query,
> q:"path:*comments*keywords",
> domain: { blockChildren : "path:2.blog-posts.comments" },
> facet:{
> top_entity_text : {
> type: terms,
> field: text,
> limit: 10,
> sort: "counts_by_comments desc",
> facet: {
> counts_by_comments: "unique (2.blog-posts.comments-id )"
> // changed
> }}}}}'
Something is wrong if you are getting 0 counts.
Let's try taking it piece by piece:
Step 1: q=path:2.blog-posts.comments
This finds level 2 documents
Step 2: domain: { blockChildren : "path:2.blog-posts.comments" }
This first maps to all of the children (level 3 and level 4)
Step 3: q:"path:*comments*keywords"
This selects a subset of level 3 and level 4 documents with keywords
(Note, in the future this should be doable as an additional filter in
the domain spec, w/o an additional sub-facet level)
Step 4:
Facet on the text field of those level 3 and level 4 keyword docs. For
each bucket, also find the number of unique values in the
"2.blog-posts.comments-id" field on those documents.

Without seeing what you indexed, my guess is that the issue is that
the "2.blog-posts.comments-id" field does not actually exist on those
level 3 and level 4 docs being faceted. The JSON Facet API doesn't
propagate field values up/down the nested stack yet. That's what
https://issues.apache.org/jira/browse/SOLR-8998 is mostly about.
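
One way to get that field onto the keyword docs is to copy it down at index
time. Below is a minimal Python sketch, assuming the documents are built as
nested dicts with Solr's `_childDocuments_` key before being sent to Solr. The
sample post, the ids other than 10735-23004, the level 1/3/4 path values, and
all helper names are made up for illustration. `unique_per_term` only mimics
the counting on plain dicts; it is not how Solr computes unique().

```python
from collections import defaultdict

ID_FIELD = "2.blog-posts.comments-id"   # field name used in the thread
COMMENT_PATH = "2.blog-posts.comments"  # path value used in the thread


def stamp_comment_id(doc, inherited=None):
    """Copy each level 2 comment's id onto the comment and all its descendants."""
    if doc.get("path") == COMMENT_PATH:
        inherited = doc["id"]
    if inherited is not None:
        doc[ID_FIELD] = inherited
    for child in doc.get("_childDocuments_", []):
        stamp_comment_id(child, inherited)
    return doc


def flatten(doc):
    """Yield a document and all of its descendants, depth first."""
    yield doc
    for child in doc.get("_childDocuments_", []):
        yield from flatten(child)


def unique_per_term(docs, term_field, unique_field):
    """Mimic a terms facet with a unique() sub-aggregation over plain dicts."""
    seen = defaultdict(set)
    for doc in docs:
        term = doc.get(term_field)
        if term is None:
            continue
        bucket = seen[term]  # the bucket exists even if the field is missing
        if unique_field in doc:
            bucket.add(doc[unique_field])
    return {term: len(values) for term, values in seen.items()}


def keyword_docs(root):
    """Select the keyword documents at any level (hypothetical path scheme)."""
    return [d for d in flatten(root) if d["path"].endswith("keywords")]


# Hypothetical post: one comment with a keyword, plus a reply with its own keyword.
post = {
    "id": "10735", "path": "1.blog-posts",
    "_childDocuments_": [{
        "id": "10735-23004", "path": COMMENT_PATH, "text": "Great post about Solr",
        "_childDocuments_": [
            {"id": "k1", "path": "3.blog-posts.comments.keywords", "text": "Solr"},
            {"id": "r1", "path": "3.blog-posts.comments.replies", "text": "Agreed",
             "_childDocuments_": [
                 {"id": "k2", "path": "4.blog-posts.comments.replies.keywords",
                  "text": "Solr"},
             ]},
        ],
    }],
}

before = unique_per_term(keyword_docs(post), "text", ID_FIELD)
stamp_comment_id(post)
after = unique_per_term(keyword_docs(post), "text", ID_FIELD)
print(before, after)
```

In this toy model the unique count for "Solr" is 0 while the keyword docs lack
the field, and 1 once the comment id has been copied onto them.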
-Yonik
>
> Response:
>
> "response":{"numFound":3,"start":0,"docs":[]
> },
> "facets":{
> "count":3,
> "filter_by_child_type":{
> "count":9,
> "top_entity_text":{
> "buckets":[{
> "val":"Elasticsearch",
> "count":2,
> "counts_by_comments":0},
> {
> "val":"Solr",
> "count":5,
> "counts_by_comments":0},
> {
> "val":"Solr 5.5",
> "count":1,
> "counts_by_comments":0},
> {
> "val":"feature",
> "count":1,
> "counts_by_comments":0}]}}}}
>
> So unless I messed something up... or the field name does not look
> "canonical" (but it was fast to generate and it is accepted in a normal query
> http://localhost:8985/solr/solr_nesting_unique/query?q=2.blog-posts.body-id
> :* )
>
> So I think that it's just a JSON facet API limitation...
>
> Best,
> --Alisa
>
>
>>Friday, April 22, 2016, 9:55 -04:00, from Yonik Seeley <[email protected]>:
>>
>>Hi Alisa,
>>This was a bit too hard for me to grok on a first pass... then I saw
>>your related blog post which includes the actual sample data and makes
>>it more clear.
>>
>> More comments inline:
>>
>>On Wed, Apr 20, 2016 at 2:29 PM, Alisa Z. < [email protected] > wrote:
>>> Hi all,
>>>
>>> I have been stretching some of Solr's capabilities for handling nested
>>> documents and I've run into the following issue...
>>>
>>> Let's say I have the following structure:
>>>
>>> {
>>> "blog-posts":{ //level 1
>>> "leaf-fields":[
>>> "date",
>>> "author"],
>>> "title":{ //level 2
>>> "leaf-fields":[ "text"],
>>> "keywords":{ //level 3
>>> "leaf-fields":[
>>> "text",
>>> "type"]
>>> }
>>> },
>>> "body":{ //level 2
>>> "leaf-fields":[ "text"],
>>> "keywords":{ //level 3
>>> "leaf-fields":[
>>> "text",
>>> "type"]
>>> }
>>> },
>>> "comments":{ //level 2
>>> "leaf-fields":[
>>> "date",
>>> "author",
>>> "text",
>>> "sentiment"
>>> ],
>>> "keywords":{ //level 3
>>> "leaf-fields":[
>>> "text",
>>> "type"]
>>> },
>>> "replies":{ //level 3
>>> "leaf-fields":[
>>> "date",
>>> "author",
>>> "text",
>>> "sentiment"],
>>> "keywords":{ //level 4
>>> "leaf-fields":[
>>> "text",
>>> "type"]
>>> }}}}}
>>>
>>>
>>> And I want to know the distribution of all readers' keywords (levels 3 and
>>> 4) by comments (level 2).
>>> In JSON Facet API I tried this:
>>>
>>> curl http://localhost:8983/solr/my_index/query -d
>>> 'q=path:2.blog-posts.comments&rows=0&
>>> json.facet={
>>> filter_by_child_type :{
>>> type:query,
>>> q:"path:*comments*keywords",
>>> domain: { blockChildren : "path:2.blog-posts.comments" },
>>> facet:{
>>> top_keywords : {
>>> type: terms,
>>> field: text,
>>> sort: "counts_by_comments desc",
>>> facet: {
>>> counts_by_comments: "unique(_root_)" // I suspect it should
>>> be a different field, not _root_, but what would it be for an intermediate
>>> document?
>>> }}}}}'
>>>
>>> This gives me the wrong results: it aggregates by posts, not by comments
>>> (it's a toy data set, so I know that the correct answer for "Solr" is 3
>>> when faceted by comments)
>>
>>
>>Yeah, this type of thing isn't currently directly supported, but
>>SOLR-8998 should address that.
>>You can currently hack around it (for simple counts) using unique(),
>>as you've discovered, but you need a unique ID at the right level to
>>get the right count.
>>
>>_root_ is unique for blog posts, which is why you get numbers of
>>posts (as opposed to numbers of level-2 comments).
>>You could add a "level2_comment_id" field to the level 2 comments and
>>their children, and then use unique() on that.
>>
>>-Yonik
>>
>>
>>> {
>>> "response":{"numFound":3,"start":0,"docs":[]
>>> },
>>> "facets":{
>>> "count":3,
>>> "filter_by_child_type":{
>>> "count":9,
>>> "top_keywords":{
>>> "buckets":[{
>>> "val":"Elasticsearch",
>>> "count":2,
>>> "counts_by_comments":2},
>>> {
>>> "val":"Solr",
>>> "count":5,
>>> "counts_by_comments":2}, //here the count by
>>> "comments" should be 3
>>> {
>>> "val":"Solr 5.5",
>>> "count":1,
>>> "counts_by_comments":1},
>>> {
>>> "val":"feature",
>>> "count":1,
>>> "counts_by_comments":1}]}}}}
>>>
>>>
>>> Am I writing the query wrong?
>>>
>>>
>>> By the way, Block Join Faceting works fine for this:
>>> bjqfacet?q={!parent%20which=path:2.blog-posts.comments}path:*.comments*keywords&rows=0&facet=true&child.facet.field=text&wt=json&indent=true
>>>
>>> {
>>> "response":{"numFound":3,"start":0,"docs":[]
>>> },
>>> "facet_counts":{
>>> "facet_queries":{},
>>> "facet_fields":{
>>> "text":[
>>> "Elasticsearch",2,
>>> "Solr",3, //correct result
>>> "Solr 5.5",1,
>>> "feature",1]},
>>> "facet_dates":{},
>>> "facet_ranges":{},
>>> "facet_intervals":{},
>>> "facet_heatmaps":{}}}
>>>
>>> But we've already discussed that it returns too much stuff: no way to put
>>> limits or order by counts :( That's why I want to see whether it's possible
>>> to get this right with the JSON Facet API.
>>>
>>> Thank you in advance!
>>>
>>> --
>>> Alisa Zhila
>