Re: Faceting a multi valued field

Steve Fatula Mon, 07 Nov 2011 21:49:26 -0800

From: Chris Hostetter <hossman_luc...@fucit.org>
>To: Steve Fatula <compconsult...@yahoo.com>
>Cc: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>Sent: Monday, November 7, 2011 7:17 PM
>Subject: Re: Faceting a multi valued field
> 
>: A > B > C > D > E
>: Z > C > D > E
>: Z > C > F > G > H > E
>: Y > G > H > E
>: 
>: Now, I want to get a count of the products in the children of C, AND, 
>: each of their children (so, 2 levels, i.e. D, D > E, F, F > G). Note, 
>
>Are these letters just "labels" for categories and the individual labels 
>are frequently re-used to describe different concrete categories, or are 
>you genuinely saying that a single category (labeled "C") has multiple 
>parent categories (B, and Z) and depending on *which* parent you are 
>considering at any given time, it has different child categories (ie: C 
>has direct children D and F when viewed from parent Z, but when viewed 
>from parent B, C's only direct child is D)?
>
>Each letter represents a category. So, I am browsing category C and it is 100% 
>irrelevant which parent of C as that is not considered at all. I am on C, that 
>is all that matters. Think of a product with multiple uses. A product might 
>be used in RVs. But, it's also used in trucks, it's also used with solar 
>panels, and it's also used in consumer electronics. It can be in many 
>different places within a tree. That is perfectly normal and most all of the 
>large websites do such a thing. So, just an example. It's not messed up. You 
>are correct in that I missed the other children in parts of the tree example, 
>that IS messed up, sorry! So, C ALWAYS has the same children. But it also may 
>have many many parents at many different tree levels. So, this is much 
>different than the examples in the wiki as level doesn't work.
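
To make the structure concrete, here is a minimal sketch of the category graph described above, built from the four example paths at the top of the thread. It assumes, as stated, that a category always has the same children no matter which parent it was reached from. The names `build_children` and `two_levels` are illustrative only and not part of Solr.

```python
from collections import defaultdict

# The four example paths from the original question.
PATHS = [
    "A > B > C > D > E",
    "Z > C > D > E",
    "Z > C > F > G > H > E",
    "Y > G > H > E",
]

def build_children(paths):
    """Map each category to the set of its direct children, merging all paths."""
    children = defaultdict(set)
    for path in paths:
        nodes = [n.strip() for n in path.split(">")]
        for parent, child in zip(nodes, nodes[1:]):
            children[parent].add(child)
    return children

def two_levels(children, cat):
    """Return {child: [grandchildren]} for a category, ignoring its parents."""
    return {
        child: sorted(children.get(child, ()))
        for child in sorted(children.get(cat, ()))
    }

children = build_children(PATHS)
print(two_levels(children, "C"))  # {'D': ['E'], 'F': ['G']}
```

This reproduces the two levels asked about (D, D > E, F, F > G) without looking at whether C was reached from B or from Z.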



>If it's the former (just an issue of reusing labels) then you can probably 
>make your life a lot simpler by choosing unique identifiers for every 
>category in the hierarchy (regardless of the label) and indexing those.
>
>Category C is category C, period. It is the same identifier as it is the same 
>category with the same product memberships. It would make little sense to add 
>products to many more categories and duplicate all that data. That's not how a 
>tree is typically organized. But still, this is all tangential. I assume you 
>mean category C would have a different id depending on its parent, not 
>something we can do. Unless I misunderstand.


>: The reality is products are in C. It is meaningless what parent category 
>: they have, and thus what level. So, what is a good way to tackle this 
>: using Solr?
>
>from the standpoint of a single product document, you may not care what 
>the "parent" categories are for each category the product is in, but if your 
>goal is to get facet counts for every "child" category of a specified 
>"parent" then it absolutely matters what the parent categories are -- the 
>easiest way i know of to do that is to have a field containing each 
>of the categories the document is in expressed as a "path", and then use 
>facet.prefix to limit the constraints considered to terms that match the 
>"path" of the parent category you are interested in.  
>
>I don't follow that. The parent of C is, well, many categories, possibly 4 in 
>the power inverter example, in some cases, many more. I cannot therefore use a 
>full path up to that point as I do not know all of the full paths getting to 
>C, I just know I am on C. This would come out abysmally slow to calculate all 
>full paths to a given category for every product and index that. Just use one, 
>you say? Well, that then messes up higher level queries that now are not aware 
>of products in other parts of the tree via different parents. I also already 
>know the children of the category I am in, with or without Solr. So, since I 
>know that, was hoping for a simple single query using some sort of Solr index 
>field that would allow a query to get the data I need.


>
>since you also said 
>you only want the categories that are immediate children of the current 
>category, encoding the "level" of the category at the beginning of its 
>path makes this possible using facet.prefix as well 
>
>Not when a given category exists at many levels. That's the simplistic Solr 
>wiki example. 


>-- if you *only* ever 
>want constraint counts for the immediate child categories, you can drop the 
>level and most of the path and just index the "${parent_cat_id}:${cat_id}" 
>tuples for every $cat_id the product is in, and use 
>"${cat_id}:" as your facet.prefix.
>
>
>Yes, and if you go back to the original message which didn't explain the 
>structure as you had mentioned, that's what I am doing. But, I also said I 
>needed the children of the current category, and their children as well. So, 
>getting the children is one Solr call as we do now. Now, that may return say 
>25 subcategories. Now, for EACH of those, I need their children and counts 
>(so, this is 2 levels of subcategories and counts, no more, no less). I cannot 
>find a way to do that query except by 25 more calls, which we don't want to 
>do, as I cannot use multiple facet.prefix values since the syntax does not 
>support that, unfortunately. Does that make more sense?
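
One possible way to get both levels from a single facet.prefix, offered only as a sketch and not as something proposed in this thread: index the "${parent_cat_id}:${cat_id}" tuples and, in the same field, "${grandparent_cat_id}:${parent_cat_id}:${cat_id}" triples. A single prefix of "C:" then matches both the children and the grandchildren of C. The code below only simulates the facet counting in plain Python. The edges, the product data, and the helper names are all hypothetical, and it assumes each product lists every category it should be counted in.

```python
from collections import Counter, defaultdict

# Hypothetical (parent, child) edges, loosely following the example paths.
EDGES = [
    ("A", "B"), ("B", "C"), ("Z", "C"), ("C", "D"), ("C", "F"),
    ("D", "E"), ("F", "G"), ("G", "H"), ("H", "E"), ("Y", "G"),
]

# Hypothetical products and the categories each one belongs to.
PRODUCTS = {
    "p1": {"D", "E"},
    "p2": {"D"},
    "p3": {"F", "G"},
    "p4": {"E"},
}

def parents_of(edges):
    """Map each category to the set of its direct parents."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return parents

def tokens_for(cats, parents):
    """Build 'parent:cat' and 'grandparent:parent:cat' tokens for one product."""
    tokens = set()
    for cat in cats:
        for parent in parents.get(cat, ()):
            tokens.add(f"{parent}:{cat}")
            for grand in parents.get(parent, ()):
                tokens.add(f"{grand}:{parent}:{cat}")
    return tokens

def facet_counts(index, prefix):
    """Simulate facet.prefix: count documents per token starting with prefix."""
    counts = Counter()
    for tokens in index.values():
        for token in tokens:
            if token.startswith(prefix):
                counts[token] += 1
    return dict(sorted(counts.items()))

parents = parents_of(EDGES)
index = {pid: tokens_for(cats, parents) for pid, cats in PRODUCTS.items()}
print(facet_counts(index, "C:"))
# {'C:D': 2, 'C:D:E': 2, 'C:F': 1, 'C:F:G': 1}
```

The number of tokens per product grows with the number of parents and grandparents of each category, but it stays bounded at two levels and never requires enumerating full paths from the root.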

That is the crux of the matter. Hierarchical data where a given node might 
exist in many places within the data. I have not seen an example for this. It 
doesn't seem obvious how to handle. At least to me. But maybe it does to you, 
or someone else, which is why I am asking. Perhaps the problem is not solvable 
with Solr.
