Re: Faceting a multi valued field

Steve Fatula Mon, 07 Nov 2011 16:41:23 -0800

From: Chris Hostetter <hossman_luc...@fucit.org>
>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>; Steve Fatula 
><compconsult...@yahoo.com>
>Sent: Monday, November 7, 2011 5:42 PM
>Subject: Re: Faceting a multi valued field
>
>
>how are you modeling the tree nature of your category taxonomy when you 
>index the terms?  if you index each category id as the breadcrumb of all 
>its ancestor categories and the "depth" of the category in the tree, you 
>can use facet.prefix to only see the children of a specified category.  
>
>
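Below is a minimal sketch of the breadcrumb-plus-depth encoding suggested in the quoted reply. It is simulated in plain Python rather than run against Solr. The "depth/ancestor/.../category" token format and the "/" separator are assumptions for illustration, since the quoted message does not fix a syntax. Counting the terms that start with a given prefix mimics what facet.prefix does on a multi-valued field.

```python
from collections import Counter


def depth_tokens(path):
    """Encode an ancestor path such as ["A", "B", "C"] as depth-prefixed
    breadcrumbs: ["0/A", "1/A/B", "2/A/B/C"] (token format is an assumption)."""
    return [f"{depth}/" + "/".join(path[: depth + 1]) for depth in range(len(path))]


def facet_prefix_counts(docs, prefix):
    """Count documents per indexed term starting with `prefix`, roughly what
    facet.prefix returns for a multi-valued field."""
    counts = Counter()
    for tokens in docs:
        for term in set(tokens):  # a document counts once per term
            if term.startswith(prefix):
                counts[term] += 1
    return dict(sorted(counts.items()))


# Hypothetical documents, each assigned to one category path.
docs = [
    depth_tokens(["A", "B", "C"]),
    depth_tokens(["A", "B", "D"]),
    depth_tokens(["A", "X"]),
]
# Children of A > B are the depth-2 terms under the "A/B/" breadcrumb.
print(facet_prefix_counts(docs, "2/A/B/"))  # {'2/A/B/C': 1, '2/A/B/D': 1}
```

A product in several categories would simply carry the concatenated tokens of each of its paths. That is where the overlap described in the reply comes from.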
Someone always wants to understand the full use case. :-) I do understand why,
but sometimes that use case is extremely complex, with dozens and dozens of
search requirements. I was trying to keep the explanation short and was hoping
someone could just answer the question as asked. However, I will improve my
question by giving slightly more detail: the examples on the Solr hierarchical
wiki page are extremely simplistic. Consider the following taxonomy:


A > B > C > D > E
Z > C > D > E
Z > C > F > G > H > E
Y > G > H > E

Now, I want a count of the products in each child of C, AND in each of those
children's children (two levels, i.e. D, D > E, F, F > G). Note that, unlike the
wiki examples, C can appear at multiple levels, possibly many times. Each
product (indexed document) can be a member of multiple categories. So if we
search for products in category C, then no matter how it is indexed (I
believe), a product that is a member of both C and G also brings back G data,
which is not what we want. Now scale this to millions of documents, each of
which may be a member of half a dozen categories: any given query could return
thousands of categories, all meaningless except for the few we want.
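To pin down the numbers being asked for, here is a plain-Python reference computation over the taxonomy above. No Solr is involved. It assumes "count" means products directly assigned to a category, and the product IDs and memberships are hypothetical. Because the child map is built from every path, C's children are the union over all the places C appears.

```python
from collections import defaultdict


def child_map(paths):
    """Map each category to the set of its children, across every taxonomy path."""
    children = defaultdict(set)
    for path in paths:
        for parent, child in zip(path, path[1:]):
            children[parent].add(child)
    return children


def two_level_counts(paths, products, root):
    """Count products directly assigned to each child of `root` and to each of
    that child's children, wherever `root` appears in the taxonomy."""
    children = child_map(paths)

    def count(category):
        return sum(category in cats for cats in products.values())

    counts = {}
    for child in sorted(children[root]):
        counts[child] = count(child)
        for grandchild in sorted(children[child]):
            counts[f"{child} > {grandchild}"] = count(grandchild)
    return counts


taxonomy = [
    "A > B > C > D > E",
    "Z > C > D > E",
    "Z > C > F > G > H > E",
    "Y > G > H > E",
]
paths = [line.split(" > ") for line in taxonomy]

# Hypothetical products and their direct category memberships.
products = {"p1": {"D"}, "p2": {"E", "G"}, "p3": {"C", "Y"}, "p4": {"F", "G"}}
print(two_level_counts(paths, products, "C"))
# {'D': 1, 'D > E': 1, 'F': 1, 'F > G': 2}
```

Product p3 belongs to C and Y. Y lies outside C's subtree, so p3 contributes nothing here, which is the behavior the facet results would need to match.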

The reality is that products are in C. Which parent categories sit above C,
and thus at what level C appears, is irrelevant. So, what is a good way to
tackle this using Solr?

So, what I WAS asking is whether there is a way to filter the facets. This is
one example of around a dozen use cases we have for the technique, and it is
likely that one technique will resolve all of them. The technique I described,
which we were already using, is very fast. However, we have added a new
requirement: getting the counts for the children of the children. That means
many more queries just to show one screen of data, and I'm not sure it will
scale.
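One reading of "filter the facets" is trimming the facet list down to the categories of interest on the client. Below is a minimal sketch that assumes the response uses Solr's default flat JSON layout for a facet_fields entry, with terms and counts alternating. The counts and the wanted set are hypothetical.

```python
def filter_facet_counts(flat_counts, wanted):
    """Keep only `wanted` terms from a flat [term, count, term, count, ...] list,
    the default JSON layout of a Solr facet_fields entry."""
    pairs = zip(flat_counts[0::2], flat_counts[1::2])
    return {term: count for term, count in pairs if term in wanted}


# Hypothetical facet output for products in C: every category they belong to.
flat = ["G", 120, "D", 40, "Y", 9, "F", 7]
print(filter_facet_counts(flat, {"D", "F"}))  # {'D': 40, 'F': 7}
```

This approach only hides the unwanted terms after Solr has computed them. It does nothing about the thousands of irrelevant categories per query, or about the extra round trips needed for the second level.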

The only thing I have come up with is to index, for each product, ALL the
taxonomy paths that lead to it. So a product in E might index:

E
D > E
C > D > E
B > C > D > E
A > B > C > D > E
Z > C > D > E
H > E
G > H > E
F > G > H > E
C > F > G > H > E
Z > C > F > G > H > E
Y > G > H > E
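The value list above consists of every tail (suffix) of every taxonomy path that reaches the product's category. Below is a small Python sketch of that expansion, using the " > " separator from the message. Storing each shared tail only once is an assumption about how the values would be written. A multi-valued field counts a document once per term anyway.

```python
def suffix_paths(paths, sep=" > "):
    """Return every tail of every path ending at the product's category,
    walking each path from its end; a tail shared by paths is kept once."""
    tokens, seen = [], set()
    for path in paths:
        for start in range(len(path) - 1, -1, -1):
            token = sep.join(path[start:])
            if token not in seen:
                seen.add(token)
                tokens.append(token)
    return tokens


paths_to_e = [
    ["A", "B", "C", "D", "E"],
    ["Z", "C", "D", "E"],
    ["Z", "C", "F", "G", "H", "E"],
    ["Y", "G", "H", "E"],
]
for token in suffix_paths(paths_to_e):
    print(token)
```

For the four paths to E, this prints the same twelve values listed above, in the same order. The output grows with both path length and the number of paths, which is the per-product volume issue.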

By doing that, one could use facet.prefix, since every possible starting point
is indexed. However, prefix "C >" would in fact count all products at any level
below C, which means it won't work, since I want the products at each level.
I'd probably have to index even more data than this, yuck. The problem, of
course, is the volume of data for each product, and the fact that the data can
change drastically whenever the tree changes, which happens all the time.
Indexing time will grow quite a bit. Still, I'm trying to figure out a good
structure that would let these queries be done with Solr.
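Below is a small simulation of the prefix problem. With every suffix path indexed, a prefix of "C > " matches terms that end at every depth below C, not just the two levels wanted. The products are hypothetical. The grouping by depth is done here in Python only to make the spread visible; facet.prefix itself does no such grouping.

```python
from collections import Counter


def prefix_terms_by_depth(indexed_docs, prefix, sep=" > "):
    """Mimic facet.prefix over suffix-path values, then group the matching
    terms by how many levels below the prefix category they end."""
    counts = Counter()
    for tokens in indexed_docs:
        for term in set(tokens):  # a document counts once per term
            if term.startswith(prefix):
                counts[term] += 1
    by_depth = {}
    for term, n in sorted(counts.items()):
        depth = len(term[len(prefix):].split(sep))
        by_depth.setdefault(depth, {})[term] = n
    return by_depth


# Hypothetical products in D, G and E, each indexed with all its suffix paths.
docs = [
    ["D", "C > D", "B > C > D", "A > B > C > D", "Z > C > D"],
    ["G", "F > G", "C > F > G", "Z > C > F > G", "Y > G"],
    ["E", "D > E", "C > D > E", "B > C > D > E", "A > B > C > D > E",
     "Z > C > D > E", "H > E", "G > H > E", "F > G > H > E",
     "C > F > G > H > E", "Z > C > F > G > H > E", "Y > G > H > E"],
]
print(prefix_terms_by_depth(docs, "C > "))
# {1: {'C > D': 1}, 2: {'C > D > E': 1, 'C > F > G': 1}, 4: {'C > F > G > H > E': 1}}
```

Even in this three-product toy, a single prefix returns terms ranging from one to four levels below C. In a real tree, the deeper levels multiply quickly.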

Any other thoughts? Hopefully, this explains the requirement more fully.
