From: Chris Hostetter <hossman_luc...@fucit.org>
>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>; Steve Fatula <compconsult...@yahoo.com>
>Sent: Monday, November 7, 2011 5:42 PM
>Subject: Re: Faceting a multi valued field
>
>
>how are you modeling the tree nature of your category taxonomy when you
>index the terms? if you index each category id as the breadcrumb of all
>its ancestor categories and the "depth" of the category in the tree, you
>can use facet.prefix to only see the children of a specified category.
>
>

Someone always wants to understand the full use case. :-) I do understand why, but sometimes said use case is extremely complex, with dozens and dozens of search requirements. I was trying to limit the explanation and was hoping someone could just answer the question as is. However, I will improve upon my question...

To provide slightly more detail, the examples given on the Solr hierarchical wiki page are extremely simplistic. Consider the following taxonomy:

A > B > C > D > E
Z > C > D > E
Z > C > F > G > H > E
Y > G > H > E

Now, I want to get a count of the products in the children of C, AND in each of their children (so, 2 levels, i.e. D, D > E, F, F > G). Note that, unlike the wiki examples, C exists at multiple levels, possibly at many of them. Each product (indexed document) can be a member of multiple categories. So, if we search for products of category C, then no matter how it is indexed, I believe, if a given product is a member of both C and G, you get G data as well, which is not what we want.

Now, scale this to millions of documents. Each document may be a member of half a dozen categories, and any given query could return thousands of categories, all of which are meaningless except for the few we want. The reality is that products are in C. It is meaningless what parent category they have, and thus what level.

So, what is a good way to tackle this using Solr? What I WAS asking is: is there a way to filter the facets? This is one example of around a dozen use cases we have for the technique. It is likely that one technique will resolve all of those.

The technique I described that we were using works very fast. However, we have added the new requirement of getting the counts for the children of the children. That means many more queries just to show one screen of data. Not sure it will scale.

The only thing I have come up with is, for each product, to index ALL taxonomy paths that lead to it. So, perhaps a product in E might index:

E
D > E
C > D > E
B > C > D > E
A > B > C > D > E
Z > C > D > E
H > E
G > H > E
F > G > H > E
C > F > G > H > E
Z > C > F > G > H > E
Y > G > H > E

By doing that, one could use prefix, since we index every possible starting point. So prefix "C >" would in fact count all products at any level below C, which means it won't work, since I wanted the products in each level. Probably I'd have to index more data than this, yuck.

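To make that idea concrete, here is a minimal Python sketch of the "index every possible starting point" scheme. It is not Solr code: `suffix_paths` and `prefix_facet` are names made up for illustration, the three products are hypothetical, and `prefix_facet` only imitates in memory what a prefix-restricted facet would count.

```python
from collections import Counter

# The four taxonomy paths from the example above, root first.
TAXONOMY = [
    ["A", "B", "C", "D", "E"],
    ["Z", "C", "D", "E"],
    ["Z", "C", "F", "G", "H", "E"],
    ["Y", "G", "H", "E"],
]
SEP = " > "


def suffix_paths(category, taxonomy=TAXONOMY):
    """Return every distinct path ending at `category`, from any starting ancestor."""
    found = set()
    for path in taxonomy:
        for end, name in enumerate(path):
            if name != category:
                continue
            # One value per possible starting point at or above the category.
            for start in range(end + 1):
                found.add(SEP.join(path[start:end + 1]))
    return found


def prefix_facet(docs, prefix):
    """Count documents per indexed value that starts with `prefix` (in-memory stand-in)."""
    counts = Counter()
    for values in docs.values():
        for value in values:
            if value.startswith(prefix):
                counts[value] += 1
    return counts


# Hypothetical products: one each in D, E and G.
docs = {
    "p1": suffix_paths("D"),
    "p2": suffix_paths("E"),
    "p3": suffix_paths("G"),
}

counts = prefix_facet(docs, "C" + SEP)
for value in sorted(counts):
    print(value, counts[value])
```

With these three products the prefix "C > " matches "C > D", "C > D > E", "C > F > G" and "C > F > G > H > E", so values four levels below C come back alongside the two levels that were wanted. The prefix alone does not limit the depth.
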
The problem, of course, is the volume of data for each product, and that the data can easily change drastically with tree changes, which happen all the time. The indexing time will grow quite a bit. Still, I am trying to figure out a good structure for this that would enable the queries to be done with Solr.

Any other thoughts? Hopefully, this more fully explains the requirement.
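
To illustrate the tree-change concern, here is a rough Python sketch that compares the indexed path sets before and after one edit to the taxonomy. The helper names (`all_suffix_paths`, `categories_to_reindex`) and the edit itself (renaming the root Z to a made-up X) are assumptions for illustration only, not anything Solr provides.

```python
def all_suffix_paths(taxonomy, sep=" > "):
    """Map each category to the set of paths ending at it, from every starting ancestor."""
    index = {}
    for path in taxonomy:
        for end in range(len(path)):
            for start in range(end + 1):
                index.setdefault(path[end], set()).add(sep.join(path[start:end + 1]))
    return index


def categories_to_reindex(before, after):
    """List categories whose indexed path set differs between two versions of the tree."""
    old, new = all_suffix_paths(before), all_suffix_paths(after)
    return sorted(c for c in old.keys() | new.keys() if old.get(c) != new.get(c))


BEFORE = [
    ["A", "B", "C", "D", "E"],
    ["Z", "C", "D", "E"],
    ["Z", "C", "F", "G", "H", "E"],
    ["Y", "G", "H", "E"],
]
# Hypothetical edit: the root category Z is replaced by X.
AFTER = [["X" if name == "Z" else name for name in path] for path in BEFORE]

print(categories_to_reindex(BEFORE, AFTER))
```

In this example a single change at the root touches the path sets of C, D, E, F, G and H (plus X and Z themselves), so every product in any of those categories would need to be reindexed.
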