Tree faceting - that sounds very interesting indeed. I'll have a look into that and see how it fits, as well as any improvements for adding facet queries, cross-field aggregation, date range etc. There could be some very nice use-cases for such functionality. Just wondering how this would work with distributed shards/multi-core...
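A rough sketch of the client-side approach Erik describes below (several requests to Solr with the parameters rearranged) might look like the following. It only builds the request URLs: one request facets on the parent field, then one request per parent value filters on that value and facets on the child field. The base URL, the function names and the quoting of the fq value are assumptions made for illustration; q, fq, rows, facet, facet.field and facet.mincount are the usual Solr request parameters.

```python
from urllib.parse import urlencode


def escape_term(value):
    """Escape backslashes and double quotes so the value can sit inside a quoted term."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def top_level_params(query, parent_field):
    """Parameters for the first request: facet on the parent field only."""
    return [
        ("q", query),
        ("rows", "0"),            # only the counts are wanted, not the documents
        ("facet", "true"),
        ("facet.field", parent_field),
        ("facet.mincount", "1"),  # leave out values with no hits
    ]


def child_params(query, parent_field, parent_value, child_field):
    """Parameters for one follow-up request: filter on a parent value, facet on the child."""
    return [
        ("q", query),
        ("fq", f'{parent_field}:"{escape_term(parent_value)}"'),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", child_field),
        ("facet.mincount", "1"),
    ]


def tree_facet_urls(base_url, query, parent_field, parent_values, child_field):
    """One URL per parent value; parent_values would come from the first response."""
    return [
        base_url + "?" + urlencode(child_params(query, parent_field, value, child_field))
        for value in parent_values
    ]


if __name__ == "__main__":
    base = "http://localhost:8983/solr/select"  # hypothetical endpoint
    print(base + "?" + urlencode(top_level_params("*:*", "host")))
    for url in tree_facet_urls(base, "*:*", "host", ["machine_1", "machine_2"], "user"):
        print(url)
```

The number of follow-up requests grows with the number of distinct parent values, which is the enumeration cost Erik points out.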
Many Thanks!
Peter

> From: erik.hatc...@gmail.com
> To: solr-user@lucene.apache.org
> Subject: Re: Aggregated facet value counts?
> Date: Fri, 29 Jan 2010 12:20:07 -0500
>
> Sounds like what you're asking for is tree faceting. A basic
> implementation is available in SOLR-792, but one that could also take
> facet.queries, numeric or date range buckets, to tree on would be a
> nice improvement.
>
> Still, the underlying implementation will simply enumerate all the
> possible values (SOLR-792 has some short-circuiting when the top-level
> has zero, of course). A client-side application could do this with
> multiple requests to Solr.
>
> Subsearch - sure, just make more requests to Solr, rearranging the
> parameters.
>
> I'd still say that in general for this type of need that it'll
> "generally" be less arbitrary and locking some things in during
> indexing will be the pragmatic way to go for most cases.
>
> Erik
>
> On Jan 29, 2010, at 9:28 AM, Peter S wrote:
>
> > Well, it wouldn't be 'every' combination - more of 'any' combination
> > at query-time.
> >
> > The 'arbitrary' part of the requirement is because it's not
> > practical to predict every combination a user might ask for,
> > although generally users would tend to search for similar/the same
> > query combinations (but perhaps with different date ranges, for
> > example).
> >
> > If 'predicted aggregate fields' were calculated at index-time on,
> > say, 10 fields (the schema in question actually has 73 fields),
> > that's 3,628,801 new fields. A large percentage of these would
> > likely never be used (which ones would depend on the user,
> > environment etc.).
> >
> > Perhaps a more 'typical' use case than my network-based example
> > would be a product search web page, where you want to show the
> > number of products that are made by a manufacturer and within a
> > certain price range (e.g. Sony [$600-$800] (15) ). To obtain the
> > (15) facet count value, you would have to correlate the number of
> > Sony products (say, (861)), and the products that fall into the [600
> > TO 800] price range (say, (1226) ). The (15) would be the
> > intersection of the Sony hits and the price range hits by
> > 'manufacturer:Sony'. Am I right that filter queries could only do
> > this for document hits if you know the field values ahead of time
> > (e.g. fq=manufacturer:Sony&fq=price:[600 TO 800])? The facets could
> > then be derived by simply counting the numFound for each result set.
> >
> > If there were subsearch support in Solr (i.e. take the output of a
> > query and use it as input into another) that included facets
> > [perhaps there is such support?], it might be used to achieve this
> > effect.
> >
> > A custom query parser plugin could work, maybe? I suppose it would
> > need to gather up all the separate facets and correlate them
> > according to the input query (e.g. host and user, or manufacturer
> > and price range). Such a mechanism would be crying out for caching,
> > but perhaps it could leverage the existing field and query caches.
> >
> > Peter
> >
> >> From: erik.hatc...@gmail.com
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Aggregated facet value counts?
> >> Date: Fri, 29 Jan 2010 07:39:44 -0500
> >>
> >> Creating values for every possible combination is what you're asking
> >> Solr to do at query-time, and as far as I know there isn't really a
> >> way to accomplish that like you're asking. Is the need really to be
> >> arbitrary here?
> >>
> >> Erik
> >>
> >> On Jan 29, 2010, at 7:25 AM, Peter S wrote:
> >>
> >>> Hi Erik,
> >>>
> >>> Thanks for your reply. That's an interesting idea doing it at
> >>> index-time, and a good idea for known field combinations.
> >>>
> >>> The only thing is...
> >>>
> >>> How to handle arbitrary field combinations? - i.e. to allow the
> >>> caller to specify any combination of fields at query-time?
> >>>
> >>> So, yes, the data is available at index-time, but the combination
> >>> isn't (short of creating fields for every possible combination).
> >>>
> >>> Peter
> >>>
> >>>> From: erik.hatc...@gmail.com
> >>>> To: solr-user@lucene.apache.org
> >>>> Subject: Re: Aggregated facet value counts?
> >>>> Date: Fri, 29 Jan 2010 06:30:27 -0500
> >>>>
> >>>> When faced with this type of situation where the data is entirely
> >>>> available at index-time, simply create an aggregated field that
> >>>> glues the two pieces together, and facet on that.
> >>>>
> >>>> Erik
> >>>>
> >>>> On Jan 29, 2010, at 6:16 AM, Peter S wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I was wondering if anyone had come across this use case, and if
> >>>>> this type of faceting is possible:
> >>>>>
> >>>>> The requirement is to build a query such that an aggregated facet
> >>>>> count of common (and'ed) field values form the basis of each
> >>>>> returned facet count.
> >>>>>
> >>>>> For example:
> >>>>>
> >>>>> Let's say I have a number of documents in an index with, among
> >>>>> others, the fields 'host' and 'user':
> >>>>>
> >>>>> Doc1 host:machine_1 user:user_1
> >>>>> Doc2 host:machine_1 user:user_2
> >>>>> Doc3 host:machine_1 user:user_1
> >>>>> Doc3 host:machine_1 user:user_1
> >>>>>
> >>>>> Doc4 host:machine_2 user:user_1
> >>>>> Doc5 host:machine_2 user:user_1
> >>>>> Doc6 host:machine_2 user:user_4
> >>>>>
> >>>>> Doc7 host:machine_1 user:user_4
> >>>>>
> >>>>> Is it possible to get facets back that would give the count of
> >>>>> documents that have common host AND user values (preferably
> >>>>> ordered - i.e. host then user for this example, so as not to
> >>>>> create a factorial explosion)? Note that the caller wouldn't know
> >>>>> what machine and user values exist, only the field names.
> >>>>>
> >>>>> I've tried using facet queries in various ways to see if they
> >>>>> could work for this, but I believe facet queries work on a
> >>>>> different plane than this requirement (narrowing the term count,
> >>>>> a.o.t. aggregating).
> >>>>>
> >>>>> For the example above, the desired result would be:
> >>>>>
> >>>>> machine_1/user_1 (3)
> >>>>> machine_1/user_2 (1)
> >>>>> machine_1/user_4 (1)
> >>>>>
> >>>>> machine_2/user_1 (2)
> >>>>> machine_2/user_4 (1)
> >>>>>
> >>>>> Has anyone had a need for this type of faceting and found a way to
> >>>>> achieve it?
> >>>>>
> >>>>> Many thanks,
> >>>>>
> >>>>> Peter
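
For reference, the desired result in the original question can be reproduced outside Solr with a few lines of Python. This is only an illustration of the counting itself, not of how Solr or SOLR-792 computes it: the documents are hard-coded from the example (including the line that appears twice as Doc3), and the glued host/user string mimics the kind of aggregated field suggested for index-time. The function name is made up for the sketch.

```python
from collections import Counter

# (host, user) pairs copied from the example documents in the original post.
docs = [
    ("machine_1", "user_1"),
    ("machine_1", "user_2"),
    ("machine_1", "user_1"),
    ("machine_1", "user_1"),
    ("machine_2", "user_1"),
    ("machine_2", "user_1"),
    ("machine_2", "user_4"),
    ("machine_1", "user_4"),
]


def aggregated_value(host, user):
    """Glue the two field values into one string, host first, then user."""
    return f"{host}/{user}"


# Count how many documents share each (host, user) combination.
pair_counts = Counter(docs)

# Format the counts in host-then-user order, one facet entry per line.
lines = [
    f"{aggregated_value(host, user)} ({count})"
    for (host, user), count in sorted(pair_counts.items())
]

for line in lines:
    print(line)
```

Printed in sorted order, the output matches the five facet entries listed in the question, e.g. machine_1/user_1 (3) and machine_2/user_1 (2).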