RE: Aggregated facet value counts?

Peter S Fri, 29 Jan 2010 15:14:31 -0800

Tree faceting - that sounds very interesting indeed. I'll have a look into that 
and see how it fits, as well as any improvements for adding facet queries, 
cross-field aggregation, date range etc. There could be some very nice 
use-cases for such functionality. Just wondering how this would work with 
distributed shards/multi-core...
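
To make the target concrete, here is a minimal client-side sketch in plain Python of the nested host/user counts from the example quoted below. It runs over an in-memory list rather than a Solr index, and the names DOCS, tree_facet and format_tree are illustrative assumptions, not part of SOLR-792 or any Solr API:

```python
from collections import Counter, defaultdict

# Sample documents copied from the host/user example quoted below.
# The thread lists two entries labelled "Doc3"; both are kept because
# the desired count of 3 for machine_1/user_1 depends on them.
DOCS = [
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_2"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_4"},
    {"host": "machine_1", "user": "user_4"},
]


def tree_facet(docs, parent_field, child_field):
    """Count documents per (parent, child) value pair, grouped by parent.

    The caller supplies only field names, never field values, which is
    the constraint described in the original question.
    """
    tree = defaultdict(Counter)
    for doc in docs:
        tree[doc[parent_field]][doc[child_field]] += 1
    return {parent: dict(children) for parent, children in tree.items()}


def format_tree(tree):
    """Render the tree as 'parent/child (count)' lines, sorted by value."""
    lines = []
    for parent in sorted(tree):
        for child in sorted(tree[parent]):
            lines.append(f"{parent}/{child} ({tree[parent][child]})")
    return lines


if __name__ == "__main__":
    for line in format_tree(tree_facet(DOCS, "host", "user")):
        print(line)
```

Because the fields are walked in a fixed order (host, then user), only one ordering is enumerated, which is what keeps this from turning into a factorial explosion.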



Many Thanks! 

Peter

 

 
> From: erik.hatc...@gmail.com
> To: solr-user@lucene.apache.org
> Subject: Re: Aggregated facet value counts?
> Date: Fri, 29 Jan 2010 12:20:07 -0500
> 
> Sounds like what you're asking for is tree faceting. A basic 
> implementation is available in SOLR-792, but one that could also take 
> facet.queries, numeric or date range buckets, to tree on would be a 
> nice improvement.
> 
> Still, the underlying implementation will simply enumerate all the 
> possible values (SOLR-792 has some short-circuiting when the top-level 
> has zero, of course). A client-side application could do this with 
> multiple requests to Solr.
> 
> Subsearch - sure, just make more requests to Solr, rearranging the 
> parameters.
> 
> I'd still say that, in general, this type of need will turn out to be 
> less arbitrary than it first appears, and locking some things in during 
> indexing will be the pragmatic way to go for most cases.
> 
> Erik
> 
> 
> 
> On Jan 29, 2010, at 9:28 AM, Peter S wrote:
> 
> >
> > Well, it wouldn't be 'every' combination - more of 'any' combination 
> > at query-time.
> >
> > The 'arbitrary' part of the requirement is because it's not 
> > practical to predict every combination a user might ask for, 
> > although generally users would tend to search for similar/the same 
> > query combinations (but perhaps with different date ranges, for 
> > example).
> >
> > If 'predicted aggregate fields' were calculated at index-time on, 
> > say, 10 fields (the schema in question actually has 73 fields), 
> > that's 10! = 3,628,800 new fields. A large percentage of these would 
> > likely never be used (which ones would depend on the user, 
> > environment etc.).
> >
> >
> > Perhaps a more 'typical' use case than my network-based example 
> > would be a product search web page, where you want to show the 
> > number of products that are made by a manufacturer and within a 
> > certain price range (e.g. Sony [$600-$800] (15) ). To obtain the 
> > (15) facet count value, you would have to correlate the number of 
> > Sony products (say, (861)), and the products that fall into the [600 
> > TO 800] price range (say, (1226) ). The (15) would be the 
> > intersection of the 'manufacturer:Sony' hits and the price range 
> > hits. Am I right that filter queries could only do 
> > this for document hits if you know the field values ahead of time 
> > (e.g. fq=manufacturer:Sony&fq=price:[600 TO 800])? The facets could 
> > then be derived by simply counting the numFound for each result set.
> >
> >
> >
> > If there were subsearch support in Solr (i.e. take the output of a 
> > query and use it as input into another) that included facets 
> > [perhaps there is such support?], it might be used to achieve this 
> > effect.
> >
> >
> > A custom query parser plugin could work, maybe? I suppose it would 
> > need to gather up all the separate facets and correlate them 
> > according to the input query (e.g. host and user, or manufacturer 
> > and price range). Such a mechanism would be crying out for caching, 
> > but perhaps it could leverage the existing field and query caches.
> >
> >
> > Peter
> >
> >
> >
> >
> >> From: erik.hatc...@gmail.com
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Aggregated facet value counts?
> >> Date: Fri, 29 Jan 2010 07:39:44 -0500
> >>
> >> Creating values for every possible combination is what you're asking
> >> Solr to do at query-time, and as far as I know there isn't really a
> >> way to accomplish that like you're asking. Is the need really to be
> >> arbitrary here?
> >>
> >> Erik
> >>
> >> On Jan 29, 2010, at 7:25 AM, Peter S wrote:
> >>
> >>>
> >>> Hi Erik,
> >>>
> >>>
> >>>
> >>> Thanks for your reply. That's an interesting idea, doing it at
> >>> index-time, and a good idea for known field combinations.
> >>>
> >>> The only thing is...
> >>>
> >>> How to handle arbitrary field combinations? - i.e. to allow the
> >>> caller to specify any combination of fields at query-time?
> >>>
> >>> So, yes, the data is available at index-time, but the combination
> >>> isn't (short of creating fields for every possible combination).
> >>>
> >>>
> >>>
> >>> Peter
> >>>
> >>>
> >>>
> >>>> From: erik.hatc...@gmail.com
> >>>> To: solr-user@lucene.apache.org
> >>>> Subject: Re: Aggregated facet value counts?
> >>>> Date: Fri, 29 Jan 2010 06:30:27 -0500
> >>>>
> >>>> When faced with this type of situation where the data is entirely
> >>>> available at index-time, simply create an aggregated field that
> >>>> glues the two pieces together, and facet on that.
> >>>>
> >>>> Erik
> >>>>
> >>>> On Jan 29, 2010, at 6:16 AM, Peter S wrote:
> >>>>
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>>
> >>>>>
> >>>>> I was wondering if anyone had come across this use case, and if
> >>>>> this type of faceting is possible:
> >>>>>
> >>>>>
> >>>>>
> >>>>> The requirement is to build a query such that an aggregated facet
> >>>>> count of common (AND'ed) field values forms the basis of each
> >>>>> returned facet count.
> >>>>>
> >>>>>
> >>>>>
> >>>>> For example:
> >>>>>
> >>>>> Let's say I have a number of documents in an index with, among
> >>>>> others, the fields 'host' and 'user':
> >>>>>
> >>>>>
> >>>>>
> >>>>> Doc1 host:machine_1 user:user_1
> >>>>>
> >>>>> Doc2 host:machine_1 user:user_2
> >>>>>
> >>>>> Doc3 host:machine_1 user:user_1
> >>>>>
> >>>>> Doc3 host:machine_1 user:user_1
> >>>>>
> >>>>>
> >>>>>
> >>>>> Doc4 host:machine_2 user:user_1
> >>>>>
> >>>>> Doc5 host:machine_2 user:user_1
> >>>>>
> >>>>> Doc6 host:machine_2 user:user_4
> >>>>>
> >>>>>
> >>>>>
> >>>>> Doc7 host:machine_1 user:user_4
> >>>>>
> >>>>>
> >>>>>
> >>>>> Is it possible to get facets back that would give the count of
> >>>>> documents that have common host AND user values (preferably
> >>>>> ordered - i.e. host then user for this example, so as not to
> >>>>> create a factorial explosion)? Note that the caller wouldn't know
> >>>>> what machine and user values exist, only the field names.
> >>>>>
> >>>>> I've tried using facet queries in various ways to see if they
> >>>>> could work for this, but I believe facet queries work on a
> >>>>> different plane than this requirement (narrowing the term count,
> >>>>> as opposed to aggregating).
> >>>>>
> >>>>>
> >>>>>
> >>>>> For the example above, the desired result would be:
> >>>>>
> >>>>>
> >>>>>
> >>>>> machine_1/user_1 (3)
> >>>>>
> >>>>> machine_1/user_2 (1)
> >>>>>
> >>>>> machine_1/user_4 (1)
> >>>>>
> >>>>>
> >>>>>
> >>>>> machine_2/user_1 (2)
> >>>>>
> >>>>> machine_2/user_4 (1)
> >>>>>
> >>>>>
> >>>>>
> >>>>> Has anyone had a need for this type of faceting and found a way to
> >>>>> achieve it?
> >>>>>
> >>>>>
> >>>>>
> >>>>> Many thanks,
> >>>>>
> >>>>> Peter
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> > 
> 
                                          