Re: Aggregated facet value counts?

Erik Hatcher Fri, 29 Jan 2010 09:20:41 -0800

Sounds like what you're asking for is tree faceting. A basic implementation is available in SOLR-792, but one that could also take facet.queries, numeric or date range buckets, to tree on would be a nice improvement.

Still, the underlying implementation will simply enumerate all the possible values (SOLR-792 has some short-circuiting when the top level has zero, of course). A client-side application could do this with multiple requests to Solr.

Subsearch - sure, just make more requests to Solr, rearranging the parameters.
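A minimal sketch of the client-side approach: facet on the first field, then send one more request per returned value, adding an fq for that value and faceting on the next field. The parameter names (q, rows, fq, facet, facet.field, facet.mincount) are standard Solr ones; everything else (facet_request, fake_search, tree_facets, DOCS) is hypothetical, and fake_search is an in-memory stand-in for the HTTP call so the example runs without a server. Against a real Solr, the per-value counts would come from facet_counts/facet_fields in the response (a flat [value, count, ...] list in the default JSON layout). The documents are the ones from the original question; its listing repeats "Doc3", and both copies are kept so machine_1/user_1 comes out as 3.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable

# Example documents from the original question (Doc3 is listed twice there).
DOCS = [
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_2"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_4"},
    {"host": "machine_1", "user": "user_4"},
]


def facet_request(field: str, filters: list[str]) -> list[tuple[str, str]]:
    """Build a facet-only request: no result rows, counts for one field."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true"),
              ("facet.field", field), ("facet.mincount", "1")]
    return params + [("fq", f) for f in filters]


def fake_search(params: list[tuple[str, str]]) -> dict[str, int]:
    """Hypothetical stand-in for a call to Solr's /select handler.

    Applies each fq of the form field:"value", then returns value -> count
    for the requested facet.field.
    """
    docs = DOCS
    for name, value in params:
        if name == "fq":
            field, term = value.split(":", 1)
            docs = [d for d in docs if d[field] == term.strip('"')]
    facet_field = dict(params)["facet.field"]
    return dict(Counter(d[facet_field] for d in docs))


def tree_facets(fields: list[str],
                search: Callable[[list[tuple[str, str]]], dict[str, int]],
                filters: list[str] | None = None) -> dict[tuple[str, ...], int]:
    """Facet on fields[0]; for each value, re-query with an fq for that
    value and facet on the next field (one round trip per value)."""
    filters = filters or []
    counts = search(facet_request(fields[0], filters))
    if len(fields) == 1:
        return {(value,): n for value, n in counts.items()}
    result: dict[tuple[str, ...], int] = {}
    for value in counts:
        narrowed = filters + [f'{fields[0]}:"{value}"']
        for path, n in tree_facets(fields[1:], search, narrowed).items():
            result[(value,) + path] = n
    return result


for path, n in tree_facets(["host", "user"], fake_search).items():
    print(f"{'/'.join(path)} ({n})")
```

With two fields this costs 1 + (number of hosts) requests, and each extra level multiplies the request count again, which is the enumeration cost described above.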

I'd still say that this type of need will "generally" be less arbitrary in practice, and that locking some things in during indexing will be the pragmatic way to go for most cases.


        Erik



On Jan 29, 2010, at 9:28 AM, Peter S wrote:

Well, it wouldn't be 'every' combination - more of 'any' combination at query-time.
The 'arbitrary' part of the requirement is because it's not practical to predict every combination a user might ask for, although generally users would tend to search for similar/the same query combinations (but perhaps with different date ranges, for example).
If 'predicted aggregate fields' were calculated at index-time on, say, 10 fields (the schema in question actually has 73 fields), that's 3,628,801 new fields. A large percentage of these would likely never be used (which ones would depend on the user, environment etc.).
Perhaps a more 'typical' use case than my network-based example would be a product search web page, where you want to show the number of products that are made by a manufacturer and fall within a certain price range (e.g. Sony [$600-$800] (15)). To obtain the (15) facet count, you would have to correlate the Sony products (say, 861 of them) with the products that fall into the [600 TO 800] price range (say, 1226). The (15) would be the intersection of the 'manufacturer:Sony' hits and the price range hits. Am I right that filter queries could only do this for document hits if you know the field values ahead of time (e.g. fq=manufacturer:Sony&fq=price:[600 TO 800])? The facets could then be derived by simply counting the numFound for each result set.
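A small sketch of the filter-query approach described in the question, assuming fields named manufacturer and price as in the message. It builds one query string per (manufacturer, price range) pair; Solr returns only documents matching every fq, so numFound for the Sony/[600 TO 800] request would be the (15). rows=0 is used because only numFound is needed. The helper names intersection_query and queries_for_known_values are hypothetical, and the number of requests grows multiplicatively with the number of known values, which is the limitation the question points out.

```python
from __future__ import annotations

from itertools import product
from urllib.parse import urlencode


def intersection_query(manufacturer: str, price_range: str) -> str:
    """Query string whose numFound is the size of the intersection.

    Each fq is a separate filter; only documents matching all of them
    are counted. rows=0 because only numFound is needed.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("fq", f"manufacturer:{manufacturer}"),
        ("fq", f"price:{price_range}"),
    ]
    return urlencode(params)


def queries_for_known_values(manufacturers: list[str],
                             price_ranges: list[str]) -> dict[tuple[str, str], str]:
    """One request per (manufacturer, range) pair: only possible when the
    values are known up front."""
    return {(m, r): intersection_query(m, r)
            for m, r in product(manufacturers, price_ranges)}


print(intersection_query("Sony", "[600 TO 800]"))
```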
If there were subsearch support in Solr (i.e. take the output of a query and use it as input into another) that included facets [perhaps there is such support?], it might be used to achieve this effect.
A custom query parser plugin could work, maybe? I suppose it would need to gather up all the separate facets and correlate them according to the input query (e.g. host and user, or manufacturer and price range). Such a mechanism would be crying out for caching, but perhaps it could leverage the existing field and query caches.
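The "gather and correlate" step can be illustrated in memory: keep a set of matching documents per (field, value) term, intersect the sets for each value combination, and cache the intersection counts. This is a Python illustration of the idea only, not a Solr plugin (which would be written in Java); POSTINGS, combo_count and aggregated_facets are hypothetical names, functools.lru_cache stands in for the caching mentioned above, and the per-term frozensets are only loosely analogous to the document sets Solr keeps in its filter cache. The documents are the host/user examples from the original question.

```python
from functools import lru_cache
from itertools import product

# Example documents from the original question (Doc3 is listed twice there).
DOCS = [
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_2"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_4"},
    {"host": "machine_1", "user": "user_4"},
]

# Inverted index: (field, value) -> frozenset of document positions.
POSTINGS = {}
for pos, doc in enumerate(DOCS):
    for field, value in doc.items():
        POSTINGS[(field, value)] = POSTINGS.get((field, value), frozenset()) | {pos}

ALL_DOCS = frozenset(range(len(DOCS)))


@lru_cache(maxsize=1024)
def combo_count(terms):
    """Count documents matching every (field, value) term. Results are
    memoized, so a repeated combination is only intersected once."""
    matched = ALL_DOCS
    for term in terms:
        matched = matched & POSTINGS.get(term, frozenset())
    return len(matched)


def aggregated_facets(fields):
    """Enumerate every value combination for the ordered fields and keep
    only the non-zero intersection counts."""
    values = [sorted({doc[f] for doc in DOCS}) for f in fields]
    counts = {}
    for combo in product(*values):
        n = combo_count(tuple(zip(fields, combo)))
        if n:
            counts[combo] = n
    return counts


print(aggregated_facets(("host", "user")))
```

Note that machine_2/user_2 is still intersected and then discarded as zero; enumerating every combination is the unavoidable cost of this approach.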
Peter
From: erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Subject: Re: Aggregated facet value counts?
Date: Fri, 29 Jan 2010 07:39:44 -0500

Creating values for every possible combination is what you're asking
Solr to do at query-time, and as far as I know there isn't really a
way to accomplish that like you're asking. Is the need really to be
arbitrary here?

Erik

On Jan 29, 2010, at 7:25 AM, Peter S wrote:
Hi Erik,



Thanks for your reply. That's an interesting idea, doing it at index-time, and a good idea for known field combinations.

The only thing is........

How to handle arbitrary field combinations? - i.e. to allow the
caller to specify any combination of fields at query-time?

So, yes, the data is available at index-time, but the combination
isn't (short of creating fields for every possible combination).



Peter
From: erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Subject: Re: Aggregated facet value counts?
Date: Fri, 29 Jan 2010 06:30:27 -0500

When faced with this type of situation where the data is entirely
available at index-time, simply create an aggregated field that glues
the two pieces together, and facet on that.
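A minimal sketch of the index-time aggregated field, assuming the gluing is done by the indexing client before documents are sent to Solr. The field name host_user and the "/" separator are hypothetical choices; the schema would need host_user declared as an untokenized string field so each glued value is counted whole. The Counter shows what a request with facet=true&facet.field=host_user would report for the example documents from the original question.

```python
from __future__ import annotations

from collections import Counter

# Example documents from the original question (Doc3 is listed twice there).
DOCS = [
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_2"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_4"},
    {"host": "machine_1", "user": "user_4"},
]


def add_host_user(doc: dict[str, str]) -> dict[str, str]:
    """Return a copy of the document with a glued 'host_user' field
    (hypothetical name) added before indexing."""
    return {**doc, "host_user": f"{doc['host']}/{doc['user']}"}


indexed = [add_host_user(d) for d in DOCS]

# What facet.field=host_user would report: one count per glued value.
host_user_counts = Counter(d["host_user"] for d in indexed)

for value, n in sorted(host_user_counts.items()):
    print(f"{value} ({n})")
```

The query side then needs only a single ordinary field facet; the trade-off is that each field combination must be chosen before indexing.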

Erik

On Jan 29, 2010, at 6:16 AM, Peter S wrote:
Hi,
I was wondering if anyone had come across this use case, and if this
type of faceting is possible:



The requirement is to build a query such that an aggregated facet
count of common (and'ed) field values form the basis of each
returned facet count.



For example:

Let's say I have a number of documents in an index with, among
others, the fields 'host' and 'user':



Doc1 host:machine_1 user:user_1

Doc2 host:machine_1 user:user_2

Doc3 host:machine_1 user:user_1

Doc3 host:machine_1 user:user_1



Doc4 host:machine_2 user:user_1

Doc5 host:machine_2 user:user_1

Doc6 host:machine_2 user:user_4



Doc7 host:machine_1 user:user_4



Is it possible to get facets back that would give the count of
documents that have common host AND user values (preferably ordered
- i.e. host then user for this example, so as not to create a
factorial explosion)? Note that the caller wouldn't know what
machine and user values exist, only the field names.
I've tried using facet queries in various ways to see if they could work for this, but I believe facet queries work on a different plane
than this requirement (narrowing the term count, a.o.t.
aggregating).



For the example above, the desired result would be:



machine_1/user_1 (3)

machine_1/user_2 (1)

machine_1/user_4 (1)



machine_2/user_1 (2)

machine_2/user_4 (1)
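For reference, the SOLR-792 tree faceting mentioned in the reply was, as far as I know, later released as pivot faceting (the facet.pivot parameter, Solr 4.0 onward), which returns exactly this nested host-then-user shape. The sketch below assumes such a Solr version and the JSON response layout facet_counts → facet_pivot → "host,user" → entries with field/value/count and a nested pivot list; that layout should be checked against the Solr version in use. The sample response is hand-written from the example documents, not captured from a server, and flatten_pivot is a hypothetical helper that turns the nested entries into the "machine_1/user_1 (3)" lines shown above.

```python
def flatten_pivot(entries, prefix=()):
    """Walk nested facet.pivot entries and return (path, count) pairs for
    the deepest level, e.g. (("machine_1", "user_1"), 3)."""
    rows = []
    for entry in entries:
        path = prefix + (str(entry["value"]),)
        if entry.get("pivot"):
            rows.extend(flatten_pivot(entry["pivot"], path))
        else:
            rows.append((path, entry["count"]))
    return rows


# Request parameters for a Solr version that supports pivot faceting.
PARAMS = [("q", "*:*"), ("rows", "0"), ("facet", "true"),
          ("facet.pivot", "host,user")]

# Hand-written response in the assumed JSON layout; counts are taken
# from the example documents, not from a real server.
SAMPLE_RESPONSE = {
    "facet_counts": {
        "facet_pivot": {
            "host,user": [
                {"field": "host", "value": "machine_1", "count": 5, "pivot": [
                    {"field": "user", "value": "user_1", "count": 3},
                    {"field": "user", "value": "user_2", "count": 1},
                    {"field": "user", "value": "user_4", "count": 1},
                ]},
                {"field": "host", "value": "machine_2", "count": 3, "pivot": [
                    {"field": "user", "value": "user_1", "count": 2},
                    {"field": "user", "value": "user_4", "count": 1},
                ]},
            ]
        }
    }
}

pivot = SAMPLE_RESPONSE["facet_counts"]["facet_pivot"]["host,user"]
LINES = [f"{'/'.join(path)} ({n})" for path, n in flatten_pivot(pivot)]
for line in LINES:
    print(line)
```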



Has anyone had a need for this type of faceting and found a way to
achieve it?



Many thanks,

Peter




