Sounds like what you're asking for is tree faceting. A basic implementation is available in SOLR-792, but one that could also tree on facet.queries and numeric or date range buckets would be a nice improvement.

Still, the underlying implementation will simply enumerate all the possible values (SOLR-792 has some short-circuiting when the top-level has zero, of course). A client-side application could do this with multiple requests to Solr.
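A client-side version of that enumeration might look like the sketch below: one faceting request on the top-level field, then one request per top-level value with an fq on that value, faceting on the second field. The requests are only built here, using the standard Solr q, fq, rows, facet, facet.field and facet.mincount parameters. simulate_facet is a hypothetical in-memory stand-in for actually sending them, fed with the host/user data from Peter's first message.

```python
from collections import Counter
from urllib.parse import urlencode

# Hypothetical in-memory stand-in for the index (the host/user example from this thread).
DOCS = [
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_2"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_4"},
    {"host": "machine_1", "user": "user_4"},
]


def facet_params(field, filters=()):
    """Query string for one faceting request; rows=0 asks Solr for counts only."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true"),
              ("facet.field", field), ("facet.mincount", "1")]
    # Values are used verbatim; real code would need to escape or quote them.
    params += [("fq", f"{name}:{value}") for name, value in filters]
    return urlencode(params)


def simulate_facet(docs, field, filters=()):
    """Count `field` values among docs matching every (name, value) filter.

    Only values that occur are counted, mirroring facet.mincount=1.
    """
    matching = [d for d in docs if all(d[n] == v for n, v in filters)]
    return Counter(d[field] for d in matching)


def tree_facet(docs, top_field, sub_field):
    """One request for the top level, then one filtered request per top-level value."""
    urls = [facet_params(top_field)]
    tree = {}
    for top_value in sorted(simulate_facet(docs, top_field)):
        filters = [(top_field, top_value)]
        urls.append(facet_params(sub_field, filters))
        tree[top_value] = dict(simulate_facet(docs, sub_field, filters))
    return tree, urls


tree, urls = tree_facet(DOCS, "host", "user")
for host, users in tree.items():
    for user, count in sorted(users.items()):
        print(f"{host}/{user} ({count})")
```

This prints the five host/user lines Peter asked for; the cost is one extra request per top-level value.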

Subsearch - sure, just make more requests to Solr, rearranging the parameters.

I'd still say that for this type of need it'll "generally" be less arbitrary, and locking some things in during indexing will be the pragmatic way to go for most cases.

        Erik



On Jan 29, 2010, at 9:28 AM, Peter S wrote:


Well, it wouldn't be 'every' combination - more of 'any' combination at query-time.

The 'arbitrary' part of the requirement is because it's not practical to predict every combination a user might ask for, although generally users would tend to search for similar/the same query combinations (but perhaps with different date ranges, for example).

If 'predicted aggregate fields' were calculated at index-time on, say, 10 fields (the schema in question actually has 73 fields), that's 3,628,800 (10!) new fields. A large percentage of these would likely never be used (which ones would depend on the user, environment etc.).
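How many glued fields that really means depends on how "combination" is counted. The sketch below computes three interpretations for n fields; which one applies is an assumption. If fields are always glued in one fixed order (host then user, as the original question prefers), each subset of two or more fields yields exactly one aggregate field, so the subset count is the relevant one.

```python
from math import comb, factorial, perm


def combination_counts(n):
    """Three ways to count 'aggregate fields' over n fields (each a different assumption)."""
    # Unordered subsets of two or more fields: one glued field per subset
    # when fields are always glued in a fixed order.
    subsets = sum(comb(n, k) for k in range(2, n + 1))
    # Ordered sequences of two or more distinct fields (host/user and user/host both count).
    sequences = sum(perm(n, k) for k in range(2, n + 1))
    # Orderings of all n fields at once.
    full_orderings = factorial(n)
    return subsets, sequences, full_orderings


for n in (2, 10, 73):
    print(n, combination_counts(n))
```

For 10 fields this gives 1,013 subsets, 9,864,090 ordered sequences and 3,628,800 full orderings; for 73 fields even the smallest figure is 2^73 - 74.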


Perhaps a more 'typical' use case than my network-based example would be a product search web page, where you want to show the number of products that are made by a manufacturer and fall within a certain price range (e.g. Sony [$600-$800] (15)). To obtain the (15) facet count, you would have to correlate the number of Sony products (say, (861)) with the products that fall into the [600 TO 800] price range (say, (1226)). The (15) would be the intersection of the Sony hits and the price-range hits. Am I right that filter queries could only do this for document hits if you know the field values ahead of time (e.g. fq=manufacturer:Sony&fq=price:[600 TO 800])? The facet counts could then be derived simply by reading numFound for each result set.
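A minimal sketch of the fq approach, assuming a toy in-memory catalogue (the products and prices are made up, not the 861/1226/15 figures above) and a price field indexed as a numeric type so the range is numeric. The request would make Solr report the intersection size as numFound; local_counts, a hypothetical helper, computes the same number by intersecting the two hit sets.

```python
from urllib.parse import urlencode

# Hypothetical toy catalogue, not real data.
PRODUCTS = [
    {"id": 1, "manufacturer": "Sony", "price": 650},
    {"id": 2, "manufacturer": "Sony", "price": 950},
    {"id": 3, "manufacturer": "Acme", "price": 700},
    {"id": 4, "manufacturer": "Sony", "price": 799},
]


def intersection_request(manufacturer, low, high):
    """Query string whose numFound is the manufacturer/price-range intersection.

    rows=0 asks for the count only; each fq is applied as a separate filter.
    """
    return urlencode([
        ("q", "*:*"),
        ("rows", "0"),
        ("fq", f"manufacturer:{manufacturer}"),
        ("fq", f"price:[{low} TO {high}]"),
    ])


def local_counts(products, manufacturer, low, high):
    """Return (manufacturer hits, range hits, intersection size) computed in memory."""
    by_maker = {p["id"] for p in products if p["manufacturer"] == manufacturer}
    # [low TO high] is inclusive at both ends, hence <= on both sides.
    in_range = {p["id"] for p in products if low <= p["price"] <= high}
    return len(by_maker), len(in_range), len(by_maker & in_range)


print(intersection_request("Sony", 600, 800))
print(local_counts(PRODUCTS, "Sony", 600, 800))
```

Here both individual counts are 3 but the intersection is 2, which is why the pair count cannot be derived from the two separate facet counts alone.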



If there were subsearch support in Solr (i.e. taking the output of one query and using it as input to another) that included facets (perhaps there is such support?), it might be used to achieve this effect.


A custom query parser plugin could work, maybe? I suppose it would need to gather up all the separate facets and correlate them according to the input query (e.g. host and user, or manufacturer and price range). Such a mechanism would be crying out for caching, but perhaps it could leverage the existing field and query caches.
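One way to picture the "gather up and correlate" step with caching is to cache one document set per field value and count each pair by intersecting two cached sets. The sketch below is a plain-Python analogy only: doc_set and correlate are hypothetical helpers, not Solr's plugin API, and functools.lru_cache stands in for a filter-style cache. It reuses the host/user example from this thread.

```python
from functools import lru_cache
from itertools import product

# Hypothetical docs keyed by id (the host/user example from this thread).
DOCS = {
    1: {"host": "machine_1", "user": "user_1"},
    2: {"host": "machine_1", "user": "user_2"},
    3: {"host": "machine_1", "user": "user_1"},
    4: {"host": "machine_1", "user": "user_1"},
    5: {"host": "machine_2", "user": "user_1"},
    6: {"host": "machine_2", "user": "user_1"},
    7: {"host": "machine_2", "user": "user_4"},
    8: {"host": "machine_1", "user": "user_4"},
}


@lru_cache(maxsize=None)
def doc_set(field, value):
    """Cached set of ids of docs whose `field` equals `value`."""
    return frozenset(i for i, d in DOCS.items() if d[field] == value)


def correlate(top_field, sub_field):
    """Count every (top, sub) value pair by intersecting cached per-value doc sets."""
    tops = sorted({d[top_field] for d in DOCS.values()})
    subs = sorted({d[sub_field] for d in DOCS.values()})
    counts = {}
    for top, sub in product(tops, subs):
        n = len(doc_set(top_field, top) & doc_set(sub_field, sub))
        if n:  # drop empty pairs such as machine_2/user_2
            counts[(top, sub)] = n
    return counts


counts = correlate("host", "user")
print(counts)
print(doc_set.cache_info())
```

After this single call, doc_set.cache_info() reports 5 misses (one per distinct value) and 7 hits: each value's set is computed once and then reused across pairs.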


Peter




From: erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Subject: Re: Aggregated facet value counts?
Date: Fri, 29 Jan 2010 07:39:44 -0500

Creating values for every possible combination is what you're asking
Solr to do at query-time, and as far as I know there isn't really a
way to accomplish that the way you're asking. Does the need really
have to be arbitrary here?

Erik

On Jan 29, 2010, at 7:25 AM, Peter S wrote:


Hi Erik,



Thanks for your reply. That's an interesting idea, doing it at
index-time, and a good idea for known field combinations.

The only thing is........

How would I handle arbitrary field combinations, i.e. allow the
caller to specify any combination of fields at query-time?

So, yes, the data is available at index-time, but the combination
isn't (short of creating fields for every possible combination).



Peter



From: erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Subject: Re: Aggregated facet value counts?
Date: Fri, 29 Jan 2010 06:30:27 -0500

When faced with this type of situation where the data is entirely
available at index-time, simply create an aggregated field that glues
the two pieces together, and facet on that.
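A minimal sketch of this suggestion, assuming the indexing client builds the glued value itself before sending each document to Solr. The field name host_user and the "/" separator are hypothetical choices; in Solr such a field would typically be an untokenized string field, so that facet.field=host_user counts whole glued values. Counter stands in for those facet counts.

```python
from collections import Counter

SEPARATOR = "/"  # hypothetical; must never occur inside the glued values


def add_aggregated_field(doc, fields, name):
    """Index-time step: glue the values of `fields`, in order, into one extra field."""
    enriched = dict(doc)
    enriched[name] = SEPARATOR.join(doc[f] for f in fields)
    return enriched


RAW = [
    ("machine_1", "user_1"), ("machine_1", "user_2"), ("machine_1", "user_1"),
    ("machine_1", "user_1"), ("machine_2", "user_1"), ("machine_2", "user_1"),
    ("machine_2", "user_4"), ("machine_1", "user_4"),
]
indexed = [
    add_aggregated_field({"host": h, "user": u}, ("host", "user"), "host_user")
    for h, u in RAW
]

# Query time: an ordinary single-field facet on the glued field.
facet_counts = Counter(d["host_user"] for d in indexed)
for value, count in sorted(facet_counts.items()):
    print(f"{value} ({count})")
```

The query itself stays a plain single-field facet; the price is deciding at index time which field combinations get a glued field.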

Erik

On Jan 29, 2010, at 6:16 AM, Peter S wrote:


Hi,



I was wondering if anyone had come across this use case, and if this
type of faceting is possible:



The requirement is to build a query such that an aggregated facet
count of common (and'ed) field values forms the basis of each
returned facet count.



For example:

Let's say I have a number of documents in an index with, among
others, the fields 'host' and 'user':



Doc1 host:machine_1 user:user_1
Doc2 host:machine_1 user:user_2
Doc3 host:machine_1 user:user_1
Doc4 host:machine_1 user:user_1

Doc5 host:machine_2 user:user_1
Doc6 host:machine_2 user:user_1
Doc7 host:machine_2 user:user_4

Doc8 host:machine_1 user:user_4



Is it possible to get facets back that would give the count of
documents that have common host AND user values (preferably ordered
- i.e. host then user for this example, so as not to create a
factorial explosion)? Note that the caller wouldn't know what
machine and user values exist, only the field names.

I've tried using facet queries in various ways to see if they could
work for this, but I believe facet queries work on a different plane
than this requirement (narrowing the term count, as opposed to
aggregating).
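To illustrate why facet queries alone don't fit: facet.query returns one count per query string, so the caller must already know the host and user values in order to write the queries. A minimal sketch, assuming the value lists were obtained somehow in advance (here they are simply hard-coded from the example); pair_facet_queries is a hypothetical helper.

```python
from itertools import product
from urllib.parse import urlencode


def pair_facet_queries(hosts, users):
    """One facet.query per (host, user) pair; each yields a single count."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    for host, user in product(hosts, users):
        params.append(("facet.query", f"host:{host} AND user:{user}"))
    return params


params = pair_facet_queries(["machine_1", "machine_2"], ["user_1", "user_2", "user_4"])
print(urlencode(params))
```

The number of facet.query parameters is the product of the value counts (2 x 3 = 6 here), and pairs with no documents, such as machine_2/user_2, still cost a query.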



For the example above, the desired result would be:



machine_1/user_1 (3)

machine_1/user_2 (1)

machine_1/user_4 (1)



machine_2/user_1 (2)

machine_2/user_4 (1)



Has anyone had a need for this type of faceting and found a way to
achieve it?



Many thanks,

Peter







