Sounds like what you're asking for is tree faceting. A basic
implementation is available in SOLR-792; one that could also build
the tree on facet.queries or on numeric or date range buckets would
be a nice improvement.
Still, the underlying implementation will simply enumerate all the
possible values (SOLR-792 does short-circuit when a top-level value
has a zero count, of course). A client-side application could do the
same with multiple requests to Solr.
Subsearch - sure, just make more requests to Solr, rearranging the
parameters.
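A client-side version of the multi-request approach might look like the sketch below. It is only an illustration under stated assumptions: `facet_request` is a hypothetical in-memory stand-in for Solr built from the host/user example documents in the original question, whereas a real client would send one request per node with `facet.field` set to the next field and one `fq` per level already fixed.

```python
from collections import Counter

# Hypothetical stand-in for the index: the host/user example documents.
DOCS = [
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_2"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_4"},
    {"host": "machine_1", "user": "user_4"},
]


def facet_request(field, filters):
    """Simulate one faceted request: count values of `field` among the
    documents matching every (field, value) pair in `filters` (like fq)."""
    matching = (d for d in DOCS if all(d.get(f) == v for f, v in filters))
    return Counter(d[field] for d in matching if field in d)


def tree_facet(fields, filters=()):
    """Walk `fields` in order, issuing one request per tree node, and
    return nested {value: (count, children)} dicts."""
    if not fields:
        return {}
    tree = {}
    for value, count in sorted(facet_request(fields[0], list(filters)).items()):
        # A real Solr response can include zero counts (facet.mincount
        # defaults to 0); skip them so no child requests are issued.
        if count == 0:
            continue
        children = tree_facet(fields[1:], [*filters, (fields[0], value)])
        tree[value] = (count, children)
    return tree


result = tree_facet(["host", "user"])
for host, (n, users) in result.items():
    print(host, n)
    for user, (m, _) in users.items():
        print(f"  {host}/{user} ({m})")
```

The cost is one request for the top level plus one per non-empty top-level value, which is why the approach gets expensive as the number of levels grows.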
Still, in general I'd say that for this type of need the requirement
will usually turn out to be less arbitrary than it first seems, and
locking some combinations in during indexing will be the pragmatic
way to go in most cases.
Erik
On Jan 29, 2010, at 9:28 AM, Peter S wrote:
Well, it wouldn't be 'every' combination - more of 'any' combination
at query-time.
The 'arbitrary' part of the requirement is because it's not
practical to predict every combination a user might ask for,
although generally users would tend to search for similar/the same
query combinations (but perhaps with different date ranges, for
example).
If 'predicted aggregate fields' were calculated at index-time on,
say, 10 fields (the schema in question actually has 73 fields),
that's on the order of 10! (3,628,800) new fields. A large
percentage of these would likely never be used (which ones would
depend on the user, environment, etc.).
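How large the index-time approach gets depends on what is being counted. A quick check, assuming each pre-computed field glues together two or more of the n original fields: if values are always glued in one canonical order (host then user, as the original question suggests), one field per subset of fields is enough; only if every ordering needs its own field does the count grow past 10!. The helper name `aggregate_field_counts` is hypothetical.

```python
from math import comb, factorial, perm


def aggregate_field_counts(n):
    """Number of glued fields needed to pre-compute every combination of
    two or more of n fields, under two different assumptions."""
    # Fixed canonical order: one glued field per subset of size >= 2.
    unordered = sum(comb(n, k) for k in range(2, n + 1))
    # A separate glued field for every ordering of every such subset.
    ordered = sum(perm(n, k) for k in range(2, n + 1))
    return unordered, ordered


print(factorial(10))               # 3628800
print(aggregate_field_counts(10))  # (1013, 9864090)
```

Even the smaller count explodes for the real schema: with 73 fields the canonical-order figure is 2**73 - 74, so pre-computing every combination is impractical either way.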
Perhaps a more 'typical' use case than my network-based example
would be a product search web page, where you want to show the
number of products that are made by a manufacturer and fall within a
certain price range (e.g. Sony [$600-$800] (15)). To obtain the (15)
facet count, you would have to correlate the Sony products (say,
(861)) with the products that fall into the [600 TO 800] price range
(say, (1226)): the (15) is the size of the intersection of those two
hit sets. Am I right that filter queries could only do this for
document hits if you know the field values ahead of time (e.g.
fq=manufacturer:Sony&fq=price:[600 TO 800])? The facets could then
be derived by simply reading numFound for each result set.
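For the known-values case, the per-cell request can be sketched as below: each `fq` restricts the result set, so numFound is the size of the intersection, and `rows=0` avoids fetching documents. The helper `count_request` is hypothetical, and only the query string is built (no server is contacted).

```python
from urllib.parse import parse_qs, urlencode


def count_request(manufacturer, low, high):
    """Build query parameters whose numFound is the intersection count
    for one manufacturer / price-range cell."""
    params = [
        ("q", "*:*"),
        ("fq", f"manufacturer:{manufacturer}"),
        ("fq", f"price:[{low} TO {high}]"),
        ("rows", "0"),  # only numFound is needed, not the documents
        ("wt", "json"),
    ]
    return urlencode(params)


qs = count_request("Sony", 600, 800)
print(qs)
print(parse_qs(qs)["fq"])  # both filters survive the round trip
```

This needs one request per cell. Alternatively, a single request with `fq=manufacturer:Sony` plus one `facet.query=price:[600 TO 800]` per range returns those counts as facet values, but either way the manufacturer values must be known in advance.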
If there were subsearch support in Solr (i.e. taking the output of
one query and using it as input to another) that included facets
(perhaps there is such support?), it might be used to achieve this
effect.
A custom query parser plugin could work, maybe? I suppose it would
need to gather up all the separate facets and correlate them
according to the input query (e.g. host and user, or manufacturer
and price range). Such a mechanism would be crying out for caching,
but perhaps it could leverage the existing field and query caches.
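The correlation step of such a plugin essentially amounts to intersecting per-value document sets, and that is where caching pays off. Below is a rough in-memory analogue in Python, not a Solr plugin: the document ids, `docset`, `field_values` and `correlate` are hypothetical, and `functools.lru_cache` merely stands in for Solr's filter cache, which stores the document set matched by each filter query.

```python
from functools import lru_cache

# Hypothetical documents keyed by id (the host/user example).
DOCS = {
    1: {"host": "machine_1", "user": "user_1"},
    2: {"host": "machine_1", "user": "user_2"},
    3: {"host": "machine_1", "user": "user_1"},
    4: {"host": "machine_1", "user": "user_1"},
    5: {"host": "machine_2", "user": "user_1"},
    6: {"host": "machine_2", "user": "user_1"},
    7: {"host": "machine_2", "user": "user_4"},
    8: {"host": "machine_1", "user": "user_4"},
}


@lru_cache(maxsize=None)
def docset(field, value):
    """Ids of the documents where field == value (computed once, then cached)."""
    return frozenset(i for i, d in DOCS.items() if d.get(field) == value)


def field_values(field):
    """Distinct values of `field`, sorted."""
    return sorted({d[field] for d in DOCS.values() if field in d})


def correlate(field_a, field_b):
    """Count documents for each (a, b) value pair by intersecting cached
    per-value document sets; empty pairs are left out."""
    counts = {}
    for a in field_values(field_a):
        for b in field_values(field_b):
            n = len(docset(field_a, a) & docset(field_b, b))
            if n:
                counts[(a, b)] = n
    return counts


print(correlate("host", "user"))
print(docset.cache_info())
```

The nested loop is also a reminder of the cost: every pair of values is tried, so the work grows with the product of the fields' cardinalities.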
Peter
From: erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Subject: Re: Aggregated facet value counts?
Date: Fri, 29 Jan 2010 07:39:44 -0500
Creating values for every possible combination is what you're asking
Solr to do at query-time, and as far as I know there isn't really a
way to accomplish that the way you describe. Does it really need to
be arbitrary here?
Erik
On Jan 29, 2010, at 7:25 AM, Peter S wrote:
Hi Erik,
Thanks for your reply. That's an interesting idea, doing it at
index-time, and a good one for known field combinations.
The only thing is...
How would I handle arbitrary field combinations, i.e. allow the
caller to specify any combination of fields at query-time?
So, yes, the data is available at index-time, but the combination
isn't (short of creating fields for every possible combination).
Peter
From: erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Subject: Re: Aggregated facet value counts?
Date: Fri, 29 Jan 2010 06:30:27 -0500
When faced with this type of situation where the data is entirely
available at index-time, simply create an aggregated field that
glues the two pieces together, and facet on that.
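The index-time idea can be sketched as follows, assuming the glued value is stored in a non-tokenized string field (hypothetically named `host_user`, with `/` as a hypothetical separator) so that each combination is indexed as a single term. `Counter` stands in for what `facet.field=host_user` would return.

```python
from collections import Counter


def add_aggregate_field(doc, fields=("host", "user"), sep="/",
                        target="host_user"):
    """Return a copy of `doc` with the listed field values glued into
    one extra field, as would be done before sending it to Solr."""
    out = dict(doc)
    if all(f in doc for f in fields):
        out[target] = sep.join(doc[f] for f in fields)
    return out


# The host/user example documents from the original question.
raw = [
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_2"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_4"},
    {"host": "machine_1", "user": "user_4"},
]
docs = [add_aggregate_field(d) for d in raw]

# Faceting on the glued field is then an ordinary per-term count.
facets = Counter(d["host_user"] for d in docs)
for value, count in sorted(facets.items()):
    print(f"{value} ({count})")
```

Run on the example data, this prints the five machine/user counts requested in the original question. The catch, as discussed above, is that the field combinations have to be chosen before indexing.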
Erik
On Jan 29, 2010, at 6:16 AM, Peter S wrote:
Hi,
I was wondering if anyone had come across this use case, and if this
type of faceting is possible:
The requirement is to build a query such that an aggregated facet
count of common (and'ed) field values forms the basis of each
returned facet count.
For example:
Let's say I have a number of documents in an index with, among
others, the fields 'host' and 'user':
Doc1 host:machine_1 user:user_1
Doc2 host:machine_1 user:user_2
Doc3 host:machine_1 user:user_1
Doc4 host:machine_1 user:user_1
Doc5 host:machine_2 user:user_1
Doc6 host:machine_2 user:user_1
Doc7 host:machine_2 user:user_4
Doc8 host:machine_1 user:user_4
Is it possible to get facets back that would give the count of
documents that have common host AND user values (preferably ordered,
i.e. host then user for this example, so as not to create a
factorial explosion)? Note that the caller wouldn't know what
machine and user values exist, only the field names.
I've tried using facet queries in various ways to see if they could
work for this, but I believe facet queries work on a different plane
than this requirement (narrowing the term count, as opposed to
aggregating).
For the example above, the desired result would be:
machine_1/user_1 (3)
machine_1/user_2 (1)
machine_1/user_4 (1)
machine_2/user_1 (2)
machine_2/user_4 (1)
Has anyone had a need for this type of faceting and found a way to
achieve it?
Many thanks,
Peter