In our case, we have fewer than 20 distinct groups, and a typical search result 
will return about 10 of those groups (usually 3 documents per group). We use 
default sorting by score.  There are 12 million docs spread across 3 shards.  
We set group.facet=false.  The wkcluster field is a string field indexed using 
DocValues. Each document will have a value for the wkcluster field. Sample 
query:

?q=*%3A*&rows=100&wt=xml&indent=true&group=true&group.field=wkcluster&group.limit=3&hl=false&facet=false&group.facet=false

This query returned 18 groups and took about 1.7 seconds even after executing 
it a few times.
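
For anyone wanting to reproduce the sample query above from a script, here is a minimal Python sketch that assembles the same query string. The function names are hypothetical, the host and core path are deliberately left out, and only the parameters shown in the sample query are used.

```python
from urllib.parse import urlencode


def grouped_query_params(field="wkcluster", limit=3, rows=100):
    """Return the parameters of the sample grouped query, in the same order.

    The helper name and defaults are illustrative; the values come from
    the sample query in this message.
    """
    return [
        ("q", "*:*"),
        ("rows", str(rows)),
        ("wt", "xml"),
        ("indent", "true"),
        ("group", "true"),
        ("group.field", field),
        ("group.limit", str(limit)),
        ("hl", "false"),
        ("facet", "false"),
        ("group.facet", "false"),
    ]


def to_query_string(params):
    # safe="*" keeps the asterisks literal, so "*:*" is encoded as "*%3A*".
    return "?" + urlencode(params, safe="*")


if __name__ == "__main__":
    print(to_query_string(grouped_query_params()))
```

Changing `field` makes it easy to compare grouping on different fields, which matters for us since we cannot fix the grouping field in advance.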

The main drag we see is that there are 2 internal queries (on each shard) 
generated when we have group=true. They are essentially the same except for 
additional group.topgroups params in the 2nd query.  These queries seem to be 
done serially, so it almost doubles the latency.  I'm not sure if it's 
something we're doing (or not doing) in the query, or this is just the way it 
is.
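
To illustrate why two serial internal queries would almost double the latency, here is a toy model in Python. It assumes each phase finishes only when its slowest shard responds and that the phases run one after the other. The function name and the per-shard timings are made up for illustration and are not measurements from our cluster.

```python
def coordinator_latency(phases):
    """Estimate total latency for serial phases fanned out to shards.

    `phases` is a list of phases; each phase is a list of per-shard
    response times in seconds. Each phase waits for its slowest shard,
    and the phases are assumed to run strictly one after another.
    """
    return sum(max(shard_times) for shard_times in phases)


if __name__ == "__main__":
    # Hypothetical timings for 3 shards (not measured values).
    first_phase = [0.80, 0.70, 0.75]   # initial group=true query
    second_phase = [0.80, 0.85, 0.70]  # query with group.topgroups params

    print("one phase :", coordinator_latency([first_phase]))
    print("two phases:", coordinator_latency([first_phase, second_phase]))
```

Under these assumptions the second phase adds roughly the cost of the first, since both do essentially the same work on every shard.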

I don't think we can use the aforementioned block-join feature here, as it 
would be difficult for us to build document blocks based on the group (and 
there have been requirements to group on different fields).  Unfortunately the 
grouping feature has been extremely popular in the production applications 
running on our search platform (we’re migrating from Fast, where grouping 
performance was quite good).

We do have other performance issues (currently we are investigating an issue 
with a scale function) - we are hoping we can resolve those to the point 
where the double query for grouping isn't so noticeable.

-----Original Message-----
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Friday, January 30, 2015 6:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Does DocValues improve Grouping performance ?

A few questions so we can better understand the scale of grouping you're trying 
to accomplish:

How many distinct groups do you typically have in a search result?

How many distinct groups are there in the field you are grouping on?

How many results are you trying to group in a query?

Joel Bernstein
Search Engineer at Heliosearch

On Fri, Jan 30, 2015 at 4:10 PM, Cario, Elaine < 
elaine.ca...@wolterskluwer.com> wrote:

> Hi Shamik,
>
> We use DocValues for grouping, and although I have nothing to compare 
> it to (we started with DocValues), we are seeing similarly poor 
> results: easily 60% overhead compared to non-group queries.  Looking 
> around for a solution, no quick fix is presenting itself, unfortunately.
> CollapsingQParserPlugin is also too limited for our needs.
>
> -----Original Message-----
> From: Shamik Bandopadhyay [mailto:sham...@gmail.com]
> Sent: Thursday, January 15, 2015 6:02 PM
> To: solr-user@lucene.apache.org
> Subject: Does DocValues improve Grouping performance ?
>
> Hi,
>
>    Does use of DocValues provide any performance improvement for Grouping?
> I've looked into the blog which mentions improving Grouping performance 
> through DocValues.
>
> https://lucidworks.com/blog/fun-with-docvalues-in-solr-4-2/
>
> Right now, Group by queries (which I sadly can't avoid) have become a 
> huge bottleneck. They have an overhead of 60-70% compared to the same 
> query sans group by. Unfortunately, I'm not able to use 
> CollapsingQParserPlugin as it doesn't support anything similar to the 
> "group.facet" feature.
>
> My understanding of DocValues is that it's intended for faceting and 
> sorting. Just wondering if anyone has tried DocValues for Grouping 
> and seen any improvements?
>
> -Thanks,
> Shamik
>
