Re: Using group.ngroups during query search

Zheng Lin Edwin Yeo Sat, 12 Mar 2016 06:57:56 -0800

Hi Toke,

Thanks for your reply.


>StatsComponent has countDistinct which is accurate, but has a warning
>that it might be very heavy.

Does this mean that this method will probably be slow as well?
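
For comparison, here is a minimal sketch of how the three options mentioned in this thread (group.ngroups, JSON Facets numBuckets, and StatsComponent countDistinct) could be requested. The host, the collection name "collection1" and the field name "signature" are hypothetical placeholders, not taken from the actual setup; the sketch only builds the request URLs and does not contact Solr.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and field; adjust to the real collection and group field.
BASE = "http://localhost:8983/solr/collection1/select"
FIELD = "signature"


def ngroups_params(q, field):
    # Result grouping with an exact group count (the slow case in this thread).
    return {"q": q, "rows": 0, "group": "true",
            "group.field": field, "group.ngroups": "true"}


def numbuckets_params(q, field):
    # JSON Facets terms facet; only numBuckets is of interest, so the
    # bucket list is kept short. May be inaccurate, see SOLR-8741.
    facet = {"groups": {"type": "terms", "field": field,
                        "limit": 1, "numBuckets": True}}
    return {"q": q, "rows": 0, "json.facet": json.dumps(facet)}


def countdistinct_params(q, field):
    # StatsComponent distinct count, requested through a local param.
    return {"q": q, "rows": 0, "stats": "true",
            "stats.field": "{!countDistinct=true}" + field}


def build_url(params):
    # urlencode takes care of escaping the braces, quotes and '!'.
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    for make in (ngroups_params, numbuckets_params, countdistinct_params):
        print(build_url(make("*:*", FIELD)))
```

Timing the same query with each of the three URLs would show whether countDistinct is any faster than group.ngroups on this index.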

By the way, I found that all the slow queries occur in searches that
return more than 300,000 ngroups, and the response time grows in
proportion to the number of records. At 300,000 ngroups, I still get a
response within 3 seconds, but a search with more than 6,000,000 ngroups
takes almost 2 minutes to return.
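
As a rough check of these figures, the per-group cost can be worked out as below. This is only a sketch: "within 3 seconds" is taken as 3 s and "almost 2 minutes" as 120 s, both of which are assumptions, and two data points are too few to draw firm conclusions.

```python
# Figures reported above; the exact seconds are assumed upper bounds.
MEASUREMENTS = [(300_000, 3.0), (6_000_000, 120.0)]


def per_group_microseconds(ngroups, seconds):
    # Average time spent per group, in microseconds.
    return seconds / ngroups * 1_000_000


def growth_factors(small, large):
    # How much the group count and the response time grew between two runs.
    (n1, t1), (n2, t2) = small, large
    return n2 / n1, t2 / t1


if __name__ == "__main__":
    for ngroups, seconds in MEASUREMENTS:
        cost = per_group_microseconds(ngroups, seconds)
        print(f"{ngroups} groups: about {cost:.0f} microseconds per group")
    groups_x, time_x = growth_factors(*MEASUREMENTS)
    print(f"groups grew {groups_x:.0f}x, response time grew {time_x:.0f}x")
```

Under these assumptions the group count grows 20 times while the response time grows about 40 times, so the cost per group roughly doubles between the two searches.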

Regards,
Edwin


On 11 March 2016 at 17:19, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:

> On Fri, 2016-03-11 at 12:11 +0800, Zheng Lin Edwin Yeo wrote:
> > I would like to check: will using result grouping with group.ngroups
> > (which includes the number of groups that have matched the query) in
> > the search affect the performance of Solr?
>
> Yes. Calculating ngroups is done by collecting all the groups in a
> shard. They are kept as BytesRefs, which means a lot of lookups plus
> memory overhead proportional to the ngroups count.
>
> > I require the number of groups that have matched the query.
> > Besides this, is there any other way to retrieve that value?
>
> JSON Facets has numBuckets which is fast, but might not be accurate:
> https://issues.apache.org/jira/browse/SOLR-8741
>
> StatsComponent has countDistinct which is accurate, but has a warning
> that it might be very heavy.
>
>
> If you want accurate counts and if you are using SolrCloud, each shard
> must return the full list of values, independent of whether you use
> grouping, faceting or stats. Depending on cardinality this can be very
> heavy.
>
> > I have more than 10 million documents, with an index size of more than
> > 500GB, and I'm using Solr 5.4.0.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>
