Erick,

The url field could have more than 4M distinct values.  The purpose of this
facet is to display the most frequent URLs, say the top 500, to users.

Sascha,

Thanks for the info. I will look into the facet.threads parameter.
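
For reference, here is a rough sketch of how the original facet request might
look with facet.threads added. The host, collection name, q=*:* and rows=0
are assumptions for illustration only and are not taken from the actual
setup. The thread count of 4 is simply the low end of the range suggested
below. If threads are spun off per field, as Erick recalls, a single
facet.field may see little or no benefit.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, threads=4):
    """Build a Solr select URL for the top-500 url facet on Video docs."""
    params = [
        ("q", "*:*"),                     # assumption: match all, filter via fq
        ("fq", "source:Video"),           # filter from the original query
        ("rows", "0"),                    # assumption: only facet counts needed
        ("facet", "true"),
        ("facet.field", "url"),
        ("facet.limit", "500"),
        ("facet.threads", str(threads)),  # available in Solr 4.5 or above
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical host and collection name.
url = build_facet_query("http://localhost:8983/solr/collection1/select")
print(url)
```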

Mingfeng


On Mon, Nov 4, 2013 at 4:47 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> How many unique URLs do you have in your 9M
> docs? If your 9M hits have 4M distinct URLs, then
> this is not very valuable to the user.
>
> Sascha:
> Was that speedup on a single field or were you faceting over
> multiple fields? Because as I remember that code spins off
> threads on a per-field basis, and if I'm mis-remembering I need
> to look again!
>
> Best,
> Erick
>
>
> On Sat, Nov 2, 2013 at 5:07 AM, Sascha SZOTT <sz...@gmx.de> wrote:
>
> > Hi Ming,
> >
> > which Solr version are you using? In case you use one of the latest
> > versions (4.5 or above) try the new parameter facet.threads with a
> > reasonable value (4 to 8 gave me a massive performance speedup when
> > working with large facets, i.e. nTerms >> 10^7).
> >
> > -Sascha
> >
> >
> > Mingfeng Yang wrote:
> > > I have an index with 170M documents, and two of the fields for each
> > > doc are "source" and "url".  I want to know the top 500 most
> > > frequent urls from the Video source.
> > >
> > > So I ran a facet query with
> > > "fq=source:Video&facet=true&facet.field=url&facet.limit=500", and
> > > about 9 million documents match.
> > >
> > > The Solr cluster is hosted on two EC2 instances, each with 4 CPUs
> > > and 32G of memory. 16G is allocated for the Java heap.  There are 4
> > > master shards on one machine and 4 replicas on the other, connected
> > > via ZooKeeper.
> > >
> > > Whenever I run the query above, the response takes so long that
> > > the client times out. Sometimes an impatient end user waits only
> > > a few seconds for the results, kills the connection, and then
> > > issues the same query again and again.  The server then has to
> > > handle several of these heavy queries simultaneously and gets so
> > > busy that we see a "no server hosting shard" error, probably due
> > > to lost communication between the Solr node and ZooKeeper.
> > >
> > > Is there any way to deal with such a problem?
> > >
> > > Thanks, Ming
> > >
> >
>
