Erick,

It could have more than 4M distinct values. The purpose of this facet is to display the most frequent URLs, say the top 500, to users.

Sascha,

Thanks for the info. I will look into the facet.threads parameter.

Mingfeng

On Mon, Nov 4, 2013 at 4:47 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> How many unique URLs do you have in your 9M
> docs? If your 9M hits have 4M distinct URLs, then
> this is not very valuable to the user.
>
> Sascha:
> Was that speedup on a single field or were you faceting over
> multiple fields? Because as I remember that code spins off
> threads on a per-field basis, and if I'm mis-remembering I need
> to look again!
>
> Best,
> Erick
>
>
> On Sat, Nov 2, 2013 at 5:07 AM, Sascha SZOTT <sz...@gmx.de> wrote:
>
> > Hi Ming,
> >
> > which Solr version are you using? In case you use one of the latest
> > versions (4.5 or above) try the new parameter facet.threads with a
> > reasonable value (4 to 8 gave me a massive performance speedup when
> > working with large facets, i.e. nTerms >> 10^7).
> >
> > -Sascha
> >
> >
> > Mingfeng Yang wrote:
> > > I have an index with 170M documents, and two of the fields for each
> > > doc are "source" and "url". And I want to know the top 500 most
> > > frequent urls from Video source.
> > >
> > > So I did a facet with
> > > "fq=source:Video&facet=true&facet.field=url&facet.limit=500", and
> > > the matching documents are about 9 million.
> > >
> > > The solr cluster is hosted on two ec2 instances each with 4 cpu, and
> > > 32G memory. 16G is allocated for java heap. 4 master shards on one
> > > machine, and 4 replicas on another machine. Connected together via
> > > zookeeper.
> > >
> > > Whenever I did the query above, the response is just taking too long
> > > and the client will get timed out. Sometimes, when the end user is
> > > impatient, he/she may wait for a few seconds for the results, and
> > > then kill the connection, and then issue the same query again and
> > > again. Then the server will have to deal with multiple such heavy
> > > queries simultaneously and being so busy that we got "no server
> > > hosting shard" error, probably due to lost communication between solr
> > > node and zookeeper.
> > >
> > > Is there any way to deal with such problem?
> > >
> > > Thanks, Ming
> > >
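
For reference, here is a rough sketch of how the facet request from this thread could be assembled with the facet.threads parameter Sascha suggested (Solr 4.5 or above). The fq, facet, facet.field and facet.limit values come from the original query. The host, core name, q=*:*, rows=0 and wt=json are assumptions added for illustration, and the helper name build_facet_query is hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are assumptions,
# not taken from the thread.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_facet_query(base_url, threads=4, limit=500):
    """Build the URL for the facet request discussed in the thread.

    threads maps to facet.threads (Sascha reported good results with 4 to 8).
    limit maps to facet.limit (top 500 urls in the original query).
    """
    params = [
        ("q", "*:*"),                # assumption: match all, filter via fq
        ("fq", "source:Video"),      # from the original query
        ("rows", 0),                 # assumption: only facet counts are needed
        ("facet", "true"),           # from the original query
        ("facet.field", "url"),      # from the original query
        ("facet.limit", limit),      # from the original query
        ("facet.threads", threads),  # Sascha's suggestion, Solr 4.5+
        ("wt", "json"),              # assumption: JSON response format
    ]
    # urlencode percent-escapes values such as "source:Video" and "*:*".
    return base_url + "?" + urlencode(params)


query_url = build_facet_query(SOLR_SELECT)
print(query_url)
```

Note that if Erick's recollection is right and the threads are spun off per field, facet.threads may not help much here, since this request facets on a single field (url).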