Hi Toke,

I don't have a blog post on this, but here is a high-level overview:

I have a 31-machine cluster with 3 shards on each machine (93 shards in
total). Each machine has ~250 GB of RAM and a 3 TB SSD for the search index
(there is a separate drive for the OS and other files). One Solr process
runs per shard with a 48 GB heap, so each SSD holds 3 large indexes.
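The per-machine memory split implied by these numbers can be worked out
directly. Below is a minimal Python sketch. It assumes the "~250 GB" RAM
figure is exactly 250 GB and that nothing besides the three Solr heaps uses
significant memory. The function name `cluster_budget` is hypothetical.

```python
# Back-of-the-envelope sizing for one cluster, using the figures above.
# Assumption: "~250 GB" of RAM is treated as exactly 250 GB.
MACHINES = 31
SHARDS_PER_MACHINE = 3      # one Solr process per shard
HEAP_PER_PROCESS_GB = 48
RAM_PER_MACHINE_GB = 250    # approximate


def cluster_budget(machines, shards_per_machine, heap_gb, ram_gb):
    """Return (total shards, heap per machine in GB, RAM left for the OS in GB)."""
    total_shards = machines * shards_per_machine
    heap_per_machine = shards_per_machine * heap_gb
    # Whatever the JVM heaps don't claim is left to the OS, mostly as page
    # cache, which Lucene relies on for fast reads of the index files.
    left_for_os = ram_gb - heap_per_machine
    return total_shards, heap_per_machine, left_for_os


shards, heap, os_mem = cluster_budget(
    MACHINES, SHARDS_PER_MACHINE, HEAP_PER_PROCESS_GB, RAM_PER_MACHINE_GB
)
print(f"shards per cluster: {shards}")   # 93
print(f"heap per machine:   {heap} GB")  # 144
print(f"left for OS cache:  {os_mem} GB")  # 106
```

So roughly 106 GB per machine is available for caching an index stored on
a 3 TB SSD. This is why the SSD's random-read speed matters for this setup.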

That is just one cluster; we have 5 such clusters, which we can bring
online or take offline (for testing, maintenance, etc.). Usually 3 are
active at any time, each taking 1/3 of user traffic.
We don't rely on replication between these clusters: our out-of-Solr
processes send writes to all the replicas in parallel. We don't use
SolrCloud, although it was available in Solr 4.5 (which we are using).
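Here is a minimal sketch of the parallel write fan-out described above. It
assumes one replica per cluster for a given shard. This is an illustration,
not the actual implementation. The names `fan_out_write` and `fake_send`,
the `send` callback, and the replica names are all hypothetical. A real
`send` would POST the document to that replica's update handler.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out_write(doc, replicas, send):
    """Send `doc` to every replica in parallel.

    `send(replica, doc)` is a hypothetical transport callback that raises on
    failure. Returns {replica: exception or None}, in the order of `replicas`.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(replicas))) as pool:
        futures = {pool.submit(send, r, doc): r for r in replicas}
        for fut, replica in futures.items():
            # exception() waits for this write to finish; None means success.
            results[replica] = fut.exception()
    return results


# Usage with a stand-in transport in which one replica times out.
def fake_send(replica, doc):
    if replica == "cluster-c/shard7":
        raise IOError("timeout")
    return "ok"


res = fan_out_write(
    {"id": "42"},
    ["cluster-a/shard7", "cluster-b/shard7", "cluster-c/shard7"],
    fake_send,
)
failed = [r for r, err in res.items() if err is not None]
print(failed)  # ['cluster-c/shard7']
```

There is no replication between clusters. The writer itself therefore has
to notice a failed replica (as `failed` does here) and retry or repair it.
Otherwise, that replica silently drifts out of sync with the others.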


Thanks
Nawab


On Wed, May 24, 2017 at 3:01 PM, Toke Eskildsen <t...@kb.dk> wrote:

> Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:
> > @Toke, I stumbled upon your page last week, but it seems that your huge
> > index doesn't receive a lot of query traffic.
>
> It switches between two kinds of usage:
>
> Everyday use is very low traffic by researchers using it interactively:
> 1-2 simultaneous queries, with faceting ranging from somewhat heavy to very
> heavy. Our setup is optimized for this scenario, and latency starts to go
> up pretty quickly if the number of simultaneous requests rises.
>
> Now and then, cultural probes are performed, during which the index is
> hammered continuously by multiple threads. In our experience, the maximum
> throughput here for extremely simple queries (existence checks for social
> security numbers) is around 50 queries/second.
>
> > Mine is around 60 TB and receives around 120 queries per second; ~90
> > shards on 30 machines.
>
> Sounds interesting. Do you have a more detailed write-up somewhere?
>
> - Toke
>
