I'm not sure about the scale you're aiming for, but you probably want to
do both sharding and replication.  There's no central server that would
become the bottleneck. The guidelines should probably be something like:
1. Split your index into enough shards so each shard can keep up with
its share of the update rate.
2. Have enough replicas of each shard master to keep up with the rate
of queries.
3. Have enough aggregators in front of the shard replicas so the
aggregation doesn't become a bottleneck.
4. Make sure you have good load balancing across your system.
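
As a rough illustration of guidelines 1-3, here is a minimal Python
sketch of the sizing arithmetic. The function name and all per-node
capacity figures are made up for the example; you would have to measure
the real numbers on your own hardware and documents. It also assumes
that every query fans out to one replica of every shard, so each shard's
group of replicas sees the full query rate.

```python
import math


def size_cluster(update_rate, query_rate,
                 shard_update_capacity,
                 replica_query_capacity,
                 aggregator_query_capacity):
    """Estimate node counts from target rates and measured per-node capacities.

    All capacities are in operations per second and are assumptions that
    must come from benchmarking, not from this sketch.
    """
    # Guideline 1: updates are spread across shards, so divide the total
    # update rate by what a single shard master can absorb.
    shards = math.ceil(update_rate / shard_update_capacity)

    # Guideline 2: each query touches every shard, so every shard needs
    # enough replicas to serve the full query rate.
    replicas_per_shard = math.ceil(query_rate / replica_query_capacity)

    # Guideline 3: aggregators merge per-shard results; size them against
    # the full query rate as well.
    aggregators = math.ceil(query_rate / aggregator_query_capacity)

    return {
        "shards": shards,
        "replicas_per_shard": replicas_per_shard,
        "aggregators": aggregators,
        "total_replicas": shards * replicas_per_shard,
    }


if __name__ == "__main__":
    # Target rates from the question; capacities are hypothetical placeholders.
    print(size_cluster(update_rate=100_000, query_rate=100_000,
                       shard_update_capacity=500,
                       replica_query_capacity=2_000,
                       aggregator_query_capacity=5_000))
```

The point is only that the three counts are sized independently: shards
follow the update rate, replicas and aggregators follow the query rate.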

Attached is a diagram of the setup we have.  You might want to look into
SolrCloud as well.
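
On the "no central server" point: with Solr's distributed search, any
node can act as the aggregator for a request when it is given a `shards`
parameter listing one replica per shard. A small Python sketch of
building such a request URL follows; the host names and the helper
function are hypothetical, and in practice a load balancer would pick
the aggregator and the replicas.

```python
from urllib.parse import urlencode


def distributed_query_url(aggregator, shard_replicas, query):
    """Build a Solr distributed-search URL.

    aggregator     -- "host:port/path" of the node that merges results
    shard_replicas -- one "host:port/path" entry per shard
    query          -- the value for Solr's q parameter
    """
    params = {
        "q": query,
        # The shards parameter is a comma-separated list of shard addresses.
        "shards": ",".join(shard_replicas),
    }
    # Keep ":", "/" and "," unescaped so the URL stays readable.
    return "http://%s/select?%s" % (aggregator, urlencode(params, safe=":/,"))


if __name__ == "__main__":
    # Hypothetical hosts: one aggregator and one replica for each of two shards.
    print(distributed_query_url(
        "aggregator1:8983/solr",
        ["shard1-replica1:8983/solr", "shard2-replica1:8983/solr"],
        "title:solr",
    ))
```

Since the aggregator is chosen per request, you can add more of them
behind the load balancer as the query rate grows.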

Ephraim Ofir


-----Original Message-----
From: Jens Mueller [mailto:supidupi...@googlemail.com] 
Sent: Tuesday, April 05, 2011 4:25 AM
To: solr-user@lucene.apache.org
Subject: Very very large scale Solr Deployment = how to do (Expert Question)?

Hello Experts,



I am a Solr newbie but have read quite a lot of docs. I still do not
understand what would be the best way to set up very large scale
deployments:



Goal (theoretical):

 A) Index size: 1 petabyte (1 document is about 5 KB in size)

 B) Queries: 100,000 queries per second

 C) Updates: 100,000 updates per second




Solr offers:

1.)    Replication => Scales well for B) BUT A) and C) are not satisfied

2.)    Sharding => Scales well for A) BUT B) and C) are not satisfied
(=> As I understand the sharding approach, everything goes through a
central server that dispatches the updates and assembles the query
results retrieved from the different shards. But this central server
also has some capacity limits...)




What is the right approach to handle such large deployments? I would be
thankful for just a rough sketch of the concepts so I can
experiment/search further...


Maybe I am missing something very trivial, as I think some of the "Solr
Users/Use Cases" on the homepage are that kind of large deployment. How
are they implemented?



Thank you very much!

Jens
