I'm not sure about the scale you're aiming for, but you probably want to combine sharding and replication, so there is no central server to become a bottleneck. The guidelines would be roughly:

1. Split your index into enough shards that each shard can keep up with the update rate.
2. Have enough replicas of each shard master to keep up with the query rate.
3. Have enough aggregators in front of the shard replicas that the aggregation doesn't become a bottleneck.
4. Make sure you have good load balancing across the whole system.
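As a rough illustration of those guidelines, here is a back-of-envelope sizing sketch. The per-node capacity numbers are made-up assumptions; you would substitute figures from your own benchmarks.

```python
import math

# Target workload (from the question below).
INDEX_SIZE_GB = 1000 * 1024    # 1 PB total index
QUERIES_PER_SEC = 100_000
UPDATES_PER_SEC = 100_000

# Assumed single-node limits -- hypothetical, measure your own hardware.
SHARD_MAX_SIZE_GB = 100            # index size one shard node handles comfortably
SHARD_MAX_UPDATES_PER_SEC = 500    # sustained update rate per shard master
REPLICA_MAX_QUERIES_PER_SEC = 200  # query rate one replica can serve
AGGREGATOR_MAX_QUERIES_PER_SEC = 1000

# Guideline 1: shard count is driven by whichever constraint is tighter,
# total index size or the update rate.
shards_by_size = math.ceil(INDEX_SIZE_GB / SHARD_MAX_SIZE_GB)
shards_by_updates = math.ceil(UPDATES_PER_SEC / SHARD_MAX_UPDATES_PER_SEC)
num_shards = max(shards_by_size, shards_by_updates)

# Guideline 2: every query fans out to all shards, so each shard's
# replica group must serve the full query rate on its own.
replicas_per_shard = math.ceil(QUERIES_PER_SEC / REPLICA_MAX_QUERIES_PER_SEC)

# Guideline 3: enough aggregators to absorb the incoming query rate.
aggregators = math.ceil(QUERIES_PER_SEC / AGGREGATOR_MAX_QUERIES_PER_SEC)

print(num_shards, replicas_per_shard, aggregators)
```

The point of the sketch is mainly guideline 2: because every query hits every shard, adding shards does not reduce per-shard query load, only replication does.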
Attached is a diagram of the setup we have. You might want to look into SolrCloud as well.

Ephraim Ofir

-----Original Message-----
From: Jens Mueller [mailto:supidupi...@googlemail.com]
Sent: Tuesday, April 05, 2011 4:25 AM
To: solr-user@lucene.apache.org
Subject: Very very large scale Solr Deployment = how to do (Expert Question)?

Hello Experts,

I am a Solr newbie but have read quite a lot of docs. I still do not understand what would be the best way to set up very large-scale deployments.

Goal (theoretical):
A) Index size: 1 petabyte (1 document is about 5 KB in size)
B) Queries: 100,000 queries per second
C) Updates: 100,000 updates per second

Solr offers:
1) Replication => scales well for B), BUT A) and C) are not satisfied.
2) Sharding => scales well for A), BUT B) and C) are not satisfied. (As I understand the sharding approach, everything goes through a central server that dispatches the updates and assembles the queries retrieved from the different shards. But this central server also has capacity limits...)

What is the right approach to handle such large deployments? I would be thankful for just a rough sketch of the concepts so I can experiment/search further... Maybe I am missing something trivial, as I think some of the "Solr Users/Use Cases" on the homepage are deployments of that kind. How are they implemented?

Thank you very much!!!
Jens
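For reference, the "central server" in point 2) of the question is just whichever node receives the request and is given a `shards` parameter; Solr's classic distributed search fans the query out to the listed shards and merges the results. A minimal sketch of building such a request (host names are hypothetical):

```python
from urllib.parse import urlencode

# Hypothetical shard nodes; each entry is host:port/path to a Solr core.
shard_hosts = [
    "shard1.example.com:8983/solr",
    "shard2.example.com:8983/solr",
]

params = {
    "q": "title:solr",
    # Comma-separated shard list; the receiving node acts as the aggregator.
    "shards": ",".join(shard_hosts),
}

# Any node can play the aggregator role, which is why you can run several
# of them behind a load balancer instead of a single central server.
url = "http://aggregator.example.com:8983/solr/select?" + urlencode(params)
print(url)
```

Because the aggregator is stateless per request, spreading queries across several aggregator nodes (guideline 3 above) removes the single-dispatcher bottleneck.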