I would not use replication. LinkedIn consumer search is a flat system where one process indexes new entries and serves queries simultaneously. It's a custom Lucene app called Zoie. Their stuff is on GitHub.
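To show what "one process indexes and queries simultaneously" means, here is a toy Python sketch. It is not Zoie's or Lucene's API: `FlatIndex` and its methods are made-up names, the index is a plain in-memory dictionary, and a single lock stands in for whatever concurrency control a real system uses.

```python
import threading
from collections import defaultdict


class FlatIndex:
    """Toy in-memory inverted index (hypothetical; not Zoie or Lucene)."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of doc ids
        self._lock = threading.Lock()

    def add(self, doc_id, text):
        # A document becomes searchable as soon as the lock is released.
        with self._lock:
            for term in text.lower().split():
                self._postings[term].add(doc_id)

    def search(self, query):
        # AND query: ids of documents that contain every query term.
        terms = query.lower().split()
        if not terms:
            return set()
        with self._lock:
            result = set(self._postings.get(terms[0], ()))
            for term in terms[1:]:
                result &= self._postings.get(term, set())
        return result


index = FlatIndex()
writer = threading.Thread(
    target=lambda: [index.add(i, f"doc number {i}") for i in range(1000)]
)
writer.start()
index.search("doc number")  # may run while the writer is still indexing
writer.join()
hits = index.search("doc 42")
```

There is no master/slave copy step here: the same process that accepts updates answers queries, so new entries are visible without waiting for replication.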
I would get documents to indexers via a multicast IP-based queueing system. This scales very well and there's a lot of hardware support.

The problem with distributed search is that it a) is inherently slower and b) has inherently more and longer jitter. The "airplane wing" distribution of query times becomes longer and flatter.

This is going to have to be a "federated" system, where the front-end app aggregates results rather than Solr.

On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller <supidupi...@googlemail.com> wrote:
> Hello Experts,
>
> I am a Solr newbie but read quite a lot of docs. I still do not understand
> what would be the best way to set up very large scale deployments:
>
> Goal (theoretical):
> A.) Index-Size: 1 Petabyte (1 Document is about 5 KB in Size)
> B) Queries: 100000 Queries/ per Second
> C) Updates: 100000 Updates / per Second
>
> Solr offers:
> 1.) Replication => Scales Well for B) BUT A) and C) are not satisfied
>
> 2.) Sharding => Scales well for A) BUT B) and C) are not satisfied (=> As
> I understand the Sharding approach all goes through a central server, that
> dispatches the updates and assembles the queries retrieved from the different
> shards. But this central server has also some capacity limits...)
>
> What is the right approach to handle such large deployments? I would be
> thankful for just a rough sketch of the concepts so I can experiment/search
> further…
>
> Maybe I am missing something very trivial as I think some of the “Solr
> Users/Use Cases” on the homepage are that kind of large deployments. How are
> they implemented?
>
> Thank you very much!!!
>
> Jens

--
Lance Norskog
goks...@gmail.com
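
To make the "federated" idea above concrete, here is a rough Python sketch of a front-end that fans a query out to independent search servers and merges the results itself. Everything in it is an assumption for illustration: `federated_search` and `make_shard` are made-up names, each shard is a plain callable standing in for an HTTP request to a separate Solr instance, and scores are assumed to be comparable across shards, which is not generally true without extra work.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor, wait


def federated_search(shards, query, k=10, timeout=0.5):
    """Query every shard in parallel and merge the top-k hits.

    `shards` is a list of callables taking (query, k) and returning a
    list of (score, doc_id) pairs; they are hypothetical stand-ins for
    calls to independent search servers.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(shards)))
    futures = [pool.submit(shard, query, k) for shard in shards]
    done, not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block the caller on slow shards
    hits = []
    for future in done:
        if future.exception() is None:  # skip shards that failed
            hits.extend(future.result())
    # Return the merged top-k plus the number of shards that timed out.
    return heapq.nlargest(k, hits), len(not_done)


def make_shard(hits):
    # Fake shard that returns its own top-k from a fixed list of hits.
    return lambda query, k: heapq.nlargest(k, hits)


shards = [
    make_shard([(0.9, "a1"), (0.4, "a2")]),
    make_shard([(0.7, "b1"), (0.6, "b2")]),
]
top, missing = federated_search(shards, "solr", k=3)
```

The timeout is where the jitter shows up: without it, every query is as slow as the slowest shard, so the front-end has to decide whether to wait or to return partial results.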