The bigger answer is that you cannot get to this size by just configuring Solr. You may have to invent a lot of stuff. Like all of Google.
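To make "this size" concrete, here is a back-of-envelope sketch using the goals from Jens's original message quoted below: 1 PB of roughly 5 KB documents, with 100,000 queries and 100,000 updates per second. It assumes decimal units. The per-node capacities passed to the hypothetical `nodes_needed` helper are made-up placeholders, not measured Solr figures.

```python
import math

# Targets from the original question. Decimal units are an ASSUMPTION:
# 1 PB = 10**15 bytes, 5 KB = 5 * 10**3 bytes.
INDEX_BYTES = 10**15
DOC_BYTES = 5 * 10**3
QUERIES_PER_SEC = 100_000
UPDATES_PER_SEC = 100_000
GOOGLE_QPS_FEB_2010 = 34_000  # estimate quoted in the reply

doc_count = INDEX_BYTES // DOC_BYTES                 # documents to index
reindex_days = doc_count / UPDATES_PER_SEC / 86_400  # full rebuild at rate C)
qps_vs_google = QUERIES_PER_SEC / GOOGLE_QPS_FEB_2010


def nodes_needed(docs_per_shard: int, qps_per_copy: int) -> tuple[int, int]:
    """Return (shards, copies per shard) for a sharded, replicated layout.

    Every query fans out to all shards, so each shard needs enough copies
    to absorb the full query rate on its own. Both capacity arguments are
    HYPOTHETICAL inputs, not Solr benchmarks.
    """
    shards = math.ceil(doc_count / docs_per_shard)
    copies = math.ceil(QUERIES_PER_SEC / qps_per_copy)
    return shards, copies


print(f"documents:            {doc_count:,}")
print(f"full re-index at C):  {reindex_days:.1f} days")
print(f"query rate vs Google: {qps_vs_google:.1f}x")

# Hypothetical capacities: 100M docs per shard, 500 qps per copy.
shards, copies = nodes_needed(docs_per_shard=100_000_000, qps_per_copy=500)
print(f"hypothetical layout:  {shards} shards x {copies} copies = {shards * copies:,} nodes")
```

The corpus works out to 200 billion documents, and even at the target update rate a full re-index takes about 23 days. With the made-up capacities above, the layout needs 2,000 shards with 200 copies each, or 400,000 machines. The point is that the index size and the query rate multiply, not the specific numbers.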
Where did you get these numbers? The proposed query rate is nearly three times Google's (Feb 2010 estimate, 34K qps). I work at MarkLogic, and we scale to hundreds of terabytes, with fast update and query rates. If you want a real system that handles that, you might want to look at our product.

wunder

On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:

> I would not use replication. LinkedIn consumer search is a flat system
> where one process indexes new entries and does queries simultaneously.
> It's a custom Lucene app called Zoie. Their stuff is on GitHub.
>
> I would get documents to indexers via a multicast IP-based queueing
> system. This scales very well and there's a lot of hardware support.
>
> The problem with distributed search is that it is a) inherently slower
> and b) has inherently more and longer jitter. The "airplane wing"
> distribution of query times becomes longer and flatter.
>
> This is going to have to be a "federated" system, where the front-end
> app aggregates results rather than Solr.
>
> On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller <supidupi...@googlemail.com>
> wrote:
>> Hello Experts,
>>
>> I am a Solr newbie but have read quite a lot of the docs. I still do not
>> understand the best way to set up very large-scale deployments:
>>
>> Goal (theoretical):
>>
>> A) Index size: 1 petabyte (1 document is about 5 KB in size)
>> B) Queries: 100,000 queries per second
>> C) Updates: 100,000 updates per second
>>
>> Solr offers:
>>
>> 1.) Replication => Scales well for B) BUT A) and C) are not satisfied
>>
>> 2.) Sharding => Scales well for A) BUT B) and C) are not satisfied (=> As
>> I understand the sharding approach, everything goes through a central
>> server that dispatches the updates and assembles the query results
>> retrieved from the different shards. But this central server also has
>> some capacity limits...)
>>
>> What is the right approach to handle such large deployments?
>> I would be thankful for just a rough sketch of the concepts so I can
>> experiment/search further…
>>
>> Maybe I am missing something trivial, as I think some of the “Solr
>> Users/Use Cases” on the homepage are deployments of that size. How are
>> they implemented?
>>
>> Thank you very much!!!
>>
>> Jens
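Lance's two points about a federated front end, where the application rather than Solr aggregates results, and about distributed search having longer jitter, can be sketched together as a scatter-gather merge. This is a minimal illustration, not how Solr or Zoie actually implement it. `ShardResponse`, `federated_top_k`, and `p_all_shards_fast` are hypothetical names. The sketch assumes scores from different shards are directly comparable and that shards miss a latency target independently of each other.

```python
import heapq
from dataclasses import dataclass


@dataclass
class ShardResponse:
    """One shard's reply: how long it took and its local top hits."""
    latency_ms: float
    hits: list[tuple[float, str]]  # (score, doc_id), any order


def federated_top_k(responses: list[ShardResponse], k: int) -> tuple[list[tuple[float, str]], float]:
    """Merge per-shard hits into a global top-k in the front-end app.

    ASSUMES scores from different shards are directly comparable.
    The front end has to wait for every shard, so the query's latency
    is the slowest shard's latency, not the average.
    """
    all_hits = (hit for r in responses for hit in r.hits)
    top = heapq.nlargest(k, all_hits)  # highest score first
    latency_ms = max(r.latency_ms for r in responses)
    return top, latency_ms


def p_all_shards_fast(p_single: float, shards: int) -> float:
    """Probability that every shard meets a latency target, ASSUMING each
    shard meets it independently with probability p_single."""
    return p_single ** shards


responses = [
    ShardResponse(12.0, [(0.91, "doc-a"), (0.40, "doc-b")]),
    ShardResponse(85.0, [(0.77, "doc-c")]),  # one slow shard
    ShardResponse(15.0, [(0.95, "doc-d"), (0.10, "doc-e")]),
]
top, latency_ms = federated_top_k(responses, k=3)
print(top, latency_ms)
print(round(p_all_shards_fast(0.99, 100), 3))
```

The single slow shard sets the latency of the whole query to 85 ms even though the other two answered in 15 ms or less. The second function shows why the tail stretches as the fan-out grows. If each of 100 shards meets a latency target 99% of the time, only about 37% of queries see every shard meet it.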