Ephraim, I still can't view the document. I don't know if I'm doing something wrong, but I downloaded it and it appears to be empty.
Albert

On 7 April 2011 09:32, Ephraim Ofir <ephra...@icq.com> wrote:

> You can't view it online, but you should be able to download it from:
> https://docs.google.com/leaf?id=0BwOEbnJ7oeOrNmU5ZThjODUtYzM5MS00YjRlLWI2OTktZTEzNDk1YmVmOWU4&hl=en&authkey=COGel4gP
>
> Enjoy,
> Ephraim Ofir
>
>
> -----Original Message-----
> From: Jens Mueller [mailto:supidupi...@googlemail.com]
> Sent: Thursday, April 07, 2011 8:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Very very large scale Solr Deployment = how to do (Expert
> Question)?
>
> Hello Ephraim, hello Lance, hello Walter,
>
> thanks for your replies:
>
> Ephraim, thanks very much for the further detailed explanation. I will try
> to set up a demo system in the next few days and use your advice.
> Load balancers are an important aspect of your design. Can you recommend one
> LB specifically? (I would be using haproxy.1wt.eu.) I think the idea of
> uploading your document is very good. However, Google Docs did not seem to be
> working (at least for me, with the docx format?), but maybe you can simply
> output the document as PDF, which I think Google Docs can handle, so all
> the others can also have a look at your concept. The best approach would be
> if you could upload your advice directly somewhere to the Solr wiki, as it is
> really helpful. I found some other documents meanwhile, but yours is much
> clearer and more complete, with the LBs and the Aggregators
> (http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf)
>
> Lance, thanks, I will have a look at what LinkedIn is doing.
>
> Walter, thanks for the advice. Well, you are right to mention Google. My
> question was also meant to understand how such large systems like Google/Facebook
> actually work. So my numbers are just theoretical and made up.
> My system will be smaller, but I would be very happy to understand how such
> large systems are built, and I think the approach Ephraim showed should
> work quite well at large scale. If you know of a good document (besides the
> Bigtable research paper, which I already know) that technically describes how
> Google works in detail, that would be of great interest. You seem to be
> working for a company that handles large datasets. Does Google use this
> approach, sharding the index into N writers, with the produced index
> then replicated to N "read only searchers"?
>
> thank you all.
> best regards
> jens
>
>
> 2011/4/7 Walter Underwood <wun...@wunderwood.org>
>
>> The bigger answer is that you cannot get to this size by just configuring
>> Solr. You may have to invent a lot of stuff. Like all of Google.
>>
>> Where did you get these numbers? The proposed query rate is twice as big as
>> Google (Feb 2010 estimate, 34K qps).
>>
>> I work at MarkLogic, and we scale to 100's of terabytes, with fast update
>> and query rates. If you want a real system that handles that, you might want
>> to look at our product.
>>
>> wunder
>>
>> On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:
>>
>> > I would not use replication. LinkedIn consumer search is a flat system
>> > where one process indexes new entries and does queries simultaneously.
>> > It's a custom Lucene app called Zoie. Their stuff is on Github.
>> >
>> > I would get documents to indexers via a multicast IP-based queueing
>> > system. This scales very well and there's a lot of hardware support.
>> >
>> > The problem with distributed search is that it is a) inherently slower
>> > and b) has inherently more and longer jitter. The "airplane wing"
>> > distribution of query times becomes longer and flatter.
>> >
>> > This is going to have to be a "federated" system, where the front-end
>> > app aggregates results rather than Solr.
>> >
>> > On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller <supidupi...@googlemail.com> wrote:
>> >> Hello Experts,
>> >>
>> >> I am a Solr newbie but have read quite a lot of docs. I still do not understand
>> >> what would be the best way to set up very large scale deployments:
>> >>
>> >> Goal (theoretical):
>> >>
>> >> A) Index size: 1 petabyte (1 document is about 5 KB in size)
>> >>
>> >> B) Queries: 100000 queries per second
>> >>
>> >> C) Updates: 100000 updates per second
>> >>
>> >> Solr offers:
>> >>
>> >> 1.) Replication => Scales well for B) BUT A) and C) are not satisfied
>> >>
>> >> 2.) Sharding => Scales well for A) BUT B) and C) are not satisfied (=> As
>> >> I understand the sharding approach, everything goes through a central server
>> >> that dispatches the updates and assembles the queries retrieved from the
>> >> different shards. But this central server also has some capacity limits...)
>> >>
>> >> What is the right approach to handle such large deployments? I would be
>> >> thankful for just a rough sketch of the concepts so I can experiment/search
>> >> further...
>> >>
>> >> Maybe I am missing something very trivial, as I think some of the "Solr
>> >> Users/Use Cases" on the homepage are that kind of large deployment. How are
>> >> they implemented?
>> >>
>> >> Thank you very much!!!
>> >>
>> >> Jens

--
Albert Vila Puig <a...@imente.com>
iMente.com <http://www.imente.com>