I can't view the document either -- it showed up empty. Has anyone succeeded in viewing it?

Andy

--- On Fri, 4/8/11, Albert Vila <a...@imente.com> wrote:

> From: Albert Vila <a...@imente.com>
> Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> To: solr-user@lucene.apache.org
> Date: Friday, April 8, 2011, 3:43 AM
>
> Ephraim, I still can't view the document.
>
> Don't know if I'm doing something wrong, but I downloaded it and it
> appears to be empty.
>
> Albert
>
> On 7 April 2011 09:32, Ephraim Ofir <ephra...@icq.com> wrote:
>
> > You can't view it online, but you should be able to download it from:
> > https://docs.google.com/leaf?id=0BwOEbnJ7oeOrNmU5ZThjODUtYzM5MS00YjRlLWI2OTktZTEzNDk1YmVmOWU4&hl=en&authkey=COGel4gP
> >
> > Enjoy,
> > Ephraim Ofir
> >
> >
> > -----Original Message-----
> > From: Jens Mueller [mailto:supidupi...@googlemail.com]
> > Sent: Thursday, April 07, 2011 8:30 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> >
> > Hello Ephraim, hello Lance, hello Walter,
> >
> > thanks for your replies:
> >
> > Ephraim, thanks very much for the further detailed explanation. I
> > will try to set up a demo system in the next few days and use your
> > advice. Load balancers are an important aspect of your design. Can
> > you recommend one LB specifically? (I would be using
> > haproxy.1wt.eu.) I think the idea of uploading your document is
> > very good. However, Google Docs did not seem to be working (at least
> > for me with the docx format?), but maybe you can simply output the
> > document as PDF, which I think Google Docs handles, so all the
> > others can also have a look at your concept. The best approach would
> > be if you could upload your advice directly somewhere to the Solr
> > wiki, as it is really helpful. I found some other documents
> > meanwhile, but yours is much clearer and more complete, with the LBs
> > and the aggregators
> > (http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf).
> >
> > Lance, thanks, I will have a look at what LinkedIn is doing.
> >
> > Walter, thanks for the advice: well, you are right in mentioning
> > Google. My question was also meant to understand how such large
> > systems like Google/Facebook actually work. So my numbers are just
> > theoretical and made up. My system will be smaller, but I would be
> > very happy to understand how such large systems are built, and I
> > think the approach Ephraim showed should work quite well at large
> > scale. If you know any good documents (besides the Bigtable research
> > paper that I already know) that technically describe how Google
> > works in detail, that would be of great interest. You seem to be
> > working for a company that handles large datasets. Does Google use
> > this approach, sharding the index into N writers, with the produced
> > index then replicated to N "read only searchers"?
> >
> > thank you all.
> > best regards
> > jens
> >
> >
> > 2011/4/7 Walter Underwood <wun...@wunderwood.org>
> >
> >> The bigger answer is that you cannot get to this size by just
> >> configuring Solr. You may have to invent a lot of stuff. Like all
> >> of Google.
> >>
> >> Where did you get these numbers? The proposed query rate is twice
> >> as big as Google (Feb 2010 estimate, 34K qps).
> >>
> >> I work at MarkLogic, and we scale to hundreds of terabytes, with
> >> fast update and query rates. If you want a real system that handles
> >> that, you might want to look at our product.
> >>
> >> wunder
> >>
> >> On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:
> >>
> >> > I would not use replication. LinkedIn consumer search is a flat
> >> > system where one process indexes new entries and does queries
> >> > simultaneously. It's a custom Lucene app called Zoie. Their stuff
> >> > is on GitHub.
> >> >
> >> > I would get documents to indexers via a multicast IP-based
> >> > queueing system. This scales very well and there's a lot of
> >> > hardware support.
> >> >
> >> > The problem with distributed search is that it is a) inherently
> >> > slower and b) has inherently more and longer jitter. The
> >> > "airplane wing" distribution of query times becomes longer and
> >> > flatter.
> >> >
> >> > This is going to have to be a "federated" system, where the
> >> > front-end app aggregates results rather than Solr.
> >> >
> >> > On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller <supidupi...@googlemail.com> wrote:
> >> >> Hello Experts,
> >> >>
> >> >> I am a Solr newbie but have read quite a lot of docs. I still do
> >> >> not understand what would be the best way to set up very large
> >> >> scale deployments:
> >> >>
> >> >> Goal (theoretical):
> >> >>
> >> >> A) Index size: 1 petabyte (1 document is about 5 KB in size)
> >> >> B) Queries: 100000 queries per second
> >> >> C) Updates: 100000 updates per second
> >> >>
> >> >> Solr offers:
> >> >>
> >> >> 1) Replication => scales well for B), BUT A) and C) are not
> >> >>    satisfied
> >> >>
> >> >> 2) Sharding => scales well for A), BUT B) and C) are not
> >> >>    satisfied (=> As I understand the sharding approach, everything
> >> >>    goes through a central server that dispatches the updates and
> >> >>    assembles the query results retrieved from the different
> >> >>    shards. But this central server also has some capacity
> >> >>    limits...)
> >> >>
> >> >> What is the right approach to handle such large deployments? I
> >> >> would be thankful for just a rough sketch of the concepts so I
> >> >> can experiment/search further...
> >> >>
> >> >> Maybe I am missing something very trivial, as I think some of the
> >> >> "Solr Users/Use Cases" on the homepage are that kind of large
> >> >> deployment. How are they implemented?
> >> >>
> >> >> Thanks very much!!!
> >> >>
> >> >> Jens
>
> --
> Albert Vila Puig <a...@imente.com>
> iMente.com <http://www.imente.com>
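
For anyone trying to put numbers on the quoted thread, here is a rough Python sketch of two points from it: the document count implied by Jens's (theoretical) goal of 1 petabyte at about 5 KB per document, and the "federated" idea Lance describes, where a front-end app rather than Solr merges the hits returned by each shard. This is only an illustration, not anyone's actual design from the thread. The names document_count, shards_needed and merge_top_k are made up, docs_per_shard is a purely hypothetical capacity figure, and the merge assumes that scores coming from different shards are directly comparable.

```python
import heapq
from typing import Dict, List, Tuple

# Figures from Jens's original post (he calls them theoretical).
INDEX_BYTES = 10**15    # 1 petabyte, decimal units
DOC_BYTES = 5 * 10**3   # about 5 KB per document

# A hit as a front-end aggregator might see it: (score, document id).
Hit = Tuple[float, str]


def document_count(index_bytes: int = INDEX_BYTES,
                   doc_bytes: int = DOC_BYTES) -> int:
    """Number of documents implied by the index size and document size."""
    return index_bytes // doc_bytes


def shards_needed(total_docs: int, docs_per_shard: int) -> int:
    """Ceiling division. docs_per_shard is an assumed capacity, not a
    Solr limit; measure it on your own hardware."""
    return -(-total_docs // docs_per_shard)


def merge_top_k(per_shard_hits: Dict[str, List[Hit]], k: int) -> List[Hit]:
    """Merge the hit lists returned by each shard and keep the k best.

    Assumes scores from different shards are comparable, which is an
    assumption of this sketch and not something the thread establishes.
    """
    all_hits = (hit for hits in per_shard_hits.values() for hit in hits)
    return heapq.nlargest(k, all_hits)


if __name__ == "__main__":
    docs = document_count()
    print("documents:", docs)
    # 100 million documents per shard is a hypothetical figure.
    print("shards:", shards_needed(docs, 100_000_000))
    print(merge_top_k({"shard1": [(0.9, "a"), (0.2, "b")],
                       "shard2": [(0.7, "c")]}, k=2))
```

The front-end would call merge_top_k once per query, after collecting each shard's own top k. Since every shard has to answer before the merge can finish, the slowest shard sets the response time, which is the jitter problem Lance mentions.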