Perfect. Thank you very much.

Andy
--- On Fri, 4/8/11, Pascal Coupet <pcou...@gmail.com> wrote:

> From: Pascal Coupet <pcou...@gmail.com>
> Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> To: solr-user@lucene.apache.org
> Date: Friday, April 8, 2011, 10:20 AM
>
> I put a PDF version here:
> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B02DHBZQYYT_MmRkZTY0YjQtODJmZS00Mzg0LWJiNTEtOWJjNzViNmNjZjdh&hl=en&authkey=CL2Fq_QG
>
> Zoom in to get a better view.
>
> Pascal
>
> 2011/4/8 Andy <angelf...@yahoo.com>
>
> > Could anyone please post a version of the document in PDF or OpenOffice format? I'm on Linux, so there's no way for me to use MS Word.
> >
> > Thanks.
> >
> > --- On Fri, 4/8/11, Albert Vila <a...@imente.com> wrote:
> >
> > > From: Albert Vila <a...@imente.com>
> > > Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> > > To: solr-user@lucene.apache.org
> > > Date: Friday, April 8, 2011, 9:25 AM
> > >
> > > Yes, it won't work if you are using OpenOffice. However, it works fine with Microsoft Word.
> > >
> > > Hope it helps.
> > >
> > > Albert
> > >
> > > On 8 April 2011 14:55, Andy <angelf...@yahoo.com> wrote:
> > > > I can't view the document either -- it showed up empty.
> > > >
> > > > Has anyone succeeded in viewing it?
> > > >
> > > > Andy
> > > >
> > > > --- On Fri, 4/8/11, Albert Vila <a...@imente.com> wrote:
> > > >
> > > >> From: Albert Vila <a...@imente.com>
> > > >> Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> > > >> To: solr-user@lucene.apache.org
> > > >> Date: Friday, April 8, 2011, 3:43 AM
> > > >>
> > > >> Ephraim, I still can't view the document.
> > > >>
> > > >> I don't know if I'm doing something wrong, but I downloaded it and it appears to be empty.
> > > >>
> > > >> Albert
> > > >>
> > > >> On 7 April 2011 09:32, Ephraim Ofir <ephra...@icq.com> wrote:
> > > >> > You can't view it online, but you should be able to download it from:
> > > >> > https://docs.google.com/leaf?id=0BwOEbnJ7oeOrNmU5ZThjODUtYzM5MS00YjRlLWI2OTktZTEzNDk1YmVmOWU4&hl=en&authkey=COGel4gP
> > > >> >
> > > >> > Enjoy,
> > > >> > Ephraim Ofir
> > > >> >
> > > >> > -----Original Message-----
> > > >> > From: Jens Mueller [mailto:supidupi...@googlemail.com]
> > > >> > Sent: Thursday, April 07, 2011 8:30 AM
> > > >> > To: solr-user@lucene.apache.org
> > > >> > Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> > > >> >
> > > >> > Hello Ephraim, hello Lance, hello Walter,
> > > >> >
> > > >> > thanks for your replies:
> > > >> >
> > > >> > Ephraim, thanks very much for the further detailed explanation. I will try to set up a demo system in the next few days and use your advice. Load balancers are an important aspect of your design. Can you recommend a specific LB? (I would be using haproxy.1wt.eu.) I think the idea of uploading your document is very good. However, Google Docs does not seem to work (at least not for me with the .docx format), but maybe you can simply export the document as PDF; I think Google Docs handles that, so everyone else can also have a look at your concept.
> > > >> > The best approach would be if you could upload your advice directly to the Solr wiki, as it is really helpful. I found some other documents in the meantime (e.g. http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf), but yours is much clearer and more complete, with the LBs and the aggregators.
> > > >> >
> > > >> > Lance, thanks, I will have a look at what LinkedIn is doing.
> > > >> >
> > > >> > Walter, thanks for the advice. You are right to mention Google. My question was also meant to understand how such large systems, like Google or Facebook, actually work. So my numbers are just theoretical and made up. My system will be smaller, but I would be very happy to understand how such large systems are built, and I think the approach Ephraim showed should work quite well at large scale. If you know of good documents (besides the Bigtable research paper, which I already know) that technically describe in detail how Google works, they would be of great interest. You seem to be working for a company that handles large datasets. Does Google use this approach: sharding the index across N writers, with the produced index then replicated to N "read-only searchers"?
> > > >> >
> > > >> > Thank you all.
> > > >> > Best regards,
> > > >> > Jens
> > > >> >
> > > >> > 2011/4/7 Walter Underwood <wun...@wunderwood.org>
> > > >> >
> > > >> >> The bigger answer is that you cannot get to this size by just configuring Solr.
> > > >> >> You may have to invent a lot of stuff. Like all of Google.
> > > >> >>
> > > >> >> Where did you get these numbers? The proposed query rate is twice as big as Google (Feb 2010 estimate, 34K qps).
> > > >> >>
> > > >> >> I work at MarkLogic, and we scale to hundreds of terabytes, with fast update and query rates. If you want a real system that handles that, you might want to look at our product.
> > > >> >>
> > > >> >> wunder
> > > >> >>
> > > >> >> On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:
> > > >> >>
> > > >> >> > I would not use replication. LinkedIn consumer search is a flat system where one process indexes new entries and does queries simultaneously. It's a custom Lucene app called Zoie. Their stuff is on GitHub.
> > > >> >> >
> > > >> >> > I would get documents to indexers via a multicast IP-based queueing system. This scales very well and there's a lot of hardware support.
> > > >> >> >
> > > >> >> > The problem with distributed search is that it a) is inherently slower and b) has inherently more and longer jitter. The "airplane wing" distribution of query times becomes longer and flatter.
> > > >> >> >
> > > >> >> > This is going to have to be a "federated" system, where the front-end app aggregates results rather than Solr.
> > > >> >> >
> > > >> >> > On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller <supidupi...@googlemail.com> wrote:
> > > >> >> >> Hello Experts,
> > > >> >> >>
> > > >> >> >> I am a Solr newbie but have read quite a lot of docs.
> > > >> >> >> I still do not understand what would be the best way to set up very large scale deployments:
> > > >> >> >>
> > > >> >> >> Goal (theoretical):
> > > >> >> >>
> > > >> >> >> A) Index size: 1 petabyte (1 document is about 5 KB in size)
> > > >> >> >> B) Queries: 100,000 queries per second
> > > >> >> >> C) Updates: 100,000 updates per second
> > > >> >> >>
> > > >> >> >> Solr offers:
> > > >> >> >>
> > > >> >> >> 1.) Replication => scales well for B), BUT A) and C) are not satisfied
> > > >> >> >>
> > > >> >> >> 2.) Sharding => scales well for A), BUT B) and C) are not satisfied (=> as I understand the sharding approach, everything goes through a central server that dispatches the updates and assembles the query results retrieved from the different shards. But this central server also has some capacity limits...)
> > > >> >> >>
> > > >> >> >> What is the right approach to handle such large deployments? I would be thankful for just a rough sketch of the concepts so I can experiment/search further...
> > > >> >> >>
> > > >> >> >> Maybe I am missing something very trivial, as I think some of the "Solr Users/Use Cases" on the homepage are that kind of large deployment. How are they implemented?
> > > >> >> >>
> > > >> >> >> Thank you very much!!!
> > > >> >> >>
> > > >> >> >> Jens
> > > >>
> > > >> --
> > > >> Albert Vila Puig
> > > >> <a...@imente.com>
> > > >> iMente.com <http://www.imente.com>
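To make the two ideas discussed in this thread more concrete (sharding the index across N writers, and Lance's "federated" system where the front-end app aggregates results rather than Solr), here is a minimal Python sketch. It is a toy under stated assumptions, not how Solr, SolrJ or Zoie work internally: ToyShard, FederatedFrontEnd and shard_for are hypothetical names, shards live in memory, and the score is a simple term count. Each update is routed to exactly one shard by hashing the document id; a query fans out to every shard, each shard returns its own top k, and the front end merges those sorted lists. Asking each shard for k hits is sufficient because any document in the global top k must also be in the top k of its own shard, assuming scores are comparable across shards (real Lucene scores depend on per-shard term statistics).

```python
import hashlib
import heapq
import itertools


def shard_for(doc_id, num_shards):
    """Route a document to a writer shard by hashing its id.

    md5 is used instead of Python's built-in hash() so the mapping is
    stable across processes and restarts.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ToyShard:
    """Hypothetical in-memory stand-in for one index shard (not a Solr API)."""

    def __init__(self):
        self.docs = {}  # doc_id -> text

    def add(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term, k):
        """Return up to k (score, doc_id) pairs, highest score first.

        Toy scoring assumption: the number of times `term` occurs in the text.
        """
        hits = ((text.split().count(term), doc_id) for doc_id, text in self.docs.items())
        return heapq.nlargest(k, (hit for hit in hits if hit[0] > 0))


class FederatedFrontEnd:
    """Hypothetical front end that owns update routing and result aggregation."""

    def __init__(self, num_shards):
        self.shards = [ToyShard() for _ in range(num_shards)]

    def index(self, doc_id, text):
        # Each update touches exactly one shard, so update load is spread out.
        self.shards[shard_for(doc_id, len(self.shards))].add(doc_id, text)

    def search(self, term, k):
        # Fan out: every shard returns its own top k, sorted descending.
        per_shard = [shard.search(term, k) for shard in self.shards]
        # Merge the already-sorted lists and keep only the global top k.
        merged = heapq.merge(*per_shard, reverse=True)
        return [doc_id for _score, doc_id in itertools.islice(merged, k)]


if __name__ == "__main__":
    front_end = FederatedFrontEnd(num_shards=4)
    corpus = {
        "a": "solr solr solr",
        "b": "solr",
        "c": "lucene solr solr",
        "d": "lucene",
        "e": "solr solr solr solr",
    }
    for doc_id, text in corpus.items():
        front_end.index(doc_id, text)
    print(front_end.search("solr", k=3))  # ['e', 'a', 'c']
```

Because the merge cannot finish until every shard has answered, the overall query latency is set by the slowest shard, which is the jitter effect Lance describes for distributed search.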