I did put a PDF version here: https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B02DHBZQYYT_MmRkZTY0YjQtODJmZS00Mzg0LWJiNTEtOWJjNzViNmNjZjdh&hl=en&authkey=CL2Fq_QG
Zoom in to get a better view.

Pascal

2011/4/8 Andy <angelf...@yahoo.com>

Could anyone please post a version of the document in PDF or OpenOffice format? I'm on Linux, so there's no way for me to use MS Word.

Thanks.

--- On Fri, 4/8/11, Albert Vila <a...@imente.com> wrote:

From: Albert Vila <a...@imente.com>
Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
To: solr-user@lucene.apache.org
Date: Friday, April 8, 2011, 9:25 AM

Yes, it won't work if you are using OpenOffice. However, it works fine with Microsoft Word.

Hope it helps.

Albert

On 8 April 2011 14:55, Andy <angelf...@yahoo.com> wrote:

I can't view the document either; it showed up empty.

Has anyone succeeded in viewing it?

Andy

--- On Fri, 4/8/11, Albert Vila <a...@imente.com> wrote:

From: Albert Vila <a...@imente.com>
Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
To: solr-user@lucene.apache.org
Date: Friday, April 8, 2011, 3:43 AM

Ephraim, I still can't view the document.

I don't know if I'm doing something wrong, but I downloaded it and it appears to be empty.
Albert

On 7 April 2011 09:32, Ephraim Ofir <ephra...@icq.com> wrote:

You can't view it online, but you should be able to download it from:
https://docs.google.com/leaf?id=0BwOEbnJ7oeOrNmU5ZThjODUtYzM5MS00YjRlLWI2OTktZTEzNDk1YmVmOWU4&hl=en&authkey=COGel4gP

Enjoy,
Ephraim Ofir

-----Original Message-----
From: Jens Mueller [mailto:supidupi...@googlemail.com]
Sent: Thursday, April 07, 2011 8:30 AM
To: solr-user@lucene.apache.org
Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?

Hello Ephraim, hello Lance, hello Walter,

Thanks for your replies.

Ephraim, thanks very much for the further detailed explanation. I will try to set up a demo system in the next few days and use your advice. Load balancers are an important aspect of your design. Can you recommend one LB specifically? (I would be using haproxy.1wt.eu.) I think the idea of uploading your document is very good. However, Google Docs did not seem to be working (at least for me with the .docx format), but maybe you can simply export the document as PDF, which I think Google Docs can display, so all the others can also have a look at your concept.
The best approach would be to upload your advice directly to the Solr wiki, as it is really helpful. I have found some other documents in the meantime, but yours is much clearer and more complete, with the LBs and the aggregators (http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf).

Lance, thanks, I will have a look at what LinkedIn is doing.

Walter, thanks for the advice. You are right to mention Google: my question was also meant to help me understand how such large systems, like Google or Facebook, actually work. So my numbers are just theoretical and made up. My system will be smaller, but I would be very happy to understand how such large systems are built, and I think the approach Ephraim showed should work quite well at large scale. If you know of any good documents (besides the Bigtable research paper, which I already know) that describe in technical detail how Google works, that would be of great interest. You seem to be working for a company that handles large datasets. Does Google use this approach: sharding the index across N writers, with the produced index then replicated to N "read-only searchers"?

Thank you all.
Best regards,
Jens

2011/4/7 Walter Underwood <wun...@wunderwood.org>

The bigger answer is that you cannot get to this size by just configuring Solr. You may have to invent a lot of stuff. Like all of Google.

Where did you get these numbers?
The proposed query rate is twice as big as Google's (Feb 2010 estimate, 34K qps).

I work at MarkLogic, and we scale to hundreds of terabytes, with fast update and query rates. If you want a real system that handles that, you might want to look at our product.

wunder

On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:

I would not use replication. LinkedIn consumer search is a flat system where one process indexes new entries and serves queries simultaneously. It's a custom Lucene app called Zoie. Their stuff is on GitHub.

I would get documents to the indexers via a multicast IP-based queueing system. This scales very well and there's a lot of hardware support.

The problem with distributed search is that a) it is inherently slower and b) it has inherently more and longer jitter. The "airplane wing" distribution of query times becomes longer and flatter.

This is going to have to be a "federated" system, where the front-end app aggregates results rather than Solr.

On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller <supidupi...@googlemail.com> wrote:

Hello Experts,

I am a Solr newbie but have read quite a lot of docs. I still do not understand what would be the best way to set up very large-scale deployments:

Goal (theoretical):

A) Index size: 1 petabyte (one document is about 5 KB in size)
B) Queries: 100,000 queries per second
C) Updates: 100,000 updates per second

Solr offers:

1.) Replication => scales well for B), BUT A) and C) are not satisfied

2.) Sharding => scales well for A), BUT B) and C) are not satisfied (=> As I understand the sharding approach, everything goes through a central server that dispatches the updates and assembles the query results retrieved from the different shards. But this central server also has some capacity limits...)

What is the right approach to handle such large deployments? I would be thankful for just a rough sketch of the concepts so I can experiment/search further...

Maybe I am missing something very trivial, as I think some of the "Solr Users/Use Cases" on the homepage are that kind of large deployment. How are they implemented?

Thank you very much!!!

Jens

-- 
Albert Vila Puig
<a...@imente.com>
iMente.com <http://www.imente.com>
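Jens's targets A) to C) can be turned into a rough sizing estimate that shows how sharding and replication multiply. This is only a back-of-the-envelope sketch: it uses decimal units, and the per-node limits `DOCS_PER_SHARD` and `QPS_PER_REPLICA` are HYPOTHETICAL placeholders made up for illustration, not Solr benchmarks. It also assumes plain scatter-gather search, where every query is sent to every shard.

```python
# Back-of-the-envelope sizing for the theoretical targets A) to C).
# The per-node limits are HYPOTHETICAL placeholders, not Solr benchmarks.

INDEX_BYTES = 10**15          # A) 1 petabyte (decimal units)
DOC_BYTES = 5 * 10**3         # about 5 KB per document
QUERIES_PER_SEC = 100_000     # B)
UPDATES_PER_SEC = 100_000     # C)

DOCS_PER_SHARD = 100_000_000  # hypothetical: documents one shard can hold
QPS_PER_REPLICA = 500         # hypothetical: queries/s one shard replica serves


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def plan() -> dict:
    docs = INDEX_BYTES // DOC_BYTES
    shards = ceil_div(docs, DOCS_PER_SHARD)
    # With scatter-gather search every query touches every shard, so each
    # shard sees the full query rate and needs this many replicas.
    replicas_per_shard = ceil_div(QUERIES_PER_SEC, QPS_PER_REPLICA)
    return {
        "documents": docs,
        "shards": shards,
        "replicas_per_shard": replicas_per_shard,
        "nodes": shards * replicas_per_shard,
        # Updates are partitioned across shards, but each update must then be
        # applied on (or replicated to) every replica of its shard.
        "updates_per_shard_per_sec": UPDATES_PER_SEC / shards,
    }


if __name__ == "__main__":
    for name, value in plan().items():
        print(f"{name}: {value:,}")
```

With these made-up limits the estimate comes to 200 billion documents, 2,000 shards and 200 replicas per shard, or 400,000 nodes. The absolute numbers mean little; the point is that index size (A) drives the shard count, query rate (B) drives the replica count, and the two multiply.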
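Both Jens's description of sharding (a central server dispatches updates and assembles the results from the shards) and Lance's suggestion that the front-end app aggregate results come down to the scatter-gather pattern. The following is a toy Python sketch of that pattern, not Solr's API: `Shard`, `FrontEnd` and the term-count scoring are hypothetical stand-ins, and routing documents by a CRC32 hash of their ID modulo the shard count is just one common choice. Solr's own distributed search performs a comparable merge on whichever node receives a request carrying the `shards` parameter.

```python
import heapq
import zlib
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical in-memory shard standing in for one Solr index."""
    docs: dict = field(default_factory=dict)  # doc_id -> text

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text

    def search(self, term: str, k: int) -> list:
        # Toy relevance score: number of times the term occurs in the text.
        hits = []
        for doc_id, text in self.docs.items():
            score = text.lower().split().count(term)
            if score > 0:
                hits.append((score, doc_id))
        return heapq.nlargest(k, hits)


class FrontEnd:
    """Hypothetical aggregator: routes updates, fans queries out, merges hits."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [Shard() for _ in range(num_shards)]

    def _shard_for(self, doc_id: str) -> Shard:
        # Stable hash (Python's built-in hash() of str is salted per process),
        # so the same document always lands on the same shard.
        return self.shards[zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)]

    def index(self, doc_id: str, text: str) -> None:
        # Update path: each document goes to exactly one shard.
        self._shard_for(doc_id).add(doc_id, text)

    def search(self, term: str, k: int = 10) -> list:
        # Scatter: every shard returns its own top k hits.
        partial = [shard.search(term.lower(), k) for shard in self.shards]
        # Gather: the global top k is always within the union of per-shard top k.
        return heapq.nlargest(k, (hit for hits in partial for hit in hits))


if __name__ == "__main__":
    front = FrontEnd(num_shards=4)
    front.index("a", "solr solr solr")
    front.index("b", "solr")
    front.index("c", "lucene solr solr")
    front.index("d", "nothing here")
    print(front.search("solr", k=2))  # [(3, 'a'), (2, 'c')]
```

Because every query fans out to every shard, adding shards helps with index size (A) but not with query rate (B); that is where replicas of each shard behind a load balancer come in, and why the aggregating node itself can become the capacity limit Jens mentions.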
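Lance's point that distributed search has "inherently more and longer jitter" can be made concrete with a simple probability argument: a scatter-gather query can only finish when its slowest shard has answered. The sketch assumes, simplistically, that shard latencies are independent and that each shard meets a deadline with the same probability `p_shard`; both the function name and the 99% figure in the usage are hypothetical.

```python
# Why fan-out hurts tail latency: a scatter-gather query is only as fast as
# its slowest shard. Assumes (simplistically) independent shard latencies.


def prob_query_within_deadline(p_shard: float, num_shards: int) -> float:
    """Chance that all shards answer within the deadline.

    p_shard is the (hypothetical) probability that one shard answers in time.
    """
    if not 0.0 <= p_shard <= 1.0:
        raise ValueError("p_shard must be a probability")
    return p_shard ** num_shards


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(prob_query_within_deadline(0.99, n), 4))
```

With a hypothetical 99% per-shard on-time rate, about 90% of queries meet the deadline with 10 shards, but only about 37% with 100 shards; that is the "longer and flatter" tail Lance describes.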