Could anyone please post a version of the document in PDF or OpenOffice format? I'm on Linux, so there's no way for me to use MS Word.
Thanks.

--- On Fri, 4/8/11, Albert Vila <a...@imente.com> wrote:

> From: Albert Vila <a...@imente.com>
> Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> To: solr-user@lucene.apache.org
> Date: Friday, April 8, 2011, 9:25 AM
> Yes, it won't work if you are using OpenOffice. However, it works fine
> with Microsoft Word.
>
> Hope it helps.
>
> Albert
>
> On 8 April 2011 14:55, Andy <angelf...@yahoo.com> wrote:
> > I can't view the document either -- it showed up empty.
> >
> > Has anyone succeeded in viewing it?
> >
> > Andy
> >
> > --- On Fri, 4/8/11, Albert Vila <a...@imente.com> wrote:
> >
> >> From: Albert Vila <a...@imente.com>
> >> Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> >> To: solr-user@lucene.apache.org
> >> Date: Friday, April 8, 2011, 3:43 AM
> >> Ephraim, I still can't view the document.
> >>
> >> Don't know if I'm doing something wrong, but I downloaded it and it
> >> appears to be empty.
> >>
> >> Albert
> >>
> >> On 7 April 2011 09:32, Ephraim Ofir <ephra...@icq.com> wrote:
> >> > You can't view it online, but you should be able to download it from:
> >> > https://docs.google.com/leaf?id=0BwOEbnJ7oeOrNmU5ZThjODUtYzM5MS00YjRlLWI2OTktZTEzNDk1YmVmOWU4&hl=en&authkey=COGel4gP
> >> >
> >> > Enjoy,
> >> > Ephraim Ofir
> >> >
> >> > -----Original Message-----
> >> > From: Jens Mueller [mailto:supidupi...@googlemail.com]
> >> > Sent: Thursday, April 07, 2011 8:30 AM
> >> > To: solr-user@lucene.apache.org
> >> > Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> >> >
> >> > Hello Ephraim, hello Lance, hello Walter,
> >> >
> >> > thanks for your replies:
> >> >
> >> > Ephraim, thanks very much for the further detailed explanation. I will
> >> > try to set up a demo system in the next few days and use your advice.
> >> > LoadBalancers are an important aspect of your design. Can you recommend
> >> > one LB specifically? (I would be using haproxy.1wt.eu.) I think the idea
> >> > of uploading your document is very good. However, Google Docs did not
> >> > seem to be working (at least for me with the docx format?), but maybe
> >> > you can simply export the document as PDF; then I think Google Docs will
> >> > work, so all the others can also have a look at your concept. The best
> >> > approach would be if you could upload your advice directly somewhere to
> >> > the Solr wiki, as it is really helpful. I found some other documents
> >> > meanwhile, but yours is much clearer and more complete, with the LBs and
> >> > the Aggregators
> >> > (http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf).
> >> >
> >> > Lance, thanks, I will have a look at what LinkedIn is doing.
> >> >
> >> > Walter, thanks for the advice. Well, you are right to mention Google. My
> >> > question was also meant to understand how such large systems like
> >> > Google/Facebook actually work. So my numbers are just theoretical and
> >> > made up. My system will be smaller, but I would be very happy to
> >> > understand how such large systems are built, and I think the approach
> >> > Ephraim showed should work quite well at large scale. If you know a good
> >> > document (besides the Bigtable research paper that I already know) that
> >> > technically describes how Google works in detail, that would be of great
> >> > interest. You seem to be working for a company that handles large
> >> > datasets. Does Google use this approach: sharding the index across N
> >> > writers, with the produced index then replicated to N "read only
> >> > searchers"?
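The layout Jens asks about (the index split across N writer shards, each shard replicated to read-only searchers, with a load balancer and an aggregator in front) can be illustrated with a toy, in-memory Python sketch. It is not how Solr, Google, or Ephraim's design actually implement this: `Replica`, `ShardedCluster`, the MD5-based routing, the round-robin replica choice standing in for a load balancer, and the term-count scoring are all hypothetical simplifications. The gather step is the kind of "federated" aggregation Lance describes: each shard returns its local top-k and the front end merges them.

```python
import hashlib
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one read-only copy of a shard."""
    docs: dict = field(default_factory=dict)  # doc_id -> text

    def search(self, term, k):
        # Toy relevance: how often the term occurs in the document.
        hits = [(text.split().count(term), doc_id) for doc_id, text in self.docs.items()]
        return heapq.nlargest(k, [hit for hit in hits if hit[0] > 0])


class ShardedCluster:
    """N shards, each with M identical replicas (hypothetical sketch)."""

    def __init__(self, num_shards, replicas_per_shard):
        self.shards = [[Replica() for _ in range(replicas_per_shard)]
                       for _ in range(num_shards)]
        # One round-robin cursor per shard, standing in for a load balancer.
        self._cursors = [itertools.cycle(range(replicas_per_shard))
                         for _ in range(num_shards)]

    def shard_for(self, doc_id):
        # Stable hash so every update for a given id goes to the same writer shard.
        digest = hashlib.md5(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def index(self, doc_id, text):
        # The shard's writer accepts the document; copying it to every replica
        # immediately stands in for (normally delayed) replication.
        for replica in self.shards[self.shard_for(doc_id)]:
            replica.docs[doc_id] = text

    def search(self, term, k=10):
        # Scatter: ask one replica of each shard for its local top-k.
        partial = []
        for replicas, cursor in zip(self.shards, self._cursors):
            partial.extend(replicas[next(cursor)].search(term, k))
        # Gather: the aggregator merges local results into a global top-k.
        return heapq.nlargest(k, partial)


if __name__ == "__main__":
    cluster = ShardedCluster(num_shards=4, replicas_per_shard=2)
    cluster.index("a", "solr solr lucene")
    cluster.index("b", "solr")
    cluster.index("c", "lucene")
    print(cluster.search("solr", k=2))  # [(2, 'a'), (1, 'b')]
```

Merging local top-k lists yields the exact global top-k only when scores are comparable across shards; real Lucene scores depend on per-shard term statistics, which this toy scoring sidesteps. Replication here is also instantaneous, whereas real writer-to-searcher replication lags, so freshly indexed documents become searchable only after a delay.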
> >> >
> >> > thank you all.
> >> > best regards
> >> > jens
> >> >
> >> > 2011/4/7 Walter Underwood <wun...@wunderwood.org>
> >> >
> >> >> The bigger answer is that you cannot get to this size by just
> >> >> configuring Solr. You may have to invent a lot of stuff. Like all of
> >> >> Google.
> >> >>
> >> >> Where did you get these numbers? The proposed query rate is twice as
> >> >> big as Google (Feb 2010 estimate, 34K qps).
> >> >>
> >> >> I work at MarkLogic, and we scale to 100's of terabytes, with fast
> >> >> update and query rates. If you want a real system that handles that,
> >> >> you might want to look at our product.
> >> >>
> >> >> wunder
> >> >>
> >> >> On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:
> >> >>
> >> >> > I would not use replication. LinkedIn consumer search is a flat
> >> >> > system where one process indexes new entries and does queries
> >> >> > simultaneously. It's a custom Lucene app called Zoie. Their stuff is
> >> >> > on GitHub.
> >> >> >
> >> >> > I would get documents to indexers via a multicast IP-based queueing
> >> >> > system. This scales very well and there's a lot of hardware support.
> >> >> >
> >> >> > The problem with distributed search is that a) it is inherently
> >> >> > slower and b) it has inherently more and longer jitter. The
> >> >> > "airplane wing" distribution of query times becomes longer and
> >> >> > flatter.
> >> >> >
> >> >> > This is going to have to be a "federated" system, where the front-end
> >> >> > app aggregates results rather than Solr.
> >> >> >
> >> >> > On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller
> >> >> > <supidupi...@googlemail.com> wrote:
> >> >> >> Hello Experts,
> >> >> >>
> >> >> >> I am a Solr newbie but have read quite a lot of docs.
> >> >> >> I still do not understand what would be the best way to set up very
> >> >> >> large scale deployments:
> >> >> >>
> >> >> >> Goal (theoretical):
> >> >> >>
> >> >> >> A) Index size: 1 petabyte (1 document is about 5 KB in size)
> >> >> >>
> >> >> >> B) Queries: 100000 queries per second
> >> >> >>
> >> >> >> C) Updates: 100000 updates per second
> >> >> >>
> >> >> >> Solr offers:
> >> >> >>
> >> >> >> 1.) Replication => scales well for B), BUT A) and C) are not
> >> >> >> satisfied
> >> >> >>
> >> >> >> 2.) Sharding => scales well for A), BUT B) and C) are not satisfied
> >> >> >> (=> As I understand the sharding approach, everything goes through a
> >> >> >> central server that dispatches the updates and assembles the query
> >> >> >> results retrieved from the different shards. But this central
> >> >> >> server also has some capacity limits...)
> >> >> >>
> >> >> >> What is the right approach to handle such large deployments? I would
> >> >> >> be thankful for just a rough sketch of the concepts so I can
> >> >> >> experiment/search further...
> >> >> >>
> >> >> >> Maybe I am missing something very trivial, as I think some of the
> >> >> >> "Solr Users/Use Cases" on the homepage are that kind of large
> >> >> >> deployment. How are they implemented?
> >> >> >>
> >> >> >> Thank you very much!!!
> >> >> >>
> >> >> >> Jens
> >>
> >> --
> >> Albert Vila Puig
> >> <a...@imente.com>
> >> iMente.com <http://www.imente.com>
>
> --
> Albert Vila Puig
> <a...@imente.com>
> iMente.com <http://www.imente.com>
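For a sense of scale, the goals in Jens's original question can be turned into a few back-of-the-envelope numbers. This Python sketch assumes decimal units (1 PB = 10^15 bytes, 5 KB = 5,000 bytes) and that every update is a full 5 KB document; neither assumption comes from the thread. The shard figure relies on Lucene addressing documents within one index with a Java `int`, so a single index holds at most about 2^31 documents. It is only a floor: per-machine memory, disk, and query load would push the real shard count much higher.

```python
import math

# Assumptions (not from the thread): SI units, every update is a full 5 KB document.
PB = 10**15
KB = 10**3
# Lucene document ids are Java ints, so one index holds at most roughly 2**31 - 1 docs.
MAX_DOCS_PER_INDEX = 2**31 - 1


def capacity(index_bytes, doc_bytes, updates_per_sec):
    """Back-of-the-envelope numbers for goals A) and C) in the question."""
    num_docs = index_bytes // doc_bytes
    return {
        "num_docs": num_docs,
        # Raw document bytes arriving per second, before any index overhead.
        "ingest_bytes_per_sec": updates_per_sec * doc_bytes,
        # Time to write the whole corpus once at the target update rate.
        "days_to_fill": num_docs / updates_per_sec / 86_400,
        # Lower bound on shards from the per-index document limit alone.
        "min_shards": math.ceil(num_docs / MAX_DOCS_PER_INDEX),
    }


if __name__ == "__main__":
    for name, value in capacity(1 * PB, 5 * KB, 100_000).items():
        if isinstance(value, float):
            print(f"{name}: {value:,.1f}")
        else:
            print(f"{name}: {value:,}")
```

Under these assumptions the target works out to 200 billion documents, 500 MB/s of raw document data, roughly 23 days just to write the corpus once at 100000 updates per second, and at least 94 separate Lucene indexes before any hardware limit is considered.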