Re: Very very large scale Solr Deployment = how to do (Expert Question)?

Pascal Coupet Fri, 08 Apr 2011 07:20:34 -0700

I put a PDF version here:
https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B02DHBZQYYT_MmRkZTY0YjQtODJmZS00Mzg0LWJiNTEtOWJjNzViNmNjZjdh&hl=en&authkey=CL2Fq_QG


Zoom in to get a better view.

Pascal

2011/4/8 Andy <[email protected]>

> Could anyone please post a version of the document in PDF or OpenOffice
> format? I'm on Linux, so there's no way for me to use MS Word.
>
> Thanks.
>
>
> --- On Fri, 4/8/11, Albert Vila <[email protected]> wrote:
>
> > From: Albert Vila <[email protected]>
> > Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> > To: [email protected]
> > Date: Friday, April 8, 2011, 9:25 AM
> > Yes, it won't work if you are using OpenOffice. However, it works fine
> > with Microsoft Word.
> >
> > Hope it helps.
> >
> > Albert
> >
> > On 8 April 2011 14:55, Andy <[email protected]> wrote:
> > > I can't view the document either -- it showed up empty.
> > >
> > > Has anyone succeeded in viewing it?
> > >
> > > Andy
> > >
> > > --- On Fri, 4/8/11, Albert Vila <[email protected]> wrote:
> > >
> > >> From: Albert Vila <[email protected]>
> > >> Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> > >> To: [email protected]
> > >> Date: Friday, April 8, 2011, 3:43 AM
> > >> Ephraim, I still can't view the document.
> > >>
> > >> I don't know if I'm doing something wrong, but I downloaded it and it
> > >> appears to be empty.
> > >>
> > >> Albert
> > >>
> > >> On 7 April 2011 09:32, Ephraim Ofir <[email protected]> wrote:
> > >> > You can't view it online, but you should be able to download it from:
> > >> > https://docs.google.com/leaf?id=0BwOEbnJ7oeOrNmU5ZThjODUtYzM5MS00YjRlLWI2OTktZTEzNDk1YmVmOWU4&hl=en&authkey=COGel4gP
> > >> >
> > >> > Enjoy,
> > >> > Ephraim Ofir
> > >> >
> > >> >
> > >> > -----Original Message-----
> > >> > From: Jens Mueller [mailto:[email protected]]
> > >> > Sent: Thursday, April 07, 2011 8:30 AM
> > >> > To: [email protected]
> > >> > Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> > >> >
> > >> > Hello Ephraim, hello Lance, hello Walter,
> > >> >
> > >> > Thanks for your replies.
> > >> >
> > >> > Ephraim, thanks very much for the further detailed explanation. I will
> > >> > try to set up a demo system in the next few days and follow your advice.
> > >> > Load balancers are an important aspect of your design. Can you recommend
> > >> > a specific one? (I would be using HAProxy, haproxy.1wt.eu.) I think the
> > >> > idea of uploading your document is very good. However, Google Docs did
> > >> > not seem to work (at least for me, with the .docx format), but maybe you
> > >> > could simply export the document as PDF; I think Google Docs handles
> > >> > that, so everyone else can also have a look at your concept. The best
> > >> > approach would be to put your advice directly on the Solr wiki, as it is
> > >> > really helpful. I have found some other documents in the meantime (e.g.
> > >> > http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf), but
> > >> > yours is much clearer and more complete, with the LBs and the
> > >> > aggregators.
> > >> >
> > >> > Lance, thanks, I will have a look at what LinkedIn is doing.
> > >> >
> > >> > Walter, thanks for the advice. You are right to mention Google: my
> > >> > question was also about understanding how such large systems, like
> > >> > Google or Facebook, actually work. My numbers are just theoretical and
> > >> > made up; my system will be smaller. But I would be very happy to
> > >> > understand how such large systems are built, and I think the approach
> > >> > Ephraim showed should work quite well at large scale. If you know of
> > >> > good documents (besides the Bigtable research paper, which I already
> > >> > know) that describe technically and in detail how Google works, they
> > >> > would be of great interest. You seem to work for a company that handles
> > >> > large datasets. Does Google use this approach: sharding the index across
> > >> > N writers, with the produced index then replicated to N "read only
> > >> > searchers"?
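The layout Jens describes (each shard's index written by one writer and copied to several read-only searchers, with a load balancer such as HAProxy in front) can be sketched in a few lines of Python. This is a minimal model assuming plain round-robin balancing, which is only one of the algorithms HAProxy offers; the host names, the `shards` layout, and the `RoundRobinBalancer` class are hypothetical, and neither HAProxy configuration nor Solr replication itself is modeled.

```python
import itertools
from typing import Iterable


class RoundRobinBalancer:
    """Hypothetical stand-in for a load balancer in front of read-only searchers."""

    def __init__(self, backends: Iterable[str]) -> None:
        # Cycle endlessly through the backends in a fixed order.
        self._cycle = itertools.cycle(list(backends))

    def pick(self) -> str:
        return next(self._cycle)


# Made-up layout: one writer per shard; its index is copied to the searchers.
shards = {
    0: {"writer": "writer-0", "searchers": ["searcher-0a", "searcher-0b"]},
    1: {"writer": "writer-1", "searchers": ["searcher-1a", "searcher-1b"]},
}
balancers = {s: RoundRobinBalancer(cfg["searchers"]) for s, cfg in shards.items()}


def plan_query() -> list[str]:
    # A query over the whole index needs one searcher from every shard;
    # writers are never queried, so indexing load stays off the read path.
    return [balancers[s].pick() for s in sorted(shards)]


print(plan_query())  # ['searcher-0a', 'searcher-1a']
print(plan_query())  # ['searcher-0b', 'searcher-1b']
print(plan_query())  # ['searcher-0a', 'searcher-1a']
```

In this model, more query capacity means appending searchers to a shard's list, while more index capacity means adding shards.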
> > >> >
> > >> > thank you all.
> > >> > best regards
> > >> > jens
> > >> >
> > >> >
> > >> >
> > >> > 2011/4/7 Walter Underwood <[email protected]>
> > >> >
> > >> >> The bigger answer is that you cannot get to this size by just
> > >> >> configuring Solr. You may have to invent a lot of stuff. Like all of
> > >> >> Google.
> > >> >>
> > >> >> Where did you get these numbers? The proposed query rate is twice as
> > >> >> big as Google (Feb 2010 estimate, 34K qps).
> > >> >>
> > >> >> I work at MarkLogic, and we scale to hundreds of terabytes, with fast
> > >> >> update and query rates. If you want a real system that handles that,
> > >> >> you might want to look at our product.
> > >> >>
> > >> >> wunder
> > >> >>
> > >> >> On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:
> > >> >>
> > >> >> > I would not use replication. LinkedIn consumer search is a flat
> > >> >> > system where one process indexes new entries and serves queries
> > >> >> > simultaneously. It's a custom Lucene app called Zoie. Their stuff is
> > >> >> > on GitHub.
> > >> >> >
> > >> >> > I would get documents to indexers via a multicast IP-based queueing
> > >> >> > system. This scales very well and there's a lot of hardware support.
> > >> >> >
> > >> >> > The problem with distributed search is that it a) is inherently
> > >> >> > slower and b) inherently has more, and longer, jitter. The "airplane
> > >> >> > wing" distribution of query times becomes longer and flatter.
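Lance's point about slower queries and more jitter follows from the fact that a sharded query is only as fast as its slowest shard. A small simulation makes this concrete. It assumes exponentially distributed per-shard response times with mean 1.0 in an arbitrary unit, which is a modeling simplification, not measured Solr behavior; `fanout_latencies` is a hypothetical helper.

```python
import random
import statistics


def fanout_latencies(num_shards: int, trials: int, rng: random.Random) -> list[float]:
    """Simulated latency of a query that must wait for every shard to answer."""
    # Each shard's response time is drawn from an exponential distribution with
    # mean 1.0 (an assumption). The query finishes only when the slowest shard
    # does, hence max().
    return [max(rng.expovariate(1.0) for _ in range(num_shards)) for _ in range(trials)]


rng = random.Random(42)  # fixed seed so runs are repeatable
for n in (1, 10, 100):
    samples = fanout_latencies(n, 5_000, rng)
    median = statistics.median(samples)
    p99 = statistics.quantiles(samples, n=100)[98]  # 99th percentile cut point
    print(f"shards={n:3d}  median={median:.2f}  p99={p99:.2f}")
```

As the shard count grows, both the median and the 99th-percentile latency move to the right, even though no individual shard got any slower.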
> > >> >> >
> > >> >> > This is going to have to be a "federated" system, where the front-end
> > >> >> > app aggregates results rather than Solr.
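In the federated design Lance suggests, the front-end application merges the per-shard results itself. The core of that merge is a top-k combination of hit lists. This is a minimal sketch assuming each shard returns (score, id) pairs and that scores are comparable across shards, which is itself an approximation when each shard computes its own term statistics; `Hit` and `merge_top_k` are hypothetical names, not a Solr API.

```python
import heapq
from typing import NamedTuple


class Hit(NamedTuple):
    score: float
    doc_id: str


def merge_top_k(shard_results: list[list[Hit]], k: int) -> list[Hit]:
    """Front-end aggregation: combine each shard's hits into one global top-k list."""
    # Each shard only needs to return its own top k hits, because the global
    # top k must be among them. heapq.nlargest compares Hit tuples field by
    # field, so ties on score are broken by doc_id.
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(k, all_hits)


# Hypothetical responses from three shards, each already truncated to its top 2.
shard_a = [Hit(9.1, "a1"), Hit(4.0, "a7")]
shard_b = [Hit(7.5, "b3"), Hit(6.2, "b9")]
shard_c = [Hit(8.8, "c2"), Hit(1.3, "c5")]

for hit in merge_top_k([shard_a, shard_b, shard_c], k=3):
    print(hit.doc_id, hit.score)
# a1 9.1
# c2 8.8
# b3 7.5
```

The merge cost depends only on the number of shards times k, not on the index size, which is why it can live in a stateless front end that is scaled out behind a load balancer.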
> > >> >> >
> > >> >> > On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller <[email protected]> wrote:
> > >> >> >> Hello Experts,
> > >> >> >>
> > >> >> >> I am a Solr newbie but have read quite a lot of docs. I still do not
> > >> >> >> understand what would be the best way to set up very large scale
> > >> >> >> deployments:
> > >> >> >>
> > >> >> >> Goal (theoretical):
> > >> >> >>
> > >> >> >>  A) Index size: 1 petabyte (1 document is about 5 KB in size)
> > >> >> >>
> > >> >> >>  B) Queries: 100,000 queries per second
> > >> >> >>
> > >> >> >>  C) Updates: 100,000 updates per second
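Jens's goals can be turned into a few concrete numbers. This worked example assumes decimal units and uses Lucene's per-index document limit as a hard floor on the shard count (Lucene document IDs are Java ints, so one index holds at most about 2^31 - 1 documents; the exact cap varies slightly by version). Real deployments would use many more, smaller shards.

```python
import math

# Decimal units are assumed here (1 PB = 10**15 bytes, 1 KB = 10**3 bytes).
INDEX_BYTES = 10**15
DOC_BYTES = 5 * 10**3
UPDATES_PER_SEC = 100_000

# Approximate upper bound on documents in a single Lucene index.
LUCENE_MAX_DOCS = 2**31 - 1

num_docs = INDEX_BYTES // DOC_BYTES
min_shards = math.ceil(num_docs / LUCENE_MAX_DOCS)
ingest_bytes_per_sec = UPDATES_PER_SEC * DOC_BYTES

print(f"documents: {num_docs:,}")                         # documents: 200,000,000,000
print(f"minimum shards: {min_shards}")                    # minimum shards: 94
print(f"raw ingest: {ingest_bytes_per_sec:,} bytes/s")    # raw ingest: 500,000,000 bytes/s
```

So even before query load is considered, 200 billion documents cannot fit in fewer than 94 Lucene indexes, and the update stream alone is about 500 MB/s of raw document data.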
> > >> >> >>
> > >> >> >> Solr offers:
> > >> >> >>
> > >> >> >> 1.) Replication => scales well for B), BUT A) and C) are not satisfied
> > >> >> >>
> > >> >> >> 2.) Sharding => scales well for A), BUT B) and C) are not satisfied
> > >> >> >> (=> As I understand the sharding approach, everything goes through a
> > >> >> >> central server that dispatches the updates and assembles the query
> > >> >> >> results retrieved from the different shards. But this central server
> > >> >> >> also has capacity limits...)
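Jens's concern about a central server does not have to apply to updates: if every client maps a document ID to a shard with the same deterministic function, updates can go straight to the owning shard. A minimal sketch, assuming hash-modulo routing and a hypothetical shard count of 4; `shard_for` is an illustrative name, not a Solr API.

```python
import zlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    # CRC32 gives the same value on every machine and process (unlike Python's
    # built-in hash() for strings, which is randomized per process), so every
    # client independently agrees on which shard owns a document.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Each client sends its updates straight to the owning shard, so no single
# dispatcher has to handle the full update stream.
updates = [f"doc-{i}" for i in range(10_000)]
per_shard = Counter(shard_for(doc_id) for doc_id in updates)
print(sorted(per_shard.items()))
```

Plain modulo routing moves most documents to a different shard when `NUM_SHARDS` changes; consistent hashing is the usual alternative if shards will be added over time. Queries still need some component to fan out and merge results.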
> > >> >> >>
> > >> >> >> What is the right approach to handle such large deployments? I would
> > >> >> >> be thankful for just a rough sketch of the concepts so I can
> > >> >> >> experiment/search further...
> > >> >> >>
> > >> >> >> Maybe I am missing something very trivial, as I think some of the
> > >> >> >> "Solr Users/Use Cases" on the homepage are deployments of that scale.
> > >> >> >> How are they implemented?
> > >> >> >>
> > >> >> >> Thank you very much!!!
> > >> >> >>
> > >> >> >> Jens
> > >> >> >>
> > >> >> >
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Albert Vila Puig
> > >> <[email protected]>
> > >> iMente.com <http://www.imente.com>
> > >>
> > >
> >
> >
> >
> > --
> > Albert Vila Puig
> > <[email protected]>
> > iMente.com <http://www.imente.com>
> >
>
