Re: Very very large scale Solr Deployment = how to do (Expert Question)?

Andy Fri, 08 Apr 2011 07:44:36 -0700

Perfect. Thank you very much.

Andy


--- On Fri, 4/8/11, Pascal Coupet <pcou...@gmail.com> wrote:

> From: Pascal Coupet <pcou...@gmail.com>
> Subject: Re: Very very large scale Solr Deployment = how to do (Expert 
> Question)?
> To: solr-user@lucene.apache.org
> Date: Friday, April 8, 2011, 10:20 AM
> I did put a PDF version here:
> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B02DHBZQYYT_MmRkZTY0YjQtODJmZS00Mzg0LWJiNTEtOWJjNzViNmNjZjdh&hl=en&authkey=CL2Fq_QG
> 
> Zoom it to get a better view.
> 
> Pascal
> 
> 2011/4/8 Andy <angelf...@yahoo.com>
> 
> > Could anyone please post a version of the document in
> pdf or openoffice
> > format? I'm on Linux so there's no way for me to use
> MS Word.
> >
> > Thanks.
> >
> >
> > --- On Fri, 4/8/11, Albert Vila <a...@imente.com>
> wrote:
> >
> > > From: Albert Vila <a...@imente.com>
> > > Subject: Re: Very very large scale Solr
> Deployment = how to do (Expert
> > Question)?
> > > To: solr-user@lucene.apache.org
> > > Date: Friday, April 8, 2011, 9:25 AM
> > > Yes, it won't work if you are using
> > > OpenOffice. However it works fine
> > > with Microsoft Word.
> > >
> > > Hope it helps.
> > >
> > > Albert
> > >
> > > On 8 April 2011 14:55, Andy <angelf...@yahoo.com>
> > > wrote:
> > > > I can't view the document either -- it
> showed up
> > > empty.
> > > >
> > > > Has anyone succeeded in viewing it?
> > > >
> > > > Andy
> > > >
> > > > --- On Fri, 4/8/11, Albert Vila <a...@imente.com>
> > > wrote:
> > > >
> > > >> From: Albert Vila <a...@imente.com>
> > > >> Subject: Re: Very very large scale Solr
> Deployment
> > > = how to do (Expert Question)?
> > > >> To: solr-user@lucene.apache.org
> > > >> Date: Friday, April 8, 2011, 3:43 AM
> > > >> Ephraim, I still can't view the
> > > >> document.
> > > >>
> > > >> Don't know if I'm doing something wrong,
> but I
> > > downloaded
> > > >> it and it
> > > >> appears to be empty.
> > > >>
> > > >> Albert
> > > >>
> > > >> On 7 April 2011 09:32, Ephraim Ofir
> <ephra...@icq.com>
> > > >> wrote:
> > > >> > You can't view it online, but you
> should be
> > > able to
> > > >> download it from:
> > > >> >
> > > >> > https://docs.google.com/leaf?id=0BwOEbnJ7oeOrNmU5ZThjODUtYzM5MS00YjRlLWI2OTktZTEzNDk1YmVmOWU4&hl=en&authkey=COGel4gP
> > > >> >
> > > >> > Enjoy,
> > > >> > Ephraim Ofir
> > > >> >
> > > >> >
> > > >> > -----Original Message-----
> > > >> > From: Jens Mueller [mailto:supidupi...@googlemail.com]
> > > >> > Sent: Thursday, April 07, 2011 8:30
> AM
> > > >> > To: solr-user@lucene.apache.org
> > > >> > Subject: Re: Very very large scale
> Solr
> > > Deployment =
> > > >> how to do (Expert
> > > >> > Question)?
> > > >> >
> > > >> > Hello Ephraim, hello Lance, hello
> Walter,
> > > >> >
> > > >> > thanks for your replies:
> > > >> >
> > > >> > Ephraim, thanks very much for the
> further
> > > detailed
> > > >> explanation. I will
> > > >> > try
> > > >> > to setup a demo system in the next
> few days
> > > and use
> > > >> your advice.
> > > >> > LoadBalancers are an important
> aspect of your
> > > design.
> > > >> Can you recommend
> > > >> > one
> > > >> > LB specifically? (I would be
> using
> > > haproxy.1wt.eu) .
> > > >> I think the idea
> > > >> > with
> > > >> > uploading your document is very
> good.
> > > However
> > > >> Google Docs seemed not to
> > > >> > be
> > > >> > working (at least for me with the
> docx
> > > format?), but
> > > >> maybe you can
> > > >> > simply
> > > >> > output the document as PDF and then
> I think
> > > Google
> > > >> Docs is working, so
> > > >> > all
> > > >> > the others can also have a look at
> your
> > > concept. The
> > > >> best approach would
> > > >> > be
> > > >> > if you could upload your advice
> directly
> > > somewhere to
> > > >> the solr wiki as
> > > >> > it is
> > > >> > really helpful. I found some other
> documents
> > > meanwhile,
> > > >> but yours is much
> > > >> > clearer and more complete, with the
> LBs and
> > > the
> > > >> Aggregators (
> > > >> > http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf)
> > > >> >
> > > >> > Lance, thanks I will have a look at
> what
> > > linkedin is
> > > >> doing.
> > > >> >
> > > >> > Walter, thanks for the advice: Well
> you are
> > > right,
> > > >> mentioning google. My
> > > >> > question was also to understand how
> such
> > > large systems
> > > >> like
> > > >> > google/facebook
> > > >> > are actually working. So my numbers
> are just
> > > >> theoretical and made up. My
> > > >> > system will be smaller,  but I
> would be very
> > > happy to
> > > >> understand how
> > > >> > such
> > > >> > large systems are built and I think
> the
> > > approach
> > > >> Ephraim showed should be
> > > >> > working quite well at large scale.
> If you
> > > know a good
> > > >> document (besides
> > > >> > the
> > > >> > bigtable research paper that I
> already know)
> > > that
> > > >> technically describes
> > > >> > how
> > > >> > google is working in detail that
> would be of
> > > great
> > > >> interest. You seem to
> > > >> > be
> > > >> > working for a company that handles
> large
> > > datasets.
> > > >> Does google use this
> > > >> > approach, sharing the index into N
> writers,
> > > and the
> > > >> produced index is
> > > >> > then
> > > >> > replicated to N "read only
> searchers"?
> > > >> >
> > > >> > thank you all.
> > > >> > best regards
> > > >> > jens
> > > >> >
> > > >> >
> > > >> >
> > > >> > 2011/4/7 Walter Underwood <wun...@wunderwood.org>
> > > >> >
> > > >> >> The bigger answer is that you
> cannot get
> > > to this
> > > >> size by just
> > > >> > configuring
> > > >> >> Solr. You may have to invent a
> lot of
> > > stuff. Like
> > > >> all of Google.
> > > >> >>
> > > >> >> Where did you get these
> numbers? The
> > > proposed
> > > >> query rate is twice as
> > > >> > big as
> > > >> >> Google (Feb 2010 estimate, 34K
> qps).
> > > >> >>
> > > >> >> I work at MarkLogic, and we
> scale to
> > > 100's of
> > > >> terabytes, with fast
> > > >> > update
> > > >> >> and query rates. If you want a
> real
> > > system that
> > > >> handles that, you
> > > >> > might want
> > > >> >> to look at our product.
> > > >> >>
> > > >> >> wunder
> > > >> >>
> > > >> >> On Apr 6, 2011, at 8:06 PM,
> Lance Norskog
> > > wrote:
> > > >> >>
> > > >> >> > I would not use
> replication.
> > > LinkedIn
> > > >> consumer search is a flat
> > > >> > system
> > > >> >> > where one process indexes
> new
> > > entries and
> > > >> does queries
> > > >> > simultaneously.
> > > >> >> > It's a custom Lucene app
> called
> > > Zoie. Their
> > > >> stuff is on GitHub.
> > > >> >> >
> > > >> >> > I would get documents to
> indexers
> > > via a
> > > >> multicast IP-based queueing
> > > >> >> > system. This scales very
> well and
> > > there's a
> > > >> lot of hardware support.
> > > >> >> >
> > > >> >> > The problem with
> distributed search
> > > is that
> > > >> it is a) inherently
> > > >> > slower
> > > >> >> > and b) has inherently more
> and
> > > longer jitter.
> > > >> The "airplane wing"
> > > >> >> > distribution of query
> times becomes
> > > longer
> > > >> and flatter.
> > > >> >> >
> > > >> >> > This is going to have to
> be a
> > > "federated"
> > > >> system, where the
> > > >> > front-end
> > > >> >> > app aggregates results
> rather than
> > > Solr.
> > > >> >> >
> > > >> >> > On Mon, Apr 4, 2011 at
> 6:25 PM, Jens
> > > Mueller
> > > >> > <supidupi...@googlemail.com>
> > > >> >> wrote:
> > > >> >> >> Hello Experts,
> > > >> >> >>
> > > >> >> >>
> > > >> >> >>
> > > >> >> >> I am a Solr newbie but
> read
> > > quite a lot
> > > >> of docs. I still do not
> > > >> >> understand
> > > >> >> >> what would be the best
> way to
> > > setup very
> > > >> large scale deployments:
> > > >> >> >>
> > > >> >> >>
> > > >> >> >>
> > > >> >> >> Goal (theoretical):
> > > >> >> >>
> > > >> >> >>  A.) Index-Size:
> 1 Petabyte (1
> > > Document
> > > >> is about 5 KB in Size)
> > > >> >> >>
> > > >> >> >>  B) Queries:
> 100000 Queries
> > > per Second
> > > >> >> >>
> > > >> >> >>  C) Updates:
> 100000 Updates
> > > per
> > > >> Second
> > > >> >> >>
> > > >> >> >>
> > > >> >> >>
> > > >> >> >>
> > > >> >> >> Solr offers:
> > > >> >> >>
> > > >> >> >> 1.)   
> Replication =>
> > > Scales Well
> > > >> for B)  BUT  A) and C) are
> not
> > > >> >> satisfied
> > > >> >> >>
> > > >> >> >>
> > > >> >> >> 2.)   
> Sharding => Scales
> > > well for
> > > >> A) BUT B) and C) are not
> > > >> > satisfied
> > > >> >> (=> As
> > > >> >> >> I understand the
> Sharding
> > > approach all
> > > >> goes through a central
> > > >> > server,
> > > >> >> that
> > > >> >> >> dispatches the updates
> and
> > > assembles the
> > > >> queries retrieved from the
> > > >> >> different
> > > >> >> >> shards. But this
> central server
> > > has also
> > > >> some capacity limits...)
> > > >> >> >>
> > > >> >> >>
> > > >> >> >>
> > > >> >> >>
> > > >> >> >> What is the right
> approach to
> > > handle such
> > > >> large deployments? I
> > > >> > would be
> > > >> >> >> thankful for just a
> rough
> > > sketch of the
> > > >> concepts so I can
> > > >> >> experiment/search
> > > >> >> >> further...
> > > >> >> >>
> > > >> >> >>
> > > >> >> >> Maybe I am missing
> something
> > > very trivial
> > > >> as I think some of the
> > > >> > "Solr
> > > >> >> >> Users/Use Cases" on
> the homepage
> > > are that
> > > >> kind of large
> > > >> > deployments. How
> > > >> >> are
> > > >> >> >> they implemented?
> > > >> >> >>
> > > >> >> >>
> > > >> >> >>
> > > >> >> >> Thank you very much!
> > > >> >> >>
> > > >> >> >> Jens
> > > >> >> >>
> > > >> >> >
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Albert Vila Puig
> > > >> <a...@imente.com>
> > > >> iMente.com <http://www.imente.com>
> > > >>
> > > >
> > >
> > >
> > >
> > > --
> > > Albert Vila Puig
> > > <a...@imente.com>
> > > iMente.com <http://www.imente.com>
> > >
> >
>
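
For anyone reading this thread later: the layout discussed above (documents spread over N index shards, with a front-end "aggregator" merging the per-shard results, as in Lance's "federated" suggestion) can be sketched roughly as follows. This is only an illustration in plain Python, not Solr's API and not Ephraim's actual design. The shard count, the function names (shard_for, merge_top_k, estimated_doc_count) and the hash choice are all assumptions made for the example.

```python
import hashlib
import heapq
from typing import List, Tuple

NUM_SHARDS = 4  # hypothetical value; the thread does not give a shard count


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard for a document using a stable hash of its id.

    Assumption: simple modulo routing; a real deployment may route differently.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def merge_top_k(shard_results: List[List[Tuple[float, str]]],
                k: int) -> List[Tuple[float, str]]:
    """Merge per-shard (score, doc_id) hits into one global top-k list.

    This is the aggregator step: each shard returns its own best hits,
    and the front end keeps the k highest-scoring ones overall.
    """
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(k, all_hits)


def estimated_doc_count(index_bytes: int, doc_bytes: int) -> int:
    """Rough document count for a given index size and average document size."""
    return index_bytes // doc_bytes


if __name__ == "__main__":
    # Goal A from the original question: 1 petabyte at about 5 KB per document
    # (decimal units assumed here).
    print(estimated_doc_count(10**15, 5000))

    # Two shards answer the same query; the aggregator merges their hits.
    hits = [[(0.9, "doc-a"), (0.2, "doc-b")], [(0.7, "doc-c")]]
    print(merge_top_k(hits, k=2))
```

The merge step is also where the latency concern raised in the thread shows up: the aggregator cannot answer until the slowest shard has replied.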
