Re: Very very large scale Solr Deployment = how to do (Expert Question)?

Andy Fri, 08 Apr 2011 06:40:59 -0700

Could anyone please post a version of the document in pdf or openoffice format? 
I'm on Linux so there's no way for me to use MS Word.


Thanks.


--- On Fri, 4/8/11, Albert Vila <a...@imente.com> wrote:

> From: Albert Vila <a...@imente.com>
> Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> To: solr-user@lucene.apache.org
> Date: Friday, April 8, 2011, 9:25 AM
> Yes, it won't work if you are using OpenOffice. However, it works fine
> with Microsoft Word.
> 
> Hope it helps.
> 
> Albert
> 
> On 8 April 2011 14:55, Andy <angelf...@yahoo.com> wrote:
> > I can't view the document either -- it showed up empty.
> >
> > Has anyone succeeded in viewing it?
> >
> > Andy
> >
> > --- On Fri, 4/8/11, Albert Vila <a...@imente.com> wrote:
> >
> >> From: Albert Vila <a...@imente.com>
> >> Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> >> To: solr-user@lucene.apache.org
> >> Date: Friday, April 8, 2011, 3:43 AM
> >> Ephraim, I still can't view the document.
> >>
> >> I don't know if I'm doing something wrong, but I downloaded it and it
> >> appears to be empty.
> >>
> >> Albert
> >>
> >> On 7 April 2011 09:32, Ephraim Ofir <ephra...@icq.com> wrote:
> >> > You can't view it online, but you should be able to download it from:
> >> > https://docs.google.com/leaf?id=0BwOEbnJ7oeOrNmU5ZThjODUtYzM5MS00YjRlLWI2OTktZTEzNDk1YmVmOWU4&hl=en&authkey=COGel4gP
> >> >
> >> > Enjoy,
> >> > Ephraim Ofir
> >> >
> >> >
> >> > -----Original Message-----
> >> > From: Jens Mueller [mailto:supidupi...@googlemail.com]
> >> > Sent: Thursday, April 07, 2011 8:30 AM
> >> > To: solr-user@lucene.apache.org
> >> > Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> >> >
> >> > Hello Ephraim, hello Lance, hello Walter,
> >> >
> >> > thanks for your replies:
> >> >
> >> > Ephraim, thanks very much for the further detailed explanation. I will
> >> > try to set up a demo system in the next few days and use your advice.
> >> > Load balancers are an important aspect of your design. Can you recommend
> >> > one LB specifically? (I would be using haproxy.1wt.eu.) I think the idea
> >> > of uploading your document is very good. However, Google Docs did not
> >> > seem to work (at least for me, with the docx format), but maybe you can
> >> > simply export the document as PDF; I think Google Docs would then work,
> >> > so everyone else can also have a look at your concept. The best approach
> >> > would be to upload your advice directly to the Solr wiki, as it is
> >> > really helpful. I found some other documents in the meantime, but yours
> >> > is much clearer and more complete, with the LBs and the aggregators
> >> > (http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf)
> >> >
> >> > Lance, thanks, I will have a look at what LinkedIn is doing.
> >> >
> >> > Walter, thanks for the advice. You are right to mention Google. My
> >> > question was also meant to understand how large systems like
> >> > Google/Facebook actually work. So my numbers are just theoretical and
> >> > made up. My system will be smaller, but I would be very happy to
> >> > understand how such large systems are built, and I think the approach
> >> > Ephraim showed should work quite well at large scale. If you know a good
> >> > document (besides the Bigtable research paper, which I already know)
> >> > that technically describes how Google works in detail, that would be of
> >> > great interest. You seem to be working for a company that handles large
> >> > datasets. Does Google use this approach, sharding the index into N
> >> > writers, with the produced index then replicated to N "read only
> >> > searchers"?
> >> >
> >> > thank you all.
> >> > best regards
> >> > jens
> >> >
> >> >
> >> >
> >> > 2011/4/7 Walter Underwood <wun...@wunderwood.org>
> >> >
> >> >> The bigger answer is that you cannot get to this size by just configuring
> >> >> Solr. You may have to invent a lot of stuff. Like all of Google.
> >> >>
> >> >> Where did you get these numbers? The proposed query rate is twice as big as
> >> >> Google (Feb 2010 estimate, 34K qps).
> >> >>
> >> >> I work at MarkLogic, and we scale to 100's of terabytes, with fast update
> >> >> and query rates. If you want a real system that handles that, you might want
> >> >> to look at our product.
> >> >>
> >> >> wunder
> >> >>
> >> >> On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:
> >> >>
> >> >> > I would not use replication. LinkedIn consumer search is a flat system
> >> >> > where one process indexes new entries and does queries simultaneously.
> >> >> > It's a custom Lucene app called Zoie. Their stuff is on GitHub.
> >> >> >
> >> >> > I would get documents to indexers via a multicast IP-based queueing
> >> >> > system. This scales very well and there's a lot of hardware support.
> >> >> >
> >> >> > The problem with distributed search is that it a) is inherently slower
> >> >> > and b) has inherently more and longer jitter. The "airplane wing"
> >> >> > distribution of query times becomes longer and flatter.
> >> >> >
> >> >> > This is going to have to be a "federated" system, where the front-end
> >> >> > app aggregates results rather than Solr.
> >> >> >
> >> >> > On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller <supidupi...@googlemail.com> wrote:
> >> >> >> Hello Experts,
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> I am a Solr newbie but have read quite a lot of docs. I still do not
> >> >> >> understand what would be the best way to set up very large scale
> >> >> >> deployments:
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Goal (theoretical):
> >> >> >>
> >> >> >>  A) Index size: 1 petabyte (1 document is about 5 KB in size)
> >> >> >>
> >> >> >>  B) Queries: 100000 queries per second
> >> >> >>
> >> >> >>  C) Updates: 100000 updates per second
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Solr offers:
> >> >> >>
> >> >> >> 1.) Replication => scales well for B), BUT A) and C) are not satisfied
> >> >> >>
> >> >> >> 2.) Sharding => scales well for A), BUT B) and C) are not satisfied
> >> >> >> (=> As I understand the sharding approach, everything goes through a
> >> >> >> central server that dispatches the updates and assembles the query
> >> >> >> results retrieved from the different shards. But this central server
> >> >> >> also has capacity limits...)
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> What is the right approach to handle such large deployments? I would
> >> >> >> be thankful for just a rough sketch of the concepts so I can
> >> >> >> experiment/search further...
> >> >> >>
> >> >> >>
> >> >> >> Maybe I am missing something very trivial, as I think some of the
> >> >> >> "Solr Users/Use Cases" on the homepage are that kind of large
> >> >> >> deployment. How are they implemented?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Thank you very much!!!
> >> >> >>
> >> >> >> Jens
> >> >> >>
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Albert Vila Puig
> >> <a...@imente.com>
> >> iMente.com <http://www.imente.com>
> >>
> >
> 
> 
> 
> -- 
> Albert Vila Puig
> <a...@imente.com>
> iMente.com <http://www.imente.com>
>
