Re: Very very large scale Solr Deployment = how to do (Expert Question)?

Andy Fri, 08 Apr 2011 05:55:36 -0700

I can't view the document either -- it showed up empty.

Has anyone succeeded in viewing it?


Andy

--- On Fri, 4/8/11, Albert Vila <a...@imente.com> wrote:

> From: Albert Vila <a...@imente.com>
> Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> To: solr-user@lucene.apache.org
> Date: Friday, April 8, 2011, 3:43 AM
> Ephraim, I still can't view the document.
> 
> I don't know if I'm doing something wrong, but I downloaded it and it
> appears to be empty.
> 
> Albert
> 
> On 7 April 2011 09:32, Ephraim Ofir <ephra...@icq.com> wrote:
> > You can't view it online, but you should be able to download it from:
> > https://docs.google.com/leaf?id=0BwOEbnJ7oeOrNmU5ZThjODUtYzM5MS00YjRlLWI2OTktZTEzNDk1YmVmOWU4&hl=en&authkey=COGel4gP
> >
> > Enjoy,
> > Ephraim Ofir
> >
> >
> > -----Original Message-----
> > From: Jens Mueller [mailto:supidupi...@googlemail.com]
> > Sent: Thursday, April 07, 2011 8:30 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Very very large scale Solr Deployment = how to do (Expert Question)?
> >
> > Hello Ephraim, hello Lance, hello Walter,
> >
> > thanks for your replies:
> >
> > Ephraim, thanks very much for the further detailed explanation. I will
> > try to set up a demo system in the next few days and use your advice.
> > Load balancers are an important aspect of your design. Can you recommend
> > one LB specifically? (I would be using haproxy.1wt.eu.) I think the idea
> > of uploading your document is very good. However, Google Docs did not seem
> > to be working (at least for me with the docx format?), but maybe you can
> > simply output the document as PDF, which I think Google Docs handles, so
> > all the others can also have a look at your concept. The best approach
> > would be to upload your advice directly somewhere on the Solr wiki, as it
> > is really helpful. I found some other documents in the meantime, but yours
> > is much clearer and more complete, with the LBs and the aggregators
> > (http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf)
> >
> > Lance, thanks, I will have a look at what LinkedIn is doing.
> >
> > Walter, thanks for the advice. You are right to mention Google. My
> > question was also meant to understand how large systems like
> > Google/Facebook actually work. So my numbers are just theoretical and
> > made up. My system will be smaller, but I would be very happy to
> > understand how such large systems are built, and I think the approach
> > Ephraim showed should work quite well at large scale. If you know of a
> > good document (besides the Bigtable research paper, which I already know)
> > that technically describes how Google works in detail, that would be of
> > great interest. You seem to be working for a company that handles large
> > datasets. Does Google use this approach, sharding the index across N
> > writers, with the produced index then replicated to N "read only
> > searchers"?
> >
> > thank you all.
> > best regards
> > jens
> >
> >
> >
> > 2011/4/7 Walter Underwood <wun...@wunderwood.org>
> >
> >> The bigger answer is that you cannot get to this size by just
> >> configuring Solr. You may have to invent a lot of stuff. Like all of
> >> Google.
> >>
> >> Where did you get these numbers? The proposed query rate is twice as
> >> big as Google (Feb 2010 estimate, 34K qps).
> >>
> >> I work at MarkLogic, and we scale to 100's of terabytes, with fast
> >> update and query rates. If you want a real system that handles that,
> >> you might want to look at our product.
> >>
> >> wunder
> >>
> >> On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:
> >>
> >> > I would not use replication. LinkedIn consumer search is a flat
> >> > system where one process indexes new entries and does queries
> >> > simultaneously. It's a custom Lucene app called Zoie. Their stuff is
> >> > on GitHub.
> >> >
> >> > I would get documents to indexers via a multicast IP-based queueing
> >> > system. This scales very well and there's a lot of hardware support.
> >> >
> >> > The problem with distributed search is that it is a) inherently
> >> > slower and b) has inherently more and longer jitter. The "airplane
> >> > wing" distribution of query times becomes longer and flatter.
> >> >
> >> > This is going to have to be a "federated" system, where the
> >> > front-end app aggregates results rather than Solr.
> >> >
> >> > On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller <supidupi...@googlemail.com> wrote:
> >> >> Hello Experts,
> >> >>
> >> >>
> >> >>
> >> >> I am a Solr newbie but have read quite a lot of docs. I still do not
> >> >> understand what would be the best way to set up very large scale
> >> >> deployments:
> >> >>
> >> >>
> >> >>
> >> >> Goal (theoretical):
> >> >>
> >> >>  A) Index size: 1 petabyte (1 document is about 5 KB in size)
> >> >>  B) Queries: 100000 queries per second
> >> >>  C) Updates: 100000 updates per second
> >> >>
> >> >> Solr offers:
> >> >>
> >> >>  1.) Replication => scales well for B) BUT A) and C) are not satisfied
> >> >>  2.) Sharding => scales well for A) BUT B) and C) are not satisfied
> >> >>      (=> As I understand the sharding approach, everything goes through
> >> >>      a central server that dispatches the updates and assembles the
> >> >>      query results retrieved from the different shards. But this
> >> >>      central server also has capacity limits...)
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> What is the right approach to handle such large deployments? I would
> >> >> be thankful for just a rough sketch of the concepts so I can
> >> >> experiment/search further...
> >> >>
> >> >> Maybe I am missing something very trivial, as I think some of the
> >> >> "Solr Users/Use Cases" on the homepage are that kind of large
> >> >> deployment. How are they implemented?
> >> >>
> >> >>
> >> >>
> >> >> Thanks very much!!!
> >> >>
> >> >> Jens
> >> >>
> >> >
> >>
> >>
> >>
> >>
> >>
> >
> 
> 
> 
> -- 
> Albert Vila Puig
> <a...@imente.com>
> iMente.com <http://www.imente.com>
>
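
To put the theoretical goals quoted above into rough numbers, here is a minimal Python sketch. It assumes decimal units (1 PB = 10^15 bytes, 1 KB = 10^3 bytes) and treats the index size as the sum of raw document sizes, which a real Lucene index would not match exactly. The function names are made up for illustration; only the input figures come from the thread.

```python
# Back-of-the-envelope numbers for the theoretical goals quoted in the thread.
# Assumptions (not from the thread): decimal units, and index size taken as
# the plain sum of document sizes.

INDEX_SIZE_BYTES = 10**15      # goal A: 1 petabyte index
DOC_SIZE_BYTES = 5 * 10**3     # goal A: about 5 KB per document
TARGET_QPS = 100_000           # goal B: queries per second
GOOGLE_QPS_ESTIMATE = 34_000   # Feb 2010 estimate cited in the thread


def document_count(index_bytes: int, doc_bytes: int) -> int:
    """Number of documents of the given size that add up to the index size."""
    return index_bytes // doc_bytes


def qps_ratio(target_qps: int, reference_qps: int) -> float:
    """How many times larger the target query rate is than the reference."""
    return target_qps / reference_qps


if __name__ == "__main__":
    docs = document_count(INDEX_SIZE_BYTES, DOC_SIZE_BYTES)
    ratio = qps_ratio(TARGET_QPS, GOOGLE_QPS_ESTIMATE)
    print(f"documents: {docs:,}")
    print(f"query rate vs. cited estimate: {ratio:.1f}x")
```

Under those assumptions the goal works out to about 200 billion documents, and the proposed query rate is roughly 2.9 times the cited 34K qps estimate.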
