Re: Very very large scale Solr Deployment = how to do (Expert Question)?

Albert Vila Fri, 08 Apr 2011 00:44:22 -0700

Ephraim, I still can't view the document.

I don't know if I'm doing something wrong, but I downloaded it and it
appears to be empty.


Albert

On 7 April 2011 09:32, Ephraim Ofir <ephra...@icq.com> wrote:
> You can't view it online, but you should be able to download it from:
> https://docs.google.com/leaf?id=0BwOEbnJ7oeOrNmU5ZThjODUtYzM5MS00YjRlLWI2OTktZTEzNDk1YmVmOWU4&hl=en&authkey=COGel4gP
>
> Enjoy,
> Ephraim Ofir
>
>
> -----Original Message-----
> From: Jens Mueller [mailto:supidupi...@googlemail.com]
> Sent: Thursday, April 07, 2011 8:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Very very large scale Solr Deployment = how to do (Expert
> Question)?
>
> Hello Ephraim, hello Lance, hello Walter,
>
> thanks for your replies:
>
> Ephraim, thanks very much for the further detailed explanation. I will try
> to set up a demo system in the next few days and use your advice.
> Load balancers are an important aspect of your design. Can you recommend one
> LB specifically? (I would be using haproxy.1wt.eu.) I think the idea of
> uploading your document is very good. However, Google Docs did not seem to
> be working (at least for me, with the docx format), but maybe you can simply
> export the document as PDF, which I think Google Docs handles, so all the
> others can also have a look at your concept. The best approach would be to
> upload your advice directly to the Solr wiki, as it is really helpful. I
> found some other documents in the meantime, but yours is much clearer and
> more complete, with the LBs and the aggregators (
> http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf)
>
> Lance, thanks, I will have a look at what LinkedIn is doing.
>
> Walter, thanks for the advice. You are right to mention Google. My
> question was also meant to understand how such large systems as
> Google/Facebook actually work. So my numbers are just theoretical and made
> up. My system will be smaller, but I would be very happy to understand how
> such large systems are built, and I think the approach Ephraim showed should
> work quite well at large scale. If you know of good documents (besides the
> Bigtable research paper, which I already know) that describe technically how
> Google works in detail, that would be of great interest. You seem to be
> working for a company that handles large datasets. Does Google use this
> approach: splitting the index across N writers, with the produced index then
> replicated to N "read-only searchers"?
>
> thank you all.
> best regards
> jens
>
>
>
> 2011/4/7 Walter Underwood <wun...@wunderwood.org>
>
>> The bigger answer is that you cannot get to this size by just configuring
>> Solr. You may have to invent a lot of stuff. Like all of Google.
>>
>> Where did you get these numbers? The proposed query rate is twice as big as
>> Google (Feb 2010 estimate, 34K qps).
>>
>> I work at MarkLogic, and we scale to 100's of terabytes, with fast update
>> and query rates. If you want a real system that handles that, you might want
>> to look at our product.
>>
>> wunder
>>
>> On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:
>>
>> > I would not use replication. LinkedIn consumer search is a flat system
>> > where one process indexes new entries and does queries simultaneously.
>> > It's a custom Lucene app called Zoie. Their stuff is on GitHub.
>> >
>> > I would get documents to indexers via a multicast IP-based queueing
>> > system. This scales very well and there's a lot of hardware support.
>> >
>> > The problem with distributed search is that it is a) inherently slower
>> > and b) has inherently more and longer jitter. The "airplane wing"
>> > distribution of query times becomes longer and flatter.
>> >
>> > This is going to have to be a "federated" system, where the front-end
>> > app aggregates results rather than Solr.
>> >
>> > On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller <supidupi...@googlemail.com> wrote:
>> >> Hello Experts,
>> >>
>> >>
>> >>
>> >> I am a Solr newbie but have read quite a lot of docs. I still do not
>> >> understand what would be the best way to set up very large scale
>> >> deployments:
>> >>
>> >>
>> >>
>> >> Goal (theoretical):
>> >>
>> >>  A) Index size: 1 petabyte (1 document is about 5 KB in size)
>> >>
>> >>  B) Queries: 100,000 queries per second
>> >>
>> >>  C) Updates: 100,000 updates per second
>> >>
>> >>
>> >>
>> >>
>> >> Solr offers:
>> >>
>> >> 1.)    Replication => Scales well for B) BUT A) and C) are not satisfied
>> >>
>> >>
>> >> 2.)    Sharding => Scales well for A) BUT B) and C) are not satisfied
>> >> (=> As I understand the sharding approach, everything goes through a
>> >> central server that dispatches the updates and assembles the query
>> >> results retrieved from the different shards. But this central server
>> >> also has capacity limits...)
>> >>
>> >>
>> >>
>> >>
>> >> What is the right approach to handle such large deployments? I would be
>> >> thankful for just a rough sketch of the concepts so I can
>> >> experiment/search further...
>> >>
>> >>
>> >> Maybe I am missing something very trivial, as I think some of the "Solr
>> >> Users/Use Cases" on the homepage are that kind of large deployment. How
>> >> are they implemented?
>> >>
>> >>
>> >>
>> >> Thank you very much!!!
>> >>
>> >> Jens
>> >>
>> >
>>
>>
>>
>>
>>
>
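
For anyone who wants to play with the theoretical goals A) and B) from the
original question, here is a rough back-of-the-envelope sketch in Python. It
is not a sizing method from anyone in this thread. It assumes decimal units
(1 PB = 10^15 bytes, 5 KB = 5000 bytes), which gives about 200 billion
documents. The per-shard capacity and per-node query rate are made-up
placeholders, and the function name size_cluster is hypothetical. It also
assumes that every query fans out to every shard, so each shard needs enough
replicas to absorb the full query rate.

```python
# Decimal units are an assumption; the thread does not say which are meant.
PETABYTE = 10**15
KILOBYTE = 10**3

INDEX_BYTES = 1 * PETABYTE   # goal A) from the original question
DOC_BYTES = 5 * KILOBYTE     # "1 document is about 5 KB"
TARGET_QPS = 100_000         # goal B) from the original question

# Hypothetical limits, NOT measured Solr figures. Replace with your own.
ASSUMED_SHARD_BYTES = 100 * 10**9   # 100 GB of index per shard
ASSUMED_QPS_PER_NODE = 100          # queries/second one node can serve


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up, without going through floats."""
    return -(-a // b)


def size_cluster(index_bytes: int = INDEX_BYTES,
                 doc_bytes: int = DOC_BYTES,
                 target_qps: int = TARGET_QPS,
                 shard_bytes: int = ASSUMED_SHARD_BYTES,
                 qps_per_node: int = ASSUMED_QPS_PER_NODE) -> dict:
    """Estimate document, shard, replica and node counts."""
    documents = index_bytes // doc_bytes
    shards = ceil_div(index_bytes, shard_bytes)
    # Every query is assumed to hit every shard, so each shard must be
    # replicated enough times to serve the whole query rate.
    replicas_per_shard = ceil_div(target_qps, qps_per_node)
    return {
        "documents": documents,
        "shards": shards,
        "replicas_per_shard": replicas_per_shard,
        "nodes": shards * replicas_per_shard,
    }


if __name__ == "__main__":
    for name, value in size_cluster().items():
        print(f"{name}: {value:,}")
```

With these placeholder limits the estimate comes to 10,000 shards with 1,000
replicas each, which shows why plain sharding plus replication is not enough
at the stated numbers.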



-- 
Albert Vila Puig
<a...@imente.com>
iMente.com <http://www.imente.com>
