Re: Very very large scale Solr Deployment = how to do (Expert Question)?

Albert Vila Fri, 08 Apr 2011 06:26:24 -0700

Yes, it won't work if you are using OpenOffice. However, it works fine
with Microsoft Word.


Hope it helps.

Albert

On 8 April 2011 14:55, Andy <angelf...@yahoo.com> wrote:
> I can't view the document either -- it showed up empty.
>
> Has anyone succeeded in viewing it?
>
> Andy
>
> --- On Fri, 4/8/11, Albert Vila <a...@imente.com> wrote:
>
>> From: Albert Vila <a...@imente.com>
>> Subject: Re: Very very large scale Solr Deployment = how to do (Expert 
>> Question)?
>> To: solr-user@lucene.apache.org
>> Date: Friday, April 8, 2011, 3:43 AM
>> Ephraim, I still can't view the
>> document.
>>
>> Don't know if I'm doing something wrong, but I downloaded
>> it and it appears to be empty.
>>
>> Albert
>>
>> On 7 April 2011 09:32, Ephraim Ofir <ephra...@icq.com>
>> wrote:
>> > You can't view it online, but you should be able to
>> download it from:
>> > https://docs.google.com/leaf?id=0BwOEbnJ7oeOrNmU5ZThjODUtYzM5MS00YjRlLWI2OTktZTEzNDk1YmVmOWU4&hl=en&authkey=COGel4gP
>> >
>> > Enjoy,
>> > Ephraim Ofir
>> >
>> >
>> > -----Original Message-----
>> > From: Jens Mueller [mailto:supidupi...@googlemail.com]
>> > Sent: Thursday, April 07, 2011 8:30 AM
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: Very very large scale Solr Deployment =
>> how to do (Expert
>> > Question)?
>> >
>> > Hello Ephraim, hello Lance, hello Walter,
>> >
>> > thanks for your replies:
>> >
>> > Ephraim, thanks very much for the further detailed explanation. I will
>> > try to set up a demo system in the next few days and use your advice.
>> > Load balancers are an important aspect of your design. Can you recommend
>> > one LB specifically? (I would be using haproxy.1wt.eu.) I think the idea
>> > of uploading your document is very good. However, Google Docs did not
>> > seem to be working (at least for me with the docx format), but maybe you
>> > can simply output the document as PDF, which I think Google Docs handles,
>> > so all the others can also have a look at your concept. The best approach
>> > would be to upload your advice directly to the Solr wiki, as it is
>> > really helpful. I found some other documents meanwhile, but yours is much
>> > clearer and more complete, with the LBs and the aggregators (
>> > http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf)
>> >
>> > Lance, thanks, I will have a look at what LinkedIn is doing.
>> >
>> > Walter, thanks for the advice. You are right to mention Google. My
>> > question was also meant to understand how large systems like
>> > Google/Facebook actually work. So my numbers are just theoretical and
>> > made up. My system will be smaller, but I would be very happy to
>> > understand how such large systems are built, and I think the approach
>> > Ephraim showed should work quite well at large scale. If you know a good
>> > document (besides the Bigtable research paper that I already know) that
>> > technically describes how Google works in detail, that would be of great
>> > interest. You seem to be working for a company that handles large
>> > datasets. Does Google use this approach, sharding the index into N
>> > writers, with the produced index then replicated to N "read only
>> > searchers"?
>> >
>> > thank you all.
>> > best regards
>> > jens
>> >
>> >
>> >
>> > 2011/4/7 Walter Underwood <wun...@wunderwood.org>
>> >
>> >> The bigger answer is that you cannot get to this
>> size by just
>> > configuring
>> >> Solr. You may have to invent a lot of stuff. Like
>> all of Google.
>> >>
>> >> Where did you get these numbers? The proposed
>> query rate is twice as
>> > big as
>> >> Google (Feb 2010 estimate, 34K qps).
>> >>
>> >> I work at MarkLogic, and we scale to 100's of
>> terabytes, with fast
>> > update
>> >> and query rates. If you want a real system that
>> handles that, you
>> > might want
>> >> to look at our product.
>> >>
>> >> wunder
>> >>
>> >> On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:
>> >>
>> >> > I would not use replication. LinkedIn
>> consumer search is a flat
>> > system
>> >> > where one process indexes new entries and
>> does queries
>> > simultaneously.
>> >> > It's a custom Lucene app called Zoie. Their stuff is on GitHub.
>> >> >
>> >> > I would get documents to indexers via a
>> multicast IP-based queueing
>> >> > system. This scales very well and there's a
>> lot of hardware support.
>> >> >
>> >> > The problem with distributed search is that
>> it is a) inherently
>> > slower
>> >> > and b) has inherently more and longer jitter.
>> The "airplane wing"
>> >> > distribution of query times becomes longer
>> and flatter.
>> >> >
>> >> > This is going to have to be a "federated"
>> system, where the
>> > front-end
>> >> > app aggregates results rather than Solr.
>> >> >
>> >> > On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller
>> > <supidupi...@googlemail.com>
>> >> wrote:
>> >> >> Hello Experts,
>> >> >>
>> >> >>
>> >> >>
>> >> >> I am a Solr newbie but have read quite a lot of docs. I still do not
>> >> >> understand what would be the best way to set up very large-scale
>> >> >> deployments:
>> >> >>
>> >> >>
>> >> >>
>> >> >> Goal (theoretical):
>> >> >>
>> >> >>  A) Index size: 1 petabyte (1 document is about 5 KB in size)
>> >> >>
>> >> >>  B) Queries: 100000 queries per second
>> >> >>
>> >> >>  C) Updates: 100000 updates per second
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> Solr offers:
>> >> >>
>> >> >> 1.)    Replication => scales well for B) BUT A) and C) are not satisfied
>> >> >>
>> >> >>
>> >> >> 2.)    Sharding => scales well for A) BUT B) and C) are not satisfied
>> >> >> (=> As I understand the sharding approach, everything goes through a
>> >> >> central server that dispatches the updates and assembles the query
>> >> >> results retrieved from the different shards. But this central server
>> >> >> also has some capacity limits...)
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> What is the right approach to handle such large deployments? I would be
>> >> >> thankful for just a rough sketch of the concepts so I can
>> >> >> experiment/search further...
>> >> >>
>> >> >>
>> >> >> Maybe I am missing something very trivial, as I think some of the "Solr
>> >> >> Users/Use Cases" on the homepage are that kind of large deployment. How
>> >> >> are they implemented?
>> >> >>
>> >> >>
>> >> >>
>> >> >> Thank you very much!!!
>> >> >>
>> >> >> Jens
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Albert Vila Puig
>> <a...@imente.com>
>> iMente.com <http://www.imente.com>
>>
>



-- 
Albert Vila Puig
<a...@imente.com>
iMente.com <http://www.imente.com>
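
For a rough sense of scale, the goals in Jens's original question can be turned into a back-of-the-envelope estimate. The sketch below is in Python. The index size, document size, and query/update rates come from the thread; the per-shard index size (`SHARD_BYTES`) and the per-searcher query rate (`REPLICA_QPS`) are made-up values for illustration, and the model assumes that every query fans out to every shard while each update lands on exactly one shard. It is not a description of how Solr, Google, or anyone in the thread actually sizes a cluster.

```python
# Figures taken from the original question in the thread.
INDEX_BYTES = 10**15        # 1 petabyte (decimal)
DOC_BYTES = 5 * 10**3       # about 5 KB per document
QUERIES_PER_SEC = 100_000
UPDATES_PER_SEC = 100_000

# Hypothetical values, NOT from the thread; change them to match real tests.
SHARD_BYTES = 10**11        # assumed 100 GB of index per shard
REPLICA_QPS = 50            # assumed queries/sec one read-only searcher sustains


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def plan(index_bytes, doc_bytes, qps, ups, shard_bytes, replica_qps):
    """Estimate shard and searcher counts under the assumptions above."""
    documents = index_bytes // doc_bytes
    shards = ceil_div(index_bytes, shard_bytes)
    # Assumption: a distributed query is sent to every shard, so each shard
    # must absorb the full query rate through its own set of replicas.
    searchers_per_shard = ceil_div(qps, replica_qps)
    # Assumption: each update is routed to exactly one shard.
    updates_per_shard_per_sec = ups / shards
    return {
        "documents": documents,
        "shards": shards,
        "searchers_per_shard": searchers_per_shard,
        "total_searchers": shards * searchers_per_shard,
        "updates_per_shard_per_sec": updates_per_shard_per_sec,
    }


estimate = plan(INDEX_BYTES, DOC_BYTES, QUERIES_PER_SEC, UPDATES_PER_SEC,
                SHARD_BYTES, REPLICA_QPS)
for name, value in estimate.items():
    print(f"{name}: {value}")
```

With these assumed figures the estimate comes to 200 billion documents, 10,000 shards, and 2,000 searchers per shard. Different assumptions change the totals, but the multiplication of shards by replicas is what makes the fan-out cost discussed in the thread visible.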
