Dmitry,

If you're planning on using HBase, you can take a look at
https://issues.apache.org/jira/browse/HBASE-3529. I think we may even have a
reasonable solution for reading the index [randomly] out of HDFS. Benchmarking
will be implemented next. It's not production ready; suggestions are welcome.
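To make the idea concrete, here is a minimal sketch -- not the HBASE-3529 code
itself -- of the HDFS positional-read primitive that any "read the index
randomly out of HDFS" approach would sit on top of. The segment path is made
up, and it assumes Hadoop 0.20.x-era APIs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRandomRead {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical location of a Lucene segment file stored in HDFS.
        Path segmentFile = new Path("hdfs:///indexes/shard1/_0.frq");
        FileStatus status = fs.getFileStatus(segmentFile);

        FSDataInputStream in = fs.open(segmentFile);
        byte[] buf = new byte[4096];
        // Positional read: fetch a block from an arbitrary offset without
        // streaming the whole file -- the kind of access a Lucene IndexInput
        // backed by HDFS needs for term dictionary / postings lookups.
        long offset = Math.max(0L, status.getLen() - buf.length);
        int read = in.read(offset, buf, 0, buf.length);
        System.out.println("read " + read + " bytes at offset " + offset);
        in.close();
        fs.close();
    }
}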
Jason

On Fri, Mar 25, 2011 at 2:03 PM, Dmitry Kan <dmitry....@gmail.com> wrote:
> Hi Otis,
>
> Thanks for elaborating on this and the link (funny!).
>
> I have quite a big dataset growing all the time. The problems that I'm
> starting to face are pretty predictable:
> 1. Scalability: this includes indexing time (now some days! better hours
> or even minutes, if that's possible) along with handling the rapid growth.
> 2. Robustness: the entire system (distributed, single server, or anything
> else) should be fault-tolerant, e.g. if one shard goes down, another
> catches up (master-slave scheme).
> 3. Some apps that we run on SOLR are pretty computationally demanding,
> like faceting over uni+bi+trigrams of hundreds of millions of documents
> (index size of half a TB) ---> a single server with one shard of data
> does not seem to be enough for realtime search.
>
> This is just for a bit of background. I agree with you that Hadoop and
> the cloud probably best suit massive batch processes rather than realtime
> search. I'm not sure whether anyone out there has made SOLR shine through
> the cloud for realtime search over large datasets.
>
> By "SOLR on the cloud (e.g. HDFS + MR + cloud of commodity machines)" I
> mean what you've done for your customers using EC2. Any chance the
> guidelines/articles on setting up indices on HDFS are available in some
> open / paid area?
>
> To sum this up, I didn't mean to create a buzz about cloud solutions in
> this thread; I was just wondering what is practically available / going
> on in SOLR development in this regard.
>
> Thanks,
>
> Dmitry
>
>
> On Fri, Mar 25, 2011 at 10:28 PM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wrote:
>
>> Hi Dan,
>>
>> This feels a bit like a buzzword soup.... with mushrooms. :)
>>
>> MR jobs, at least the ones in Hadoopland, are very batch oriented, so
>> they wouldn't be very suitable for most search applications. There are
>> some technologies like Riak that combine MR and search. Let me use this
>> funny little link: http://lmgtfy.com/?q=riak%20mapreduce%20search
>>
>> Sure, you can put indices on HDFS (but don't expect searches to be
>> fast). Sure, you can create indices using MapReduce; we've done that
>> successfully for customers, bringing long indexing jobs from many hours
>> down to minutes by using, yes, a cluster of machines (actually EC2
>> instances).
>> But when you say "more into SOLR on the cloud (e.g. HDFS + MR + cloud of
>> commodity machines)", I can't actually picture what precisely you mean...
>>
>> Otis
>> ---
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>> ----- Original Message ----
>> > From: Dmitry Kan <dmitry....@gmail.com>
>> > To: solr-user@lucene.apache.org
>> > Cc: Upayavira <u...@odoko.co.uk>
>> > Sent: Fri, March 25, 2011 8:26:33 AM
>> > Subject: Re: solr on the cloud
>> >
>> > Hi, Upayavira
>> >
>> > Probably I'm confusing the terms here. When I say "distributed
>> > faceting" I'm more into SOLR on the cloud (e.g. HDFS + MR + cloud of
>> > commodity machines) rather than traditional multicore/sharded SOLR on
>> > single or multiple servers with non-distributed file systems (is that
>> > what you mean when you refer to "distribution of facet requests across
>> > hosts"?)
>> >
>> > On Fri, Mar 25, 2011 at 1:57 PM, Upayavira <u...@odoko.co.uk> wrote:
>> >
>> > >
>> > > On Fri, 25 Mar 2011 13:44 +0200, "Dmitry Kan" <dmitry....@gmail.com>
>> > > wrote:
>> > > > Hi Yonik,
>> > > >
>> > > > Oh, this is great. Is distributed faceting available in the trunk?
>> > > > What is the basic server setup needed for trying this out -- is it
>> > > > a cloud with HDFS and SOLR with ZooKeeper?
>> > > > Any chance to see the related documentation? :)
>> > >
>> > > Distributed faceting has been available for a long time, and is
>> > > available in the 1.4.1 release.
>> > >
>> > > The distribution of facet requests across hosts happens in the
>> > > background. There's no real difference (in query syntax) between a
>> > > standard facet query and a distributed one.
>> > >
>> > > i.e. you don't need SolrCloud or ZooKeeper for it. (They may provide
>> > > other benefits, but you don't need them for distributed faceting.)
>> > >
>> > > Upayavira
>> > >
>> > > > On Fri, Mar 25, 2011 at 1:35 PM, Yonik Seeley
>> > > > <yo...@lucidimagination.com> wrote:
>> > > >
>> > > > > On Tue, Mar 22, 2011 at 7:51 AM, Dmitry Kan
>> > > > > <dmitry....@gmail.com> wrote:
>> > > > > > Basically, of high interest is checking out Map-Reduce for
>> > > > > > distributed faceting -- is it even possible with the trunk?
>> > > > >
>> > > > > Solr already has distributed faceting, and it's much more
>> > > > > performant than a map-reduce implementation would be.
>> > > > >
>> > > > > I've also seen a product use the term "map reduce" incorrectly...
>> > > > > as in, we "map" the request to each shard, and then "reduce" the
>> > > > > results to a single list (of course, that's not actually
>> > > > > map-reduce at all ;-)
>> > > > >
>> > > > :) This sounds pretty strange to me as well. It was only my guess
>> > > > that if you have MR as the computational model and a cloud beneath
>> > > > it, you could naturally map facet fields to their counts inside
>> > > > single documents (no matter where they are, be it shards or a
>> > > > "single" index) and pass them on to reducers.
>> > > >
>> > > > > -Yonik
>> > > > > http://www.lucenerevolution.org -- Lucene/Solr User Conference,
>> > > > > May 25-26, San Francisco
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Regards,
>> > > >
>> > > > Dmitry Kan
>> > > >
>> > > ---
>> > > Enterprise Search Consultant at Sourcesense UK,
>> > > Making Sense of Open Source
>> > >
>> >
>> >
>> > --
>> > Regards,
>> >
>> > Dmitry Kan
>> >
>>
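For concreteness, here is a minimal SolrJ sketch of the kind of distributed
facet query Upayavira describes above: a normal facet query plus a "shards"
parameter listing the cores to fan out to. The host names, core URLs, and the
facet field ("trigram") are hypothetical, and it assumes the SolrJ client that
ships with Solr 1.4.x:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedFacetExample {
    public static void main(String[] args) throws Exception {
        // Any node can act as the aggregator; it merges per-shard counts.
        CommonsHttpSolrServer solr =
            new CommonsHttpSolrServer("http://host1:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("trigram");   // hypothetical n-gram field
        q.setFacetLimit(20);
        q.setRows(0);
        // The only "distributed" part: list the shards to query.
        q.set("shards", "host1:8983/solr,host2:8983/solr,host3:8983/solr");

        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getFacetField("trigram").getValues());
    }
}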