I have used the DSE cluster and am in the process of evaluating Cloudera 
Search. In general, Cloudera Search has tight integration with HDFS and takes 
care of replication and sharding transparently by using the pre-existing HDFS 
replication and sharding. However, Cloudera Search actually uses SolrCloud 
underneath, so you would need to install ZooKeeper to enable coordination 
between the Solr nodes.

DataStax allows you to talk to Solr, but their model scales around the data 
model and architecture of Cassandra. Release 3.1 adds some Solr admin 
functionality and removes the need to write Cassandra-specific code.

If you go the open source route you have a few options:

1) You can build a custom plugin inside Solr that internally queries HDFS and 
returns data. You would need to figure out how to scale this, potentially 
using a solution very similar to Cloudera Search (i.e. leverage SolrCloud), and 
if you use SolrCloud you would need to install ZooKeeper for node coordination.

2) You could create a Flume channel that accumulates specific events from 
HDFS, and a sink that writes the data directly to Solr.

3) I would look at Cloudera Search if you need tight integration with Hadoop; 
it might save you some time and effort.
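
As a rough illustration of what the sink in option 2 has to do, here is a 
minimal Python sketch, not a Flume sink itself: it turns CSV text into Solr 
documents and builds a POST to Solr's JSON update handler. The host, port, 
collection name ("collection1"), the "id" field, and all function names are 
assumptions made up for the example; adjust them to your schema.

```python
import csv
import io
import json
import urllib.request

# Assumed endpoint: host, port and collection name are placeholders.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def csv_to_docs(csv_text, id_field="id"):
    """Turn CSV text (first row is the header) into a list of Solr documents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row_no, row in enumerate(reader, start=1):
        # Drop empty cells and any overflow columns (DictReader keys them as None).
        doc = {k: v for k, v in row.items() if k is not None and v not in (None, "")}
        # Solr schemas normally require a unique key; fall back to a row-based id.
        doc.setdefault(id_field, "row-%d" % row_no)
        docs.append(doc)
    return docs


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) a JSON update request for the given documents."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_csv(csv_text, url=SOLR_UPDATE_URL):
    """Send the documents to Solr; requires a running Solr instance."""
    request = build_update_request(csv_to_docs(csv_text), url)
    with urllib.request.urlopen(request) as response:
        return response.status
```

In a real deployment the same conversion would live inside the Flume sink, and 
you would batch documents and commit less often than once per request.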

I don't think you want Solr to trigger MapReduce jobs if you're looking for 
very fast throughput from your search service.


Hope this helps, ping me offline if you have more questions.
Regards

> From: mlie...@impetus.com
> To: solr-user@lucene.apache.org
> Subject: Re: Solr with Hadoop
> Date: Thu, 18 Jul 2013 15:41:36 +0000
> 
> Rajesh,
> 
> If you require to have an integration between Solr and Hadoop or NoSQL, I
> would recommend using a commercial distribution. I think most are free to
> use as long as you don't require support.
> I inquired about the Cloudera Search capability, but it seems that so
> far it is just preliminary: there is no tight integration yet between
> Hbase and Solr, for example, other than full text search on the HDFS data
> (I believe enabled in Hue). I am not too familiar with what MapR's M7 has
> to offer.
> However Datastax does a good job of tightly integrating Solr with
> Cassandra, and lets you query over the data ingested from Solr in Hive for
> example, which is pretty nice. Solr would not trigger Hadoop jobs, though.
> 
> Cheers,
> Matt
> 
> 
> On 7/17/13 7:37 PM, "Rajesh Jain" <rjai...@gmail.com> wrote:
> 
> >I have a newbie question on integrating Solr with Hadoop.
> >
> >There are some vendors like Cloudera/MapR who have announced Solr Search
> >for Hadoop.
> >
> >If I use the Apache distro, how can I use Solr Search on docs in
> >HDFS/Hadoop
> >
> >Is there a tutorial on how to use it or getting started.
> >
> >I am using Flume to sink CSV docs into Hadoop/HDFS and I would like to use
> >Solr to provide Search.
> >
> >Does Solr Search trigger MapReduce Jobs (like Splunk-Hunk) does?
> >
> >Thanks,
> >Rajesh
> >
> 
> 
