I have used the DSE cluster and am currently evaluating Cloudera Search. In general, Cloudera Search is tightly integrated with HDFS and handles replication and sharding transparently by relying on the existing HDFS replication and sharding. However, Cloudera Search actually uses SolrCloud underneath, so you would need to install ZooKeeper to coordinate the Solr nodes. DataStax lets you talk to Solr, but its model scales around the data model and architecture of Cassandra. Release 3.1 adds some Solr admin functionality and removes the need to write Cassandra-specific code.
If you go the open source route you have a few options:

1) You can build a custom plugin inside Solr that internally queries HDFS and returns data. You would need to figure out how to scale this, potentially with a solution very similar to Cloudera Search (i.e. leverage SolrCloud), and if you use SolrCloud you would need to install ZooKeeper for node coordination.
2) You could create a Flume channel that accumulates specific events from HDFS, and a sink that writes the data directly to Solr.
3) I would look at Cloudera Search if you need tight integration with Hadoop; it might save you some time and effort.

I don't think you want Solr to trigger MapReduce jobs if you're looking for very fast throughput from your search service.

Hope this helps, ping me offline if you have more questions.

Regards

> From: mlie...@impetus.com
> To: solr-user@lucene.apache.org
> Subject: Re: Solr with Hadoop
> Date: Thu, 18 Jul 2013 15:41:36 +0000
>
> Rajesh,
>
> If you require to have an integration between Solr and Hadoop or NoSQL, I
> would recommend using a commercial distribution. I think most are free to
> use as long as you don't require support.
> I inquired about the Cloudera Search capability, but it seems like that
> far it is just preliminary: there is no tight integration yet between
> Hbase and Solr, for example, other than full text search on the HDFS data
> (I believe enabled in Hue). I am not too familiar with what MapR's M7 has
> to offer.
> However Datastax does a good job of tightly integrating Solr with
> Cassandra, and lets you query over the data ingested from Solr in Hive for
> example, which is pretty nice. Solr would not trigger Hadoop jobs, though.
>
> Cheers,
> Matt
>
>
> On 7/17/13 7:37 PM, "Rajesh Jain" <rjai...@gmail.com> wrote:
>
> >I have a newbie question on integrating Solr with Hadoop.
> >
> >There are some vendors like Cloudera/MapR who have announced Solr Search
> >for Hadoop.
> >
> >If I use the Apache distro, how can I use Solr Search on docs in
> >HDFS/Hadoop
> >
> >Is there a tutorial on how to use it or getting started.
> >
> >I am using Flume to sink CSV docs into Hadoop/HDFS and I would like to use
> >Solr to provide Search.
> >
> >Does Solr Search trigger MapReduce Jobs (like Splunk-Hunk) does?
> >
> >Thanks,
> >Rajesh
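For the CSV case in the original question, the indexing half of option 2 (writing data directly to Solr) could look roughly like the Python sketch below. It is not a Flume sink, only an illustration of what such a sink would have to do: turn CSV rows into documents and post them to Solr over HTTP. The Solr URL, the collection name "docs", the "id" field and the helper names are all assumptions made up for the example, and it assumes a Solr version whose /update handler accepts a JSON array of documents, so check that against the version you run.

```python
import csv
import io
import json
import urllib.request


def csv_to_solr_docs(csv_text, id_field="id"):
    """Turn CSV text (header row first) into a list of Solr documents.

    id_field is a hypothetical unique-key name; use whatever your schema defines.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row_no, row in enumerate(reader, start=1):
        # Skip empty cells and any overflow columns that have no header.
        doc = {k.strip(): v.strip()
               for k, v in row.items()
               if k and v is not None and v.strip()}
        # Fall back to a synthetic id when the row does not carry one.
        doc.setdefault(id_field, "row-%d" % row_no)
        docs.append(doc)
    return docs


def build_update_request(solr_url, docs):
    """Build (but do not send) a POST to the collection's /update handler."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr_url.rstrip("/") + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_csv(solr_url, csv_text):
    """Send the documents to Solr and return the HTTP status code."""
    request = build_update_request(solr_url, csv_to_solr_docs(csv_text))
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling index_csv("http://localhost:8983/solr/docs", csv_text) would post the rows to a local Solr, assuming one is running at that (hypothetical) address. Committing on every request keeps the sketch short; for real throughput you would batch documents and commit less often.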