And that's exactly what it means. The HdfsDirectoryFactory is intended
to use the HDFS file system to store the Solr (well, actually Lucene)
index. By default, HDFS stores the data with three-way replication,
which vastly reduces your chances of losing your index to disk errors.
That's what HDFS does.
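For reference, pointing Solr at HDFS is mostly a solrconfig.xml change
(the cwiki page linked below in the quoted message covers the details).
The host and path here are placeholders, so adjust for your cluster:

```xml
<!-- solrconfig.xml: store the index on HDFS instead of the local disk -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <!-- hdfs://namenode:8020/solr is a placeholder; point at your NameNode -->
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
</directoryFactory>
```

You'll also want to start Solr with -Dsolr.lock.type=hdfs so it doesn't
try to use the usual local-file index locks against HDFS.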

If that's not useful to you, then you shouldn't use it.

This is also what the MapReduceIndexerTool is written to work with.
You can spread your M/R jobs across your whole cluster, whether or not
you're running Solr on all the nodes that can run M/R jobs. And with
the --go-live option, the final results can be merged into your live
Solr index on whichever nodes are running your Solr instances, which
are using HdfsDirectoryFactory.
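For the curious, a typical invocation looks roughly like this. The
exact flags can vary between versions, and every host and path below
is a placeholder:

```shell
# Build index shards with M/R, then merge them into the live collection.
# --go-live merges the results into the running Solr instances registered
# in ZooKeeper. All hosts/paths are placeholders for your environment.
hadoop jar solr-map-reduce-*.jar \
  --morphline-file morphline.conf \
  --zk-host zkhost:2181/solr \
  --collection collection1 \
  --output-dir hdfs://namenode:8020/tmp/outdir \
  --go-live \
  hdfs://namenode:8020/indir
```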

There are anecdotal reports of using the MapReduceIndexerTool to
spread the indexing jobs out and then merge the results with a Solr
index on a local file system. However, this is not a use case the
authors of the MapReduceIndexerTool support, and I suspect (but don't
know for sure) that the native file system implementation doesn't, for
instance, make explicit use of the MMapDirectory wrapper in Lucene.
That would be a nice case to support, but it isn't high on the
contributors' priority list.

So the "bottom line" is that if the file redundancy (and the
associated goodness of being able to access the data from anywhere)
isn't valuable to you, there's no particular reason to put your Solr
index on HDFS.

Best,
Erick

On Mon, Jul 7, 2014 at 9:49 PM, search engn dev
<sachinyadav0...@gmail.com> wrote:
> It is written  here
> <https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS>
