And that's exactly what it means. The HdfsDirectoryFactory is intended to use the HDFS file system to store the Solr (well, actually Lucene) index. By default it's triply redundant, which vastly reduces your chances of losing your index to disk errors. That's what HDFS does.
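For reference, enabling this is a solrconfig.xml change. A minimal sketch of the directoryFactory section follows; the HDFS URI and Hadoop conf path are placeholders you'd replace for your cluster:

```xml
<!-- Store the Lucene index on HDFS instead of the local file system.
     hdfs://namenode:8020/solr and /etc/hadoop/conf are placeholders. -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
</directoryFactory>
```

You'd also set lockType to hdfs in the indexConfig section. See the "Running Solr on HDFS" page linked below for the full parameter list.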
If that's not useful to you, then you shouldn't use it.

This is also what the MapReduceIndexerTool is written to work with. You can spread your M/R jobs across your whole cluster, whether or not you're running Solr on all the nodes that can run M/R jobs. And with the --go-live option, the final results can be merged into your live Solr index on whichever nodes are running your Solr instances -- which are using HdfsDirectoryFactory.

There are anecdotal reports of being able to use the MapReduceIndexerTool to spread the indexing jobs out, then merge them with a Solr index on a local file system. However, this is not a use case supported by the authors of the MapReduceIndexerTool, and I suspect (but don't know for sure) that the local file system implementation doesn't, for instance, make explicit use of the MMapDirectory wrapper in Lucene. That would be a nice case to support, but it isn't high on the contributors' priority list.

So the bottom line is that if the file redundancy (and the associated goodness of being able to access the data from anywhere) isn't valuable to you, there's no particular reason to keep the Solr index on HDFS.

Best,
Erick

On Mon, Jul 7, 2014 at 9:49 PM, search engn dev <sachinyadav0...@gmail.com> wrote:
> It is written here
> <https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Need-of-hadoop-tp4145846p4146033.html
> Sent from the Solr - User mailing list archive at Nabble.com.
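[Editor's note: a sketch of the supported MapReduceIndexerTool workflow described above. The jar name, morphline file, hosts, and HDFS paths are placeholders, not from the original message:]

```shell
# Build index shards as a MapReduce job, then (--go-live) merge them into
# the live HDFS-backed Solr collection registered in ZooKeeper.
hadoop jar solr-map-reduce-*.jar \
  --morphline-file readAvroContainer.conf \
  --zk-host zk01.example.com:2181/solr \
  --collection collection1 \
  --output-dir hdfs://nn01.example.com:8020/tmp/outdir \
  --go-live \
  hdfs://nn01.example.com:8020/indir
```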