Cool Mark, I'll keep an eye on this one.

L


On 22/01/2014 22:36, Mark Miller wrote:
Whoops, hit the send keyboard shortcut.

I just created a JIRA issue for the first bit I’ll be working on:

SOLR-5656: When using HDFS, the Overseer should have the ability to reassign 
the cores from failed nodes to running nodes.

- Mark



On Jan 22, 2014, 12:57:46 PM, Lajos <la...@protulae.com> wrote:
Thanks Mark ... indeed, some doc updates would help.

Regarding what seems to be a popular question on sharding: it seems it
would be a Good Thing for the shards of a collection running on HDFS to
essentially be pointers to the HDFS-replicated index. Is that your
thinking?

I've been following your work recently and would be interested in helping
out on this if there's a chance.

Is there a JIRA yet on this issue?

Thanks,

lajos


On 22/01/2014 16:57, Mark Miller wrote:
Right - solr.hdfs.home is the only setting you should use with SolrCloud.

The documentation should probably be improved.

If you set the data dir or ulog location explicitly in solrconfig.xml, it will 
be the same for every collection. SolrCloud shares solrconfig.xml across 
SolrCores, and that will not work out.

By setting solr.hdfs.home and leaving the relative defaults, all of the 
locations are correctly set for each different collection under solr.hdfs.home 
without any effort on your part.
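To make that concrete, a minimal sketch of the directoryFactory and update
handler sections following this advice (the NameNode address
hdfs://master:9000 is just the example value from earlier in this thread):

```xml
<!-- Sketch: only solr.hdfs.home is set explicitly. No <dataDir> element
     and no "dir" under <updateLog>, so the relative defaults resolve to
     a separate directory per collection under solr.hdfs.home. -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://master:9000/solr</str>
</directoryFactory>

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog/>
</updateHandler>
```

Compare this with the config quoted further down the thread, which hard-coded
the data dir and ulog paths and so pointed every collection at the same
locations.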

- Mark



On Jan 22, 2014, 7:22:22 AM, Lajos <la...@protulae.com> wrote:
Ugh. I just realised I should have taken out the data dir and update log
definitions! Now it works fine.

Cheers,

L


On 22/01/2014 11:47, Lajos wrote:
Hi all,

I've been running Solr on HDFS, and that's fine.

But I have a Cloud installation I thought I'd try on HDFS. I uploaded
the configs for the core that runs in standalone mode already on HDFS
(on another cluster). I specify the HdfsDirectoryFactory, HDFS data dir,
solr.hdfs.home, and HDFS update log path:

<dataDir>hdfs://master:9000/solr/test/data</dataDir>

<directoryFactory name="DirectoryFactory"
                  class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://master:9000/solr</str>
</directoryFactory>

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">hdfs://master:9000/solr/test/ulog</str>
  </updateLog>
</updateHandler>

Question is: should I create my collection differently than I would a
normal collection?

If I just try that, Solr will initialise the directory in HDFS as if it
were a single core. It will create shard directories on my nodes, but
not actually put anything in there. And then it will complain mightily
about not being able to forward updates to other nodes. (This same
cluster hosts regular collections, and everything is working fine.)

Am I missing a step? Do I have to manually create HDFS directories for
each replica?

Thanks,

L

