Currently, go-live is only supported when you are running Solr on HDFS.

bq. The indexes must exist on the disk of the Solr host

This does not apply when you are running Solr on HDFS. It's a shared
filesystem, so locality does not matter here.
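
As a rough sketch of what that setup looks like: a core backed by HDFS is configured via the directory factory in solrconfig.xml, along these lines (the namenode address and path below are placeholders, not your actual values):

```xml
<!-- Sketch only: keep the core's index and tlog in HDFS instead of on
     local disk. Namenode host, port, and path are placeholders. -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:9000/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
</directoryFactory>
```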

"no writes should be allowed on either core until the merge is complete. If
writes are allowed, corruption may occur on the merged index."
That doesn't sound right to me at all.

-- 
Mark Miller
about.me/markrmiller

On April 22, 2014 at 10:38:08 AM, Brett Hoerner (br...@bretthoerner.com) wrote:

I think I'm just misunderstanding the use of go-live. From mergeindexes  
docs: "The indexes must exist on the disk of the Solr host, which may make  
using this in a distributed environment cumbersome."  

I'm guessing I'll have to write some sort of tool that pulls each completed  
index out of HDFS and onto the respective SolrCloud machines and manually  
do some kind of merge? I don't want to (can't) be running my Hadoop jobs on  
the same nodes that SolrCloud is running on...  
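
Roughly what I have in mind, as a dry-run sketch only (the namenode, Solr host, core name, and shard list below are all placeholders; it just prints the commands it would run):

```shell
#!/bin/sh
# Dry-run sketch: for each completed shard index in HDFS, copy it to local
# disk on the matching Solr host, then ask Solr to merge it in via the
# CoreAdmin mergeindexes action. All hosts, cores, and paths are placeholders.
NAMENODE="hdfs://namenode:9000"
SOLR="http://solrhost:8983/solr"
CORE="collection1_shard1_replica1"
for SHARD in part-00000 part-00001; do
  SRC="$NAMENODE/output/results/$SHARD/data/index"
  DEST="/tmp/$SHARD-index"
  # Print rather than execute, since this is only a sketch.
  echo "hadoop fs -get $SRC $DEST"
  echo "curl '$SOLR/admin/cores?action=mergeindexes&core=$CORE&indexDir=$DEST'"
done
```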

Also confusing to me: "no writes should be allowed on either core until the  
merge is complete. If writes are allowed, corruption may occur on the  
merged index." Is that saying that Solr will block writes, or is that  
saying the end user has to ensure no writes are happening against the  
collection during a merge? That seems... risky?  


On Tue, Apr 22, 2014 at 9:29 AM, Brett Hoerner <br...@bretthoerner.com> wrote:  

> Anyone have any thoughts on this?  
>  
> In general, am I expected to be able to go-live from an unrelated cluster  
> of Hadoop machines to a SolrCloud that isn't running off of HDFS?  
>  
> input: HDFS  
> output: HDFS  
> go-live cluster: SolrCloud cluster on different machines running on plain  
> MMapDirectory  
>  
> I'm back to looking at the code but holy hell is debugging Hadoop hard. :)  
>  
>  
> On Thu, Apr 17, 2014 at 12:33 PM, Brett Hoerner 
> <br...@bretthoerner.com> wrote:  
>  
>> https://gist.github.com/bretthoerner/0dc6bfdbf45a18328d4b  
>>  
>>  
>> On Thu, Apr 17, 2014 at 11:31 AM, Mark Miller <markrmil...@gmail.com> wrote:  
>>  
>>> Odd - might be helpful if you can share your solrconfig.xml being used.  
>>>  
>>> --  
>>> Mark Miller  
>>> about.me/markrmiller  
>>>  
>>> On April 17, 2014 at 12:18:37 PM, Brett Hoerner (br...@bretthoerner.com)  
>>> wrote:  
>>>  
>>> I'm doing HDFS input and output in my job, with the following:  
>>>  
>>> hadoop jar /mnt/faas-solr.jar \  
>>>   -D mapreduce.job.map.class=com.massrel.faassolr.SolrMapper \  
>>>   --update-conflict-resolver com.massrel.faassolr.SolrConflictResolver \  
>>>   --morphline-file /mnt/morphline-ignore.conf \  
>>>   --zk-host $ZKHOST \  
>>>   --output-dir hdfs://$MASTERIP:9000/output/ \  
>>>   --collection $COLLECTION \  
>>>   --go-live \  
>>>   --verbose \  
>>>   hdfs://$MASTERIP:9000/input/  
>>>  
>>> Index creation works:  
>>>  
>>> $ hadoop fs -ls -R hdfs://$MASTERIP:9000/output/results/part-00000  
>>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data  
>>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index  
>>> -rwxr-xr-x 1 hadoop supergroup 61 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0.fdt  
>>> -rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0.fdx  
>>> -rwxr-xr-x 1 hadoop supergroup 1681 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0.fnm  
>>> -rwxr-xr-x 1 hadoop supergroup 396 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0.si  
>>> -rwxr-xr-x 1 hadoop supergroup 67 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0_Lucene41_0.doc  
>>> -rwxr-xr-x 1 hadoop supergroup 37 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0_Lucene41_0.pos  
>>> -rwxr-xr-x 1 hadoop supergroup 508 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0_Lucene41_0.tim  
>>> -rwxr-xr-x 1 hadoop supergroup 305 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0_Lucene41_0.tip  
>>> -rwxr-xr-x 1 hadoop supergroup 120 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0_Lucene45_0.dvd  
>>> -rwxr-xr-x 1 hadoop supergroup 351 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/_0_Lucene45_0.dvm  
>>> -rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/segments_1  
>>> -rwxr-xr-x 1 hadoop supergroup 110 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/index/segments_2  
>>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/tlog  
>>> -rw-r--r-- 1 hadoop supergroup 333 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-00000/data/tlog/tlog.0000000000000000000  
>>>  
>>> But the go-live step fails; it seems to be using the HDFS path as the  
>>> remote index path?  
>>>  
>>> 14/04/17 16:00:31 INFO hadoop.GoLive: Live merging of output shards into Solr cluster...  
>>> 14/04/17 16:00:31 INFO hadoop.GoLive: Live merge hdfs://10.98.33.114:9000/output/results/part-00000 into http://discover8-test-1d.i.massrel.com:8983/solr  
>>> 14/04/17 16:00:31 ERROR hadoop.GoLive: Error sending live merge command  
>>> java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: directory '/mnt/solr_8983/home/hdfs:/10.98.33.114:9000/output/results/part-00000/data/index' does not exist  
>>>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)  
>>>   at java.util.concurrent.FutureTask.get(FutureTask.java:188)  
>>>   at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)  
>>>   at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)  
>>>   at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)  
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)  
>>>   at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)  
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  
>>>   at java.lang.reflect.Method.invoke(Method.java:606)  
>>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)  
>>> Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: directory '/mnt/solr_8983/home/hdfs:/10.98.33.114:9000/output/results/part-00000/data/index' does not exist  
>>>   at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)  
>>>   at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)  
>>>   at org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)  
>>>   at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)  
>>>   at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)  
>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)  
>>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)  
>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)  
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)  
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)  
>>>   at java.lang.Thread.run(Thread.java:744)  
>>> 14/04/17 16:00:31 INFO hadoop.GoLive: Live merging of index shards into Solr cluster took 2.31269488E8 secs  
>>> 14/04/17 16:00:31 INFO hadoop.GoLive: Live merging failed  
>>>  
>>> I'm digging into the code now, but wanted to send this out as a sanity  
>>> check.  
>>>  
>>> Thanks,  
>>> Brett  
>>>  
>>  
>>  
>  
