Hi,

In the GoLive stage, the MRIT sends MERGEINDEXES requests to the Solr instances. The request has an indexDir parameter with an HDFS path to the index generated on HDFS, as shown in the MRIT log:
2014-07-02 15:03:55,123 DEBUG org.apache.http.impl.conn.DefaultClientConnection: Sending request: GET /solr/admin/cores?action=MERGEINDEXES&core=collection1&indexDir=hdfs%3A%2F%2Fhdtest041.test.com%3A9000%2Foutdir_webaccess_app%2Fresults%2Fpart-00000%2Fdata%2Findex&wt=javabin&version=2 HTTP/1.1

So it's up to the Solr instance to read the index from HDFS (rather than for the MRIT to find the local disk to write to from HDFS). The go-live option is very convenient for merging the generated index into the live index. It's preferable to use go-live rather than copying indexes around to the local file system and then merging.

I tried to start the Solr instance with these properties, to allow the Solr instance to write to the local file system while still being able to read the index on HDFS when doing MERGEINDEXES:

-Dsolr.directoryFactory=HdfsDirectoryFactory \
-Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
-Dsolr.lock.type=hdfs \
-Dsolr.hdfs.home=file:///opt/test/solr/node/solr \

i.e. the full command:

java -DnumShards=2 \
-Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf \
-DzkHost=<zookeeper>:2181 \
-Dhost=<node1> \
-DSTOP.PORT=7983 -DSTOP.KEY=key \
-Dsolr.directoryFactory=HdfsDirectoryFactory \
-Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
-Dsolr.lock.type=hdfs \
-Dsolr.hdfs.home=file:///opt/test/solr/node/solr \
-jar start.jar

With that, go-live works fine. Any comment on this approach?

Tom

On Wed, Jul 2, 2014 at 9:50 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> How would the MapReduceIndexerTool (MRIT for short)
> find the local disk to write from HDFS to for each shard?
> All it has is the information in the Solr configs, which are
> usually relative paths on the local Solr machines, relative
> to SOLR_HOME. Which could be different on each node
> (that would be screwy, but possible).
>
> Permissions would also be a royal pain to get right....
>
> You _can_ forego the --go-live option and copy from
> the HDFS nodes to your local drive and then execute
> the "mergeIndexes" command, see:
> https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
> Note that there is the MergeIndexTool, but there's also
> the Core Admin command.
>
> The sub-indexes are in a partition in HDFS and numbered
> sequentially.
>
> Best,
> Erick
>
> On Wed, Jul 2, 2014 at 3:23 PM, Tom Chen <tomchen1...@gmail.com> wrote:
> > Hi,
> >
> > When we run the Solr Map Reduce Indexer Tool (
> > https://github.com/markrmiller/solr-map-reduce-example), it generates
> > indexes on HDFS.
> >
> > The last stage is Go Live, which merges the generated index into the live
> > SolrCloud index.
> >
> > If the live SolrCloud writes its index to the local file system (rather
> > than HDFS), Go Live gives an error like this:
> >
> > 2014-07-02 13:41:01,518 INFO org.apache.solr.hadoop.GoLive: Live merge
> > hdfs://bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00000
> > into http://bdvs087.test.com:8983/solr
> > 2014-07-02 13:41:01,796 ERROR org.apache.solr.hadoop.GoLive: Error sending
> > live merge command
> > java.util.concurrent.ExecutionException:
> > org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> > directory '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index'
> > does not exist
> > at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:233)
> > at java.util.concurrent.FutureTask.get(FutureTask.java:94)
> > at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
> > at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
> > at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> > at java.lang.reflect.Method.invoke(Method.java:611)
> > at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
> > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > at java.security.AccessController.doPrivileged(AccessController.java:310)
> > at javax.security.auth.Subject.doAs(Subject.java:573)
> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
> > at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > Caused by:
> > org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> > directory '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index'
> > does not exist
> > at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
> > at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
> > at org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
> > at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
> > at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:149)
> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:149)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
> > at java.lang.Thread.run(Thread.java:738)
> >
> > Any way to set up SolrCloud to write its index to the local file system,
> > while allowing the Solr MapReduceIndexerTool's GoLive to merge an index
> > generated on HDFS into the SolrCloud?
> >
> > Thanks,
> > Tom
>
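P.S. For anyone hitting the same "directory does not exist" error: here's a minimal sketch in Python (an illustration only, not Solr's actual resolution code) of what appears to happen when the receiving core uses a local-filesystem directory factory. The hdfs:// value of indexDir isn't an absolute local path, so it gets resolved relative to the core's directory, and path normalization collapses the "//" to "/", yielding exactly the bogus local path in the exception above. The instance_dir value is taken from the error message; the resolution logic itself is my assumption about the behavior.

```python
import posixpath

# Values taken from the RemoteSolrException in the quoted trace.
instance_dir = "/opt/testdir/solr/node"
index_dir = ("hdfs://bdvs086.test.com:9000/tmp/"
             "0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index")

# Hypothetical resolution by a local-only directory factory: "hdfs://..."
# does not start with "/", so it is joined as a relative path, and
# normpath collapses the double slash after "hdfs:".
resolved = posixpath.normpath(posixpath.join(instance_dir, index_dir))
print(resolved)
# -> /opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index
```

That mangled path is what the core then tries (and fails) to open locally, which is why switching the receiving Solr to HdfsDirectoryFactory (with solr.hdfs.home pointing at a file:// URI, as in the command above) makes go-live work: an HDFS-aware factory recognizes the hdfs:// scheme instead of treating it as a relative local path.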