Re: Loading an index (generated by map reduce) in SolrCloud

shushuai zhu Wed, 17 Sep 2014 19:02:38 -0700

Hi, my case is a little simpler. For example, I have 100 collections now in my
SolrCloud cluster, and I want to back up 20 of them so I can restore them later. I
think I can just copy the index and log for each shard/core to another
location, then delete the collections. Later, I can create new collections
(likely with different names), then copy the index and log back into the right
directory structure on the node. After that, I can reload either the collection
or the core.

However, some testing shows this does not work: I could not reload either the
collection or the core. I have not tried restarting SolrCloud. Can someone
point out the best way to achieve this goal? I would prefer not to restart
SolrCloud.

Shushuai

________________________________
 From: ralph tice <ralph.t...@gmail.com>
To: solr-user@lucene.apache.org 
Sent: Wednesday, September 17, 2014 6:53 PM
Subject: Re: Loading an index (generated by map reduce) in SolrCloud

FWIW, I move Lucene indexes around a lot, and as long as the core is
unloaded, running Solr at the same time has never been an issue.
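A minimal sketch of the unload step mentioned above, assuming the CoreAdmin API is at the default /solr/admin/cores path. The function names, host and core name are hypothetical. Unloading without deleteIndex (which defaults to false) leaves the index files on disk, so they can be moved afterwards.

```shell
#!/usr/bin/env bash
# Sketch: unload a Solr core via the CoreAdmin API so its index files can be
# moved while Solr keeps running. Function names, host and core are hypothetical.

# Build the CoreAdmin UNLOAD URL. deleteIndex is left at its default (false),
# so the index files stay on disk after the core is unloaded.
unload_core_url() {
  local host="$1" port="$2" core="$3"
  printf 'http://%s:%s/solr/admin/cores?action=UNLOAD&core=%s\n' "$host" "$port" "$core"
}

# Send the request; -f makes curl return non-zero on an HTTP error status.
unload_core() {
  curl -sf "$(unload_core_url "$@")"
}

# Example (hypothetical core name):
#   unload_core box1.mysolr.corp 8983 mycollection_shard1_replica1
```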

If you move a core into the correct hierarchy for a replica, you can call
the Collections API's CREATESHARD action with the appropriate params (make
sure you use createNodeSet to point to the right server) and Solr will load
the index appropriately. Note that CREATESHARD only works for collections
that use the implicit router. It's easier to create a dummy shard and see
where data lands on your installation than to try to guess.

Ex:
PORT=8983
SHARD=myshard
COLLECTION=mycollection
SOLR_HOST=box1.mysolr.corp
curl "http://${SOLR_HOST}:${PORT}/solr/admin/collections?action=CREATESHARD&shard=${SHARD}&collection=${COLLECTION}&createNodeSet=${SOLR_HOST}:${PORT}_solr"

One file to watch out for if you are moving cores across machines/JVMs is
the core.properties file, which you don't want to duplicate to another
server/location when moving a data directory.  I don't recommend trying to
move transaction logs around either.
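A minimal sketch of moving only the index while leaving core.properties and the transaction log behind, assuming the default core layout of <core_dir>/data/index and <core_dir>/data/tlog. The function name and paths are hypothetical, and the destination core should be unloaded (or Solr stopped) while it runs.

```shell
#!/usr/bin/env bash
# Sketch: copy only a core's Lucene index into another core directory,
# skipping core.properties and data/tlog. Assumes the default layout
# <core_dir>/data/index; the function name is hypothetical.
# The destination core should be unloaded (or Solr stopped) first.
copy_index_only() {
  local src_core="$1" dest_core="$2"
  if [ ! -d "$src_core/data/index" ]; then
    echo "no index directory under $src_core" >&2
    return 1
  fi
  mkdir -p "$dest_core/data"
  # Replace any existing index at the destination rather than merging into it.
  rm -rf "$dest_core/data/index"
  cp -R "$src_core/data/index" "$dest_core/data/index"
}
```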

On Wed, Sep 17, 2014 at 5:22 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Details please. You say MapReduce. Is this the
> MapReduceIndexerTool? If so, you can use
> the --go-live option to auto-merge them. Your
> Solr instances need to be running over HDFS
> though.
>
> If you don't have Solr running over HDFS, you can
> just copy the results for each shard "to the right place".
> What that means is that you must ensure that the
> shards produced via MRIT get copied to the corresponding
> Solr local directory for each shard. If you put the wrong
> one in the wrong place, you'll end up with multiple
> copies of documents when you re-add any
> doc that already exists in your Solr installation.
>
> BTW, I'd surely stop all my Solr instances while copying
> all this around.
>
> Best,
> Erick
>
> On Wed, Sep 17, 2014 at 1:41 PM, KNitin <nitin.t...@gmail.com> wrote:
> > Hello
> >
> > I have generated a Lucene index (with 6 shards) using MapReduce. I want
> > to load it into a collection in a SolrCloud cluster.
> >
> > Is there any out-of-the-box way of doing this? Any ideas are much
> > appreciated.
> >
> > Thanks
> > Nitin
>
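A minimal sketch of Erick's "copy to the right place" step for the non-HDFS case. Several things here are assumptions, not confirmed by the thread: that MapReduceIndexerTool writes each shard to <mrit_out>/results/part-NNNNN/data/index, that part-00000 belongs to shard1, part-00001 to shard2 and so on, and that cores are named <collection>_shard<N>_replica1 under the Solr home. The function names are hypothetical. Check which part belongs to which shard on your own cluster (for example by comparing hash ranges) before relying on this, since a mismatch produces exactly the duplicate-document problem Erick warns about.

```shell
#!/usr/bin/env bash
# Sketch: copy each MapReduceIndexerTool output shard into the matching local
# Solr core. Assumptions (verify on your own cluster before use):
#   * MRIT output lives in <mrit_out>/results/part-NNNNN/data/index
#   * part-00000 belongs to shard1, part-00001 to shard2, and so on
#   * cores are named <collection>_shard<N>_replica1 under <solr_home>
# Function names are hypothetical. Stop Solr (or unload the cores) first.

# Print the assumed core directory for a part directory name.
core_dir_for_part() {
  local solr_home="$1" collection="$2" part="$3"
  local idx=$((10#${part#part-}))   # "part-00002" -> 2 (base 10, not octal)
  printf '%s/%s_shard%d_replica1\n' "$solr_home" "$collection" $((idx + 1))
}

copy_mrit_shards() {
  local mrit_out="$1" solr_home="$2" collection="$3"
  local part_dir part dest
  for part_dir in "$mrit_out"/results/part-*; do
    # Skips non-matching globs and parts without an index directory.
    [ -d "$part_dir/data/index" ] || continue
    part=$(basename "$part_dir")
    dest=$(core_dir_for_part "$solr_home" "$collection" "$part")
    echo "copying $part -> $dest"
    mkdir -p "$dest/data"
    # Replace the existing index instead of merging into it.
    rm -rf "$dest/data/index"
    cp -R "$part_dir/data/index" "$dest/data/index"
  done
}
```

The per-part copy only touches data/index, so any existing core.properties in the destination core is left alone.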
