Re: Copying a SolrCloud collection to other hosts

2018-03-28 Thread Jeff Wartes
I really like the fetchindex approach. Once I figured out the undocumented API, it worked really well, and I haven't had to change my usage for any Solr I've tried between 4.7-7.2. I recall having some issues if I tried to apply a fetchindex to a shard that already had data, where it'd get conf

Re: Copying a SolrCloud collection to other hosts

2018-03-28 Thread Jeff Wartes
It'd work fine, I played around with this a bit once upon a time. The trick is that you need to either: 1. Make sure all shards for the index are synchronized onto every node for the duration of the restore (as you mention) 2. Know exactly which nodes will ask to restore which shards for the du

Re: Copying a SolrCloud collection to other hosts

2018-03-28 Thread Shawn Heisey
On 3/28/2018 10:34 AM, Jeff Wartes wrote: > The backup/restore still requires setting up a shared filesystem on all your > nodes though right? Technically speaking, I don't think a shared filesystem is actually REQUIRED to make a backup. But in order to do a restore, all machines involved with

Re: Copying a SolrCloud collection to other hosts

2018-03-28 Thread Erick Erickson
Hmmm, wouldn't even be all that hard, would it? A collections API call. Assuming both collections' state.json nodes were available from ZooKeeper, a command would have all the necessary information, only an HTTP connection required. I don't think it would be too much of a stretch to be able to provi

Re: Copying a SolrCloud collection to other hosts

2018-03-28 Thread David Smiley
Right, there is a shared filesystem requirement. It would be nice if this Solr feature could be enhanced to have more options like backing up directly to another SolrCloud using replication/fetchIndex like your cool solrcloud_manager thing. On Wed, Mar 28, 2018 at 12:34 PM Jeff Wartes wrote: >

Re: Copying a SolrCloud collection to other hosts

2018-03-28 Thread Jeff Wartes
The backup/restore still requires setting up a shared filesystem on all your nodes though right? I've been using the fetchindex trick in my solrcloud_manager tool for ages now: https://github.com/whitepages/solrcloud_manager#cluster-commands Some of the original features in that tool have been

Re: Copying a SolrCloud collection to other hosts

2018-03-27 Thread David Smiley
The backup/restore API is intended to address this. https://builds.apache.org/job/Solr-reference-guide-master/javadoc/making-and-restoring-backups.html Erick's advice is good (and I once drafted docs for the same scheme years ago as well), but I consider it dated -- it's what people had to do befo
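
For readers unfamiliar with the backup/restore API mentioned above, here is a minimal sketch of how the two Collections API requests could be built. The host, backup name, collection name, and location are hypothetical placeholders; the location is assumed to be a path that every node involved in the restore can read, per the shared-filesystem discussion elsewhere in this thread. The sketch only builds URLs and sends nothing.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, backup_name, collection, location):
    """Build a Collections API URL for a BACKUP or RESTORE action.

    All argument values are supplied by the caller; nothing here is
    taken from the thread itself.
    """
    params = {
        "action": action,          # "BACKUP" or "RESTORE"
        "name": backup_name,       # name of the backup
        "collection": collection,  # collection to back up, or to restore into
        "location": location,      # assumed: a path visible to all nodes
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical hosts and names, for illustration only.
    print(collections_api_url("http://source-host:8983/solr", "BACKUP",
                              "nightly", "products", "/mnt/shared/backups"))
    print(collections_api_url("http://target-host:8983/solr", "RESTORE",
                              "nightly", "products", "/mnt/shared/backups"))
```

Each URL would then be requested with any HTTP client, the BACKUP against the source cluster and the RESTORE against the target.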

Re: Copying a SolrCloud collection to other hosts

2018-03-15 Thread Erick Erickson
yeah, it's on a core-by-core basis. Which also makes getting it propagated to all replicas something you have to be sure happens... Glad it's working for you! Erick On Thu, Mar 15, 2018 at 1:54 AM, Patrick Schemitz wrote: > Hi Erick, > > thanks a lot, that solved our problem nicely. > > (It took

Re: Copying a SolrCloud collection to other hosts

2018-03-15 Thread Patrick Schemitz
Hi Erick, thanks a lot, that solved our problem nicely. (It took us a try or two to notice that this will not copy the entire collection but only the shard on the source instance, and we need to do this for all instances explicitly. But hey, we had to do the same for the old approach of scp'ing th

Re: Copying a SolrCloud collection to other hosts

2018-03-06 Thread Erick Erickson
This is part of the "different replica types" capability: there are NRT (the only type available prior to 7.x), PULL and TLOG, which would have different names. I don't know of any way to switch it off. As far as moving the data, here's a little-known trick: Use the replication API to issue a fetchi
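
A minimal sketch of the fetchindex trick described above, assuming the replication handler's `fetchindex` command with a `masterUrl` parameter as used in Solr releases of this era. The host names and core names are hypothetical, and because the copy works core by core (as noted later in the thread), one request is built per source/target core pair. The sketch only builds the request URLs.

```python
from urllib.parse import urlencode


def fetchindex_url(target_core_url, source_core_url):
    """Build the URL that asks the target core to pull its index
    from the source core via the replication handler."""
    query = urlencode({"command": "fetchindex", "masterUrl": source_core_url})
    return f"{target_core_url.rstrip('/')}/replication?{query}"


def plan_copy(core_pairs):
    """Return one fetchindex URL per (source_core_url, target_core_url) pair."""
    return [fetchindex_url(target, source) for source, target in core_pairs]


if __name__ == "__main__":
    # Hypothetical core URLs; real core names depend on the cluster layout.
    pairs = [
        ("http://src1:8983/solr/coll_shard1_replica1",
         "http://dst1:8983/solr/coll_shard1_replica1"),
        ("http://src2:8983/solr/coll_shard2_replica1",
         "http://dst2:8983/solr/coll_shard2_replica1"),
    ]
    for url in plan_copy(pairs):
        print(url)
```

Building the full list first makes it easy to check that every shard has a matching source and target before any request is sent.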

Copying a SolrCloud collection to other hosts

2018-03-06 Thread Patrick Schemitz
Hi List, so I'm running a bunch of SolrCloud clusters (each cluster is: 8 shards on 2 servers, with 4 instances per server, no replicas, i.e. 1 shard per instance). Building the index afresh takes 15+ hours, so when I have to deploy a new index, I build it once, on one cluster, and then copy (scp