On 8/28/2013 10:48 AM, Daniel Collins wrote:
> What I would ideally like to do is, at the point that I kick off recovery, divert the indexing feed for the "broken" cloud into a transaction log on those machines, run the replication and swap the index in, then replay the transaction log to bring it all up to date. That process is conceptually the same as what the org.apache.solr.cloud.RecoveryStrategy code does.
I don't think any such mechanism exists currently, but it would be extremely useful if it did. If there's not an existing Jira issue, I recommend that you file one. Being able to set up a multi-datacenter cloud with automatic recovery would be awesome; even if recovery took a long time, having it be fully automated would be exceptionally useful.
> Yes, if I could divert that feed at the application level, then I could do what you suggest, but it feels like more work to do that (and build an external transaction log), whereas the code seems to already be in Solr itself; I just need to hook it all up (famous last words!). Our indexing pipeline does a lot of pre-processing work (it's not just pulling data from a database), and since we are only talking about the time taken to do the replication (which should be an hour or less), it feels like we ought to be able to store that in a Solr transaction log (i.e. at the last point in the indexing pipeline).
I think it would have to be a separate transaction log. One problem with really big regular tlogs is that when Solr gets restarted, the entire transaction log that's currently on disk gets replayed. If it were big enough to recover the last several hours to a duplicate cloud, it would take forever to replay on every Solr restart. If the regular tlog were kept small, but a second log covering the last 24 hours were available, Solr could replay those updates when the second cloud came back up.
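Just to sketch what I mean, here's roughly what an application-level second log could look like. This is off the top of my head and not tested: the class name, the tab-separated line format, and the id/title fields are all made up for illustration, and it uses the Solr 4.x SolrJ CloudSolrServer class.

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

/**
 * Application-level "second log": every update the pipeline sends to the
 * healthy cloud is also appended here, so it can be replayed against the
 * recovered cloud once replication has finished.  The field set (id, title)
 * and the tab-separated layout are placeholders, not a real schema.
 */
public class SecondaryUpdateLog {
    private final Path logFile;

    public SecondaryUpdateLog(Path logFile) {
        this.logFile = logFile;
    }

    /** Append one update: epoch millis, id, and title, one line per document. */
    public synchronized void record(long timestampMillis, String id, String title)
            throws IOException {
        String line = timestampMillis + "\t" + id + "\t" + title + "\n";
        Files.write(logFile, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Re-send every update newer than the checkpoint to the recovered cloud. */
    public void replay(long checkpointMillis, CloudSolrServer recoveredCloud)
            throws IOException, SolrServerException {
        try (BufferedReader reader =
                Files.newBufferedReader(logFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 3);
                if (Long.parseLong(parts[0]) <= checkpointMillis) {
                    continue;  // already present in the replicated index
                }
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", parts[1]);
                doc.addField("title", parts[2]);
                recoveredCloud.add(doc);
            }
        }
        recoveredCloud.commit();
    }
}

Before calling replay(), you'd point a CloudSolrServer at the recovered cloud's ZooKeeper ensemble and pass in the timestamp of the last update that is known to be in the replicated index.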
I do import from a database, so the application-level tracking works really well for me.
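For what it's worth, the tracking itself is pretty simple when the source is a database: remember the last_modified value of the newest row you've indexed, and after the outage re-send every row that's newer. Again, just a rough sketch with placeholder table and column names, using the same 4.x CloudSolrServer class:

import java.sql.*;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

/**
 * Database-driven catch-up: re-index every row whose last_modified column is
 * newer than the checkpoint recorded before the outage.  The table and column
 * names (documents, id, title, last_modified) are placeholders.
 */
public class DatabaseCatchUp {
    public static void reindexSince(Connection db, CloudSolrServer recoveredCloud,
                                    Timestamp checkpoint)
            throws SQLException, SolrServerException, java.io.IOException {
        String sql = "SELECT id, title, last_modified FROM documents "
                   + "WHERE last_modified > ?";
        try (PreparedStatement stmt = db.prepareStatement(sql)) {
            stmt.setTimestamp(1, checkpoint);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", rs.getString("id"));
                    doc.addField("title", rs.getString("title"));
                    doc.addField("last_modified", rs.getTimestamp("last_modified"));
                    recoveredCloud.add(doc);
                }
            }
        }
        recoveredCloud.commit();
    }
}

The checkpoint can live anywhere durable, such as a small file or a row in the database itself.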
Thanks,
Shawn