Re: Backing up SolR 4.0

Shawn Heisey Tue, 04 Dec 2012 09:09:44 -0800

On 12/4/2012 1:55 AM, Andy D'Arcy Jewell wrote:

Is there an easy way to tell (say from a shell script) when "all commits and merges [are] complete"?

One important bit of information I just thought of: A default Solr 4 config uses a new directory implementation called NRTCachingDirectory, which in some circumstances may keep part of the newest segment(s) in RAM. I *hope* that issuing an explicit hard commit will flush that to disk, but I am not sure. You *might* need to switch to the old directory implementation to be sure a hardlink backup is complete. Can one of the committers please comment on this? Assuming that we work out this detail, the rest of what I've said will be valid.

Detecting when commits are done has to be coordinated with your indexing program, so depending on how your system works, kicking off the "make hardlinks" process might need to be part of your indexing program. As for merges, that's a bit tougher, because Solr 4 and later will do merges in the background after informing your indexing program that the commit is complete.
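A minimal sketch of how an indexing job might issue the explicit hard commit itself before starting the hardlink step. The function names and the base URL are hypothetical; `commit=true` and `waitSearcher=true` are standard parameters of Solr's update handler. This only waits for the commit to return. It does not wait for background merges, and whether it flushes everything NRTCachingDirectory holds in RAM is the open question raised above.

```shell
#!/bin/sh
# Hypothetical helpers; the core URL passed in is an assumption about
# your setup, e.g. http://localhost:8983/solr/collection1

# Build the update-handler URL that requests a hard commit.
commit_url() {
    # ${1%/} strips a single trailing slash from the base URL, if present.
    printf '%s/update?commit=true&waitSearcher=true\n' "${1%/}"
}

# Send the commit request; returns non-zero if curl reports an HTTP error.
hard_commit() {
    curl -sf "$(commit_url "$1")" >/dev/null
}
```

An indexing script could call `hard_commit "$core_url"` as its last step and only start the hardlink process if that call succeeds.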

If you grab a hardlink copy while a merge is happening, I do not believe it will be corrupt in any way, but it may be larger than expected because it will contain the new segments from the merge. Those segments would not be referenced by the segments.nnn file, so I *think* that if you then load that index into Solr, it would ignore the other segments. I am not sure about that, though. You might be able to use a command like the following to delete the newer segments from the copy, but I would not do it without experimentation to be sure it's actually required, and that it never wipes anything out that you actually need:


find . -type f -newer segments.gen | xargs rm -f
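Since that command deletes files, a non-destructive variant is useful for the experimentation step. The sketch below only lists what would be removed. The function name and its argument are hypothetical, and it assumes the copy contains a segments.gen file.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not delete) the files in a copied
# index directory that were modified more recently than its segments.gen.
list_newer_segments() {
    index_copy="$1"   # path to the hardlink copy (assumed argument)
    if [ ! -f "$index_copy/segments.gen" ]; then
        echo "no segments.gen in $index_copy" >&2
        return 1
    fi
    # -newer compares modification times, so segments.gen itself is not listed.
    find "$index_copy" -type f -newer "$index_copy/segments.gen" -print
}
```

Running `list_newer_segments /path/to/copy` and checking the output against the files a working index actually needs is one way to decide whether the deletion is safe.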

If I keep a replica solely for backup purposes, I assume I can "do what I like with it" - presumably replication will resume/catch-up when I resume it (I admit, I have a bit of reading to do wrt replication - I just skimmed that because it wasn't in my initial brief).

As long as the replica server isn't being actively updated or used for queries and you temporarily turn off replication, you should be able to do whatever you want with its index.
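One way to turn replication off temporarily is to stop the replica from polling its master through the replication handler's `disablepoll` and `enablepoll` commands. This is a sketch under assumptions: the function names and the core URL are hypothetical, and it presumes the replica uses the standard `/replication` handler path.

```shell
#!/bin/sh
# Hypothetical helpers; pass the replica's core URL,
# e.g. http://replica-host:8983/solr/collection1

# Build a replication-handler URL for the given command.
replication_url() {
    printf '%s/replication?command=%s\n' "${1%/}" "$2"
}

# Stop the replica from polling the master for new index versions.
pause_replication() {
    curl -sf "$(replication_url "$1" disablepoll)" >/dev/null
}

# Re-enable polling so the replica catches up again.
resume_replication() {
    curl -sf "$(replication_url "$1" enablepoll)" >/dev/null
}
```

A backup script would call `pause_replication`, work on the index, then call `resume_replication`.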

I'm assuming that because you're using hardlinks, that means that SolR writes a "new" file when it updates (sort of copy-on-write style)? So we are relying on the principle that as long as you have at least one remaining reference to the data, it's not deleted...

Yes. Lucene (which Solr uses under the hood) never touches segment files once they have been written. It only deletes segment files in two circumstances: 1) Every document in that segment has been deleted from the index. 2) The data in that segment has been written to a new segment. The combination of Lucene's update method and hardlink functionality will ensure that the hardlink copy is always good.
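The hardlink step itself can be sketched as follows. The function name and both paths are hypothetical, and the sketch assumes the index directory is flat (no subdirectories) and that the destination is on the same filesystem, since hardlinks cannot cross filesystems.

```shell
#!/bin/sh
# Hypothetical helper: hardlink every regular file in an index directory
# into a snapshot directory on the same filesystem.
snapshot_index() {
    src="$1"    # live index directory (assumed argument)
    dest="$2"   # snapshot directory, created if missing (assumed argument)
    mkdir -p "$dest" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue      # skip anything that is not a regular file
        ln "$f" "$dest/" || return 1 # new directory entry, same data on disk
    done
}
```

Because each link is a second reference to the same data, a segment file that Lucene later deletes from the live directory stays readable in the snapshot until the snapshot is removed as well.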


Thanks,
Shawn
