>>> Thanks for the reply Steven. It's no more than 100 lines at a
>>> guess
>
>> In that case just copy and paste it into a message and send it to
>> the group. Anyone with time available can then take a peek.
>
> One way noobs anywhere can learn is by listening in to other people's
> conversations - it's called lurking, I believe.
>
> So I would say, please do this on the list, and many more people than
> Adam may benefit. Others can ignore the thread if they wish.
>
> Bob

Oke doke, here it is below. Just for convenience's sake, I'm going to repeat what the basic steps are. It's a backup script for certain xen virtual machines ("VM") running on my server. Each VM runs on its own logical volume (as opposed to a file-based loop device). From my own (bitter) experience, the absolute best way to back up a VM running on a logical volume is to clone it to an image file using dd. I'm aware that a separate discussion could be had around this (on a different mailing list) but, unless someone thinks this is a horribly flawed approach, it may be best to assume this approach is 'fine' so as not to distract from the code review!!

Here are the steps:

1) create snapshots of the xen logical volumes using the built-in snapshot feature of LVM2 (this way I can back up each logical volume without having to shut down the VM)
2) dd and bzip2 (using a pipe) the snapshots to .img.bz2 files for storage on the same server
3) gpg encrypt the same files and upload them to Amazon s3
4) remove the logical volume snapshots (because they accumulate disk space and I'm doing this daily) and the .gpg files
5) delete files in the s3 directory which are older than X days

As I've mentioned, I'm a real noob, so I'm still mastering some basic stuff. The script works fine for my purposes, but I'm keen to understand where it could be improved from a python pov. Finally, yes I could have written this in bash but I prefer python!

P.S. I think some of the comments have been wrapped onto more than one line by my email client. I hope this doesn't cause too much inconvenience.
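
For anyone following along, step 2 (piping dd into a compressor) is the part that translates least directly from shell to Python. Below is a minimal sketch of the general technique, under stated assumptions: the helper name pipe_to_file and its parameters are hypothetical and do not appear in the script. It only shows how two commands can be chained with subprocess the way a shell pipe chains them.

```python
import subprocess


def pipe_to_file(producer_cmd, consumer_cmd, out_path):
    """Run `producer_cmd | consumer_cmd > out_path`; return both exit codes.

    Hypothetical helper for illustration; both commands are argument lists.
    """
    with open(out_path, "wb") as fout:
        producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
        consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                                    stdout=fout)
        # Close our copy of the pipe so the producer sees a broken pipe
        # if the consumer exits early.
        producer.stdout.close()
        consumer_rc = consumer.wait()
        producer_rc = producer.wait()
    return producer_rc, consumer_rc
```

With the names used in the script, a call would look roughly like pipe_to_file(ddIf, [compress], filename). Returning both exit codes lets the caller skip the encryption and upload steps when either dd or the compressor fails.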

====================================

#!/usr/bin/python3

############################################
## XEN VIRTUAL MACHINE BACKUP SCRIPT
##
## Copyright (C) 2014 Adam Gold
##
## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or (at
## your option) any later version.
##
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
## the GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program. If not see <http://gnu.org/licenses/>
##
## Version: 0.4
## 2014-06-10
############################################

import datetime, time, subprocess, shlex, os, gnupg, glob, shutil

# logical volumes exist in two different volume groups, vgxen and vg_data
# hence two lists of vms
vgxenList = ['vm1', 'vm2', 'vm3', 'vm4', 'vm5', 'vm6' ]
vg_dataList = ['vm1', 'vm2']
backupList = [ ]
snapNameList = [ ]

# create snapshot names like the following: 2014-06-10T01-00-01.vm1.img.bz2
for i in vgxenList:
    DATE = datetime.datetime.now().strftime("%Y-%m-%d" + "T" + "%H-%M-%S")
    vgxenName = "/dev/vgxen/"
    lvName = i
    origName = vgxenName + lvName
    snapName = DATE + "." + lvName
    snapNameList.append(snapName)
    backupList.append(vgxenName + snapName)
    subprocess.call(['lvcreate', '-s', '-L1G', origName, '-n', snapName])

for h in vg_dataList:
    DATE = datetime.datetime.now().strftime("%Y-%m-%d" + "T" + "%H-%M-%S")
    vg_dataName = "/dev/vg_data/"
    lvName = h
    origName = vg_dataName + lvName
    snapName = DATE + "." + lvName
    snapNameList.append(snapName)
    backupList.append(vg_dataName + snapName)
    subprocess.call(['lvcreate', '-s', '-L1G', origName, '-n', snapName])

# backupPath is list of full paths of each snapshot
# the string is extracted from backupList using 'join'
backupPath = ' '.join(backupList)
for j, k in zip(backupList, snapNameList):
    backupPath = j
    backupSnapshot = k
    # run dd and pipe to bz2 file using subprocess module
    ddIf = shlex.split("dd if=%s bs=4k conv=noerror,notrunc,sync" % (backupPath))
    compress = "pbzip2"
    filename = "/home/files/temp/%s.img.bz2" % (backupSnapshot)
    p1 = subprocess.Popen(ddIf, stdout=subprocess.PIPE)
    with p1.stdout as fin, open(filename, "w") as fout:
        p2 = subprocess.Popen(compress, stdin=fin, stdout=fout)
    ret1 = p1.wait()
    ret2 = p2.wait()

# create list of files to be encrypted with full path names
# start with list of unencrypted files
cryptDir = '/home/files/temp/'
unencrypted = [u for u in os.listdir(cryptDir)]
# join absolute path to file names to create new list (list comprehension)
cryptDir_unencrypted = [ os.path.join(cryptDir, s) for s in unencrypted ]

# encrypt files
for G in cryptDir_unencrypted:
    gpg = gnupg.GPG(gnupghome='/root/.gnupg')
    # HORRIBLE SECURITY, I KNOW! The script is running as a cronjob so I
    # can't interactively enter the passphrase. Suggestions are welcome.
    phrase = "passphrase"
    cipher = "AES256"
    with open(G, 'rb') as f:
        status = gpg.encrypt_file(f, None, armor=False, passphrase=phrase,
                                  symmetric=cipher.upper(), output=G + '.gpg')

# move unencrypted files out of temp directory
for data in glob.glob(cryptDir + '*.bz2'):
    shutil.move(data,'/home/files/')

# delete snapshots
for r in snapNameList:
    removeSnapshots1 = 'lvremove -f ' + vgxenName + r
    subprocess.call(shlex.split(removeSnapshots1))
    removeSnapshots2 = 'lvremove -f ' + vg_dataName + r
    subprocess.call(shlex.split(removeSnapshots2))

# create list of file names to be uploaded (list comprehension)
uploads = [y for y in os.listdir(cryptDir)]
# join absolute path to file names to create new list (list comprehension)
cryptDir_uploads = [ os.path.join(cryptDir, t) for t in uploads ]

# upload to Amazon s3
for d in cryptDir_uploads:
    s3Upload = 's3cmd put ' + d + ' s3://bucket/dir/'
    subprocess.call(shlex.split(s3Upload))
    subprocess.call(shlex.split('rm ' + d))

# move working path to list of unencrypted vm backup files
path = '/home/files/'
os.chdir(path)

# build list of unencrypted vm backup files (list comprehension)
fileNames = [ u for u in os.listdir(path) if os.path.isfile(u) ]

# build list of unencrypted vm backup files with .gpg
# this will mirror the list of files in s3 (list comprehension)
fileNames_gpg = [ p + '.gpg' for p in fileNames ]

# NOTE: I tried to collapse the previous two list comprehensions into one
# as it would seem to be possible - the contents of fileNames_gpg should be
# an os.listdir() of the directory they are in. For some reason I couldn't
# do this so I did the above hack by appending the .gpg extension to a
# different list

# calculate age of unencrypted vm backup files and
# hence encrypted files on s3 (list comprehension)
# NOTE: I have to use the unencrypted files on the server
# as the gpg files get deleted after each upload to s3
ageList = [ round((time.time() - os.stat(n).st_mtime)/60/60/24) for n in fileNames ]

# delete files older than 'age' variable; age of file in ageList compared
# against name in fileNames_gpg
age = 7
for x, y in zip(fileNames_gpg, ageList):
    if y > age:
        subprocess.call(shlex.split('s3cmd del s3://bucket/dir/' + x))

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
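
On the NOTE in the script about collapsing the two list comprehensions: one possible way to do it is to compute the age and the .gpg name in a single pass. The sketch below is illustrative only. The function name expired_s3_names and its parameters are hypothetical, and it relies on the script's own assumption that each local file in the backup directory has a matching "<name>.gpg" object on s3.

```python
import os
import time


def expired_s3_names(directory, max_age_days, now=None):
    """Return '<name>.gpg' for each regular file older than max_age_days.

    Hypothetical helper: assumes the s3 object names mirror the local
    file names with '.gpg' appended, as the script does.
    """
    now = time.time() if now is None else now
    names = []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue  # skip subdirectories such as temp/
        age_days = (now - entry.stat().st_mtime) / 86400  # seconds per day
        if age_days > max_age_days:
            names.append(entry.name + ".gpg")
    return sorted(names)
```

The script's final loop would then iterate over expired_s3_names('/home/files/', 7) and call 's3cmd del' for each name. One difference to be aware of: this sketch compares the unrounded age, so a file that is 7.2 days old is selected, whereas the script's round() would turn that into 7 and keep the file for another day. Using os.scandir also avoids the os.chdir() call, because entry.is_file() does not depend on the current working directory.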