If I run rdiff-backup with a complicated ssh connection string that does some housekeeping afterwards, and the remote box is slow, rdiff-backup exits while the ssh process is still running for another 0.5 to 1.5 seconds. If I check with ps at the right moment, I can see that the orphaned ssh process has been adopted by PID 1 as its new parent (PPID).
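To illustrate the race in miniature: a child started with Popen() is still alive immediately afterwards, and only an explicit wait() guarantees it has exited and been reaped before the parent goes away. This is a stand-in using sleep, not rdiff-backup's actual code:

```python
import subprocess

# Stand-in for the remote ssh command (illustrative only; the real
# command line rdiff-backup spawns is more involved).
proc = subprocess.Popen(["sleep", "0.2"])

# Immediately after Popen() the child is still running, so poll()
# reports no exit status yet.
print(proc.poll())  # None

# If the parent exited here, the still-running child would be
# inherited by PID 1, and any late output from it would be lost to
# the caller.  wait() blocks until the child exits and reaps it:
print(proc.wait())  # 0
```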
CloseConnections() apparently isn't wait()ing on the child processes. This is a problem for me because my wrapper script captures the output of that cleanup work, and that output is lost to nowheresville if it isn't emitted quickly enough (i.e. before PID 1 adopts the ssh process). It took me forever to figure out why some runs produced the output and some didn't; it depends on how fast and how loaded the remote machine is.

I managed to fix it with about five lines of code in connection.py and SetConnections.py: remember the process handle along with the pipe handles, and in connection._close() call wait() on that handle. Now it works perfectly -- rdiff-backup waits for its children before exiting and no longer strands any process (or its output!) with PID 1.

It would be great to get this tweak into the official version; I can't see any downside. The Python documentation warns that Popen.wait() can deadlock if the child fills a pipe buffer, but since the rdiff-backup protocol is so precise, I doubt there could ever be data left on the pipe at that point. The docs suggest communicate() instead for that case, but it seems more complicated than what I'm after. (And Python is not my language of choice nor the one I'm most proficient in.)

_______________________________________________
rdiff-backup-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
