Hey Folks, I run a few slackware boxes and recently (read: about a month ago) upgraded the mysql server and client libraries on my installs. Sometime after that point, the full backups I was performing on these three boxes started to take 5-6 hours a piece, as opposed to the 30-40 minutes it was previously taking. At first I thought it might be a file IO issue, but I've since traced it back to some interaction with MySQL. I've pasted the snippets of info I believe are pertinent to the problem, but am willing to provide more information as needed.
OS: slackware linux - 3.2.13-smp CPU: Core i3 @ 3.07 Memory: 4GB MySQL: /usr/libexec/mysqld Ver 5.5.24 for Linux on i686 (MySQL Community Server (GPL)) Bacula: watchfox-dir Version: 5.2.6 (21 February 2012) i686-pc-linux-gnu slackware Slackware 13.37.0 All three machines are running the same install of the software, but rebuilt on each machine. MySQL was installed via the package manager from binaries. Bacula was configured on the director with the following options: --prefix=/usr --sysconfdir=/etc/bacula --with-openssl --with-mysql --enable-batch-insert The other two machines (clients) are built with the following options: --prefix=/usr --sysconfdir=/etc/bacula --with-openssl --with-mysql --enable-client-only The compressed backup of the three machines is around 4GB, give or take, covering about 400k files. All in all, a relatively small backup. I've enabled compression as GZIP and signature as MD5. Both of these options were enabled prior to the issues happening. The director connects to a MySQL database on one of the other machines for its need. The connection is a 100Mb connection, which has always been sufficient. The file daemon writes to a /mnt/backup location, which is an NFS mount to an array of large drives on the network. Originally, I did NOT have 'Spool Attributes' set to 'yes', but enabled it to separate out the SQL from the file IO. The file backup completes in about 21 minutes for one of the machines. But the 'despooling' of about 125MB worth of data takes 3 and a half hours. I checked the 'mysqladmin' command to get an idea of the query rate during this despooling time and find that it fluctuates around 20 qps. If I take the resulting bacula database, dump it using mysqldump, drop the tables, and insert the same set of data back into the database from the same remote machine using the mysql 'source' command, it takes less than 10 seconds. With 'Spool Attributes' set to 'No', the entire backup just appears to take 5 hours as the SQL is done in-line with the backup. I also did not originally have '--enable-batch-insert' in the configure options, as it supposedly defaulted to 'yes'. I recompiled to enable it, just for testing. I also did an strace of the bacula file daemon and the mysql daemon to see if there was any system IO that may have been holding it up, but after 10 minutes, bacula spent a total of 1 second in system calls, and mysql 40 seconds. Alas, I did not get an strace of the director daemon. Neither machine is showing any amount of heavy load during this time, so I'm not sure where to begin looking. Below is the output from a bconsole session where I ran the backup of a mail server manually. As you can see, the actual file writing takes place fairly quickly, but the despooling takes forever. I would suspect mysql as the culprit here, but, as I stated previously, I can restore a mysldump of the database over the same connection much more quickly than bacula does. I am willing to provide further information to you folks as needed, or recompile bacula and test. Any suggestions on where to look would be welcome. Or, if a bug report would be a more appropriate avenue, I can file that as well. Thanks, - Greg ----- bconsole output ----- Run Backup job JobName: Vixen Backup Job Level: Incremental Client: vixen-fd FileSet: Vixen Backup Set Pool: Vixen Incremental Pool (From Job resource) Storage: File (From Job resource) When: 2012-05-29 12:06:43 Priority: 10 OK to run? (yes/mod/no): yes Job queued. JobId=1 You have messages. *messages 29-May 12:06 watchfox-dir JobId 1: No prior Full backup Job record found. 29-May 12:06 watchfox-dir JobId 1: No prior or suitable Full backup found in catalog. Doing FULL backup. 29-May 12:06 watchfox-dir JobId 1: Start Backup JobId 1, Job=Vixen_Backup_Job.2012-05-29_12.06.46_03 29-May 12:06 watchfox-dir JobId 1: Created new Volume "Vixen_Full_0" in catalog. 29-May 12:06 watchfox-dir JobId 1: Using Device "FileStorage" 29-May 12:06 watchfox-sd JobId 1: Labeled new Volume "Vixen_Full_0" on device "FileStorage" (/mnt/backup). 29-May 12:06 watchfox-sd JobId 1: Wrote label to prelabeled Volume "Vixen_Full_0" on device "FileStorage" (/mnt/backup) 29-May 12:06 watchfox-dir JobId 1: Max Volume jobs=1 exceeded. Marking Volume "Vixen_Full_0" as Used. 29-May 12:28 watchfox-sd JobId 1: Job write elapsed time = 00:21:32, Transfer rate = 3.127 M Bytes/second 29-May 12:28 watchfox-sd JobId 1: Sending spooled attrs to the Director. Despooling 125,083,020 bytes ... 29-May 17:05 watchfox-dir JobId 1: Bacula watchfox-dir 5.2.6 (21Feb12): Build OS: i686-pc-linux-gnu slackware Slackware 13.37.0 JobId: 1 Job: Vixen_Backup_Job.2012-05-29_12.06.46_03 Backup Level: Full (upgraded from Incremental) Client: "vixen-fd" 5.2.6 (21Feb12) i686-pc-linux-gnu,slackware,Slackware 13.37.0 FileSet: "Vixen Backup Set" 2012-05-29 12:06:46 Pool: "Vixen Full Pool" (From Job FullPool override) Catalog: "Catalog" (From Client resource) Storage: "File" (From Job resource) Scheduled time: 29-May-2012 12:06:43 Start time: 29-May-2012 12:06:49 End time: 29-May-2012 17:05:17 Elapsed time: 4 hours 58 mins 28 secs Priority: 10 FD Files Written: 404,494 SD Files Written: 404,494 FD Bytes Written: 3,984,337,612 (3.984 GB) SD Bytes Written: 4,040,084,452 (4.040 GB) Rate: 222.5 KB/s Software Compression: 54.6 % VSS: no Encryption: no Accurate: yes Volume name(s): Vixen_Full_0 Volume Session Id: 1 Volume Session Time: 1338307461 Last Volume Bytes: 4,056,969,627 (4.056 GB) Non-fatal FD errors: 0 SD Errors: 0 FD termination status: OK SD termination status: OK Termination: Backup OK 29-May 17:05 watchfox-dir JobId 1: Begin pruning Jobs older than 6 months . 29-May 17:05 watchfox-dir JobId 1: No Jobs found to prune. 29-May 17:05 watchfox-dir JobId 1: Begin pruning Files. 29-May 17:05 watchfox-dir JobId 1: No Files found to prune. 29-May 17:05 watchfox-dir JobId 1: End auto prune. ------------------------------------------------------------------------------ Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ _______________________________________________ Bacula-users mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/bacula-users
