Am I correct in assuming that the watchdog timer is killing my jobs?
I have to constrain my backup jobs to less than 15 MBps with traffic
shaping in this environment to avoid bandwidth contention, so even small
jobs take a long time for bacula-fd to copy over.
Assuming I am correct that the watchdog timer is to blame, what's the
workaround? The traffic shaping is a hard client requirement.
23-Dec 14:02 bacula-dir JobId 60: Using Device "Drive0" to write.
23-Dec 14:02 Scalar-i40 JobId 60: Spooling data ...
29-Dec 14:02 bacula-dir JobId 60: Error: Watchdog sending kill after 518417 secs to thread stalled reading Storage daemon.
29-Dec 14:02 bacula-dir JobId 60: Error: Watchdog sending kill after 518417 secs to thread stalled reading File daemon.
29-Dec 14:02 bacula-dir JobId 60: Fatal error: Network error with FD during Backup: ERR=Interrupted system call
29-Dec 14:02 bacula-dir JobId 60: Error: Director's comm line to SD dropped.
29-Dec 14:02 bacula-dir JobId 60: Fatal error: No Job status returned from FD.
29-Dec 14:02 bacula-dir JobId 60: Error: Bacula bacula-dir 5.2.13 (19Jan13):
Build OS: x86_64-redhat-linux-gnu redhat (Core)
JobId: 60
Job: GCC_archive-everyone-else.2019-12-23_14.02.07_03
Backup Level: Full
Client: "bock" 5.2.13 (19Jan13) x86_64-redhat-linux-gnu,redhat,(Core)
FileSet: "GCC_archive-everyone-else-fileset" 2019-12-23 13:59:50
Pool: "lto6-pool" (From Job resource)
Catalog: "GACCatalog" (From Client resource)
Storage: "Scalar-i40" (From Pool resource)
Scheduled time: 23-Dec-2019 14:02:05
Start time: 23-Dec-2019 14:02:09
End time: 29-Dec-2019 14:02:26
Elapsed time: 6 days 17 secs
Priority: 12
FD Files Written: 0
SD Files Written: 0
FD Bytes Written: 0 (0 B)
SD Bytes Written: 0 (0 B)
Rate: 0.0 KB/s
Software Compression: None
VSS: no
Encryption: no
Accurate: no
Volume name(s):
Volume Session Id: 6
Volume Session Time: 1577127434
Last Volume Bytes: 387,072 (387.0 KB)
Non-fatal FD errors: 3
SD Errors: 0
FD termination status: Error
SD termination status: Error
Termination: *** Backup Error ***
[root@bock Job]# cat GCC_archive-everyone-else-job.conf
#JN job file for Bock
#----------------------------------
Job {
Name = GCC_archive-everyone-else
Type = Backup
Client = bock
Schedule = ManualOnly
Messages = Daemon
FileSet = GCC_archive-everyone-else-fileset
Level = Full
Pool = lto6-pool
Priority = 12
Max Run Time = 8035200 # default limit is 6 days, 518400sec. bumped 3x just in case
Spool Data = yes
Spool Attributes = yes # spools catalog entries to disk until after file is backed up. If a job fails, catalogue remains clean
##JN backup Bacula DB when job is done.
RunScript {
RunsWhen = After
FailJobOnError = No
Command = "/usr/local/sbin/backup-bacula-db.sh"
}
}
#----------------------------------
https://www.bacula.org/5.2.x-manuals/en/main/main/Configuring_Director.html
Max Run Time = time
The time specifies the maximum allowed time that a job may run,
counted from when the job starts, (not necessarily the same as when the
job was scheduled).
By default, the watchdog thread will kill any Job that has run
more than 6 days. The maximum watchdog timeout is independent of
MaxRunTime and cannot be changed.
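
To sanity-check the numbers, here is a rough Python sketch of the arithmetic. It only restates values from the log and the Job resource above; the function names are made up for illustration, and it assumes "15 MBps" means 15,000,000 bytes per second, which may not match how the shaper counts.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Values copied from the post above.
WATCHDOG_LIMIT = 6 * SECONDS_PER_DAY  # 6-day limit quoted from the manual
KILL_AFTER = 518417                   # "Watchdog sending kill after 518417 secs"
MAX_RUN_TIME = 8035200                # Max Run Time in the Job resource

# Assumption: 15 MBps is read as 15 * 10**6 bytes per second.
SHAPED_RATE = 15 * 10**6


def split_days(seconds):
    """Split a duration in seconds into (whole days, leftover seconds)."""
    return divmod(seconds, SECONDS_PER_DAY)


def max_bytes_within(limit_seconds, rate_bytes_per_sec):
    """Upper bound on bytes moved at a constant rate within the limit."""
    return limit_seconds * rate_bytes_per_sec


if __name__ == "__main__":
    print("watchdog limit (s):", WATCHDOG_LIMIT)
    print("kill fired after  :", split_days(KILL_AFTER))
    print("Max Run Time      :", split_days(MAX_RUN_TIME))
    gb = max_bytes_within(WATCHDOG_LIMIT, SHAPED_RATE) // 10**9
    print("GB movable in 6 days at the shaped rate:", gb)
```

The kill time works out to 6 days and 17 seconds, which matches the "Elapsed time" in the job report and sits just past the 6-day limit, while the configured Max Run Time works out to 93 days. At a steady 15,000,000 bytes per second, roughly 7776 GB is the most a job could move before the 6-day mark, ignoring spooling and despooling overhead.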
--
Thanks,
John H. Nyhuis
Desk: (206)-685-8334
[email protected]
Box 359461, 15th floor, 106