This looks like the usual firewall/router dropping the connection after
two hours, yes?
06-Jan 02:02 moregruel-dir: Start Backup JobId 4501, Job=Web-fury.2008-01-06_01.05.03
06-Jan 02:02 moregruel-dir: Recycled volume "Vol0033"
06-Jan 02:02 moregruel-sd: Recycled volume "Vol0033" on device "FileStorage" (/bacula), all previous data lost.
06-Jan 03:13 moregruel-dir: Volume used once. Marking Volume "Vol0033" as Used.
06-Jan 04:02 moregruel-dir: Web-fury.2008-01-06_01.05.03 Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
06-Jan 04:02 moregruel-dir: Web-fury.2008-01-06_01.05.03 Fatal error: No Job status returned from FD.
But HeartbeatInterval is set to 10 minutes in both the relevant FD and the SD.
Also, it looks like all the data was sent to the SD in about 1 hour 10 minutes
(02:02 to 03:13), and then nothing happened for the next 50 minutes. The log
also has this:
  Start time:        06-Jan-2008 02:02:52
  End time:          06-Jan-2008 04:02:52
  Elapsed time:      2 hours
  Priority:          10
  FD Files Written:  0
  SD Files Written:  5,705
  FD Bytes Written:  0 (0 B)
  SD Bytes Written:  620,714,877 (620.7 MB)
Other (smaller) jobs from the same client work. This job used to work.
All parties are running 1.38.11 on Debian.
The router (running dd-wrt) has a TCP timeout of 3600 seconds, but with
the heartbeat, that shouldn't matter, and in any case doesn't match the
observed times.
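
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The timestamps are taken from the log above at minute resolution. Treating the 03:13 "Marking Volume as Used" line as the end of the data transfer is my reading of the log, not something Bacula reports, and the variable names are only illustrative.

```python
from datetime import datetime

# Timestamps taken from the job log (minute resolution only).
start = datetime(2008, 1, 6, 2, 2)     # Start Backup JobId 4501
last_sd = datetime(2008, 1, 6, 3, 13)  # Volume "Vol0033" marked Used (assumed end of data)
failure = datetime(2008, 1, 6, 4, 2)   # Connection reset by peer


def minutes(delta):
    """Return a timedelta as whole minutes."""
    return int(delta.total_seconds() // 60)


transfer_min = minutes(last_sd - start)   # time spent sending data to the SD
idle_min = minutes(failure - last_sd)     # quiet period before the reset
total_min = minutes(failure - start)      # total job duration

ROUTER_TCP_TIMEOUT_MIN = 3600 // 60       # dd-wrt TCP timeout, from the post
HEARTBEAT_MIN = 10                        # HeartbeatInterval on FD and SD, from the post

print(f"transfer: {transfer_min} min, idle: {idle_min} min, total: {total_min} min")
print(f"idle period shorter than router timeout: {idle_min < ROUTER_TCP_TIMEOUT_MIN}")
print(f"heartbeats expected while idle: {idle_min // HEARTBEAT_MIN}")
```

The quiet period comes out shorter than the router's 3600-second timeout, and the job dies at exactly two hours, so neither number lines up with a one-hour idle timeout.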
Any suggestions?
Regards,
Steve
--
Steve Greenland
The irony is that Bill Gates claims to be making a stable operating
system and Linus Torvalds claims to be trying to take over the
world. -- seen on the net
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users