Arno Lehmann schrieb: > > while looking for my verify problems, I noticed that my full backup had > > errors > > at the end of the first tape but terminated with status OK. Therefor I > > noticed > > this error very late. > > But you actually kept and read the system log! Congratulations ;-)
Yeah, but only after opening a bug report regarding the verify errors. > > /var/log/kern.log > > Aug 5 06:00:29 VU0EM005 kernel: st0: Current: sense key: Aborted Command > > Aug 5 06:00:29 VU0EM005 kernel: Additional sense: Data phase error > > > > > > 05-Aug 02:13 VU0EM005: Start Backup JobId 416, > > Job=VU0EM003.2007-08-05_00.05.01 > > 05-Aug 02:13 VU0EM005: Spooling data ... > > VU0EM003: /home is a different filesystem. Will not descend from / > > into /home > > VU0EM003: /public is a different filesystem. Will not descend from / > > into /public > > VU0EM003: /lib/init/rw is a different filesystem. Will not descend > > from / into /lib/init/rw > > VU0EM003: /server is a different filesystem. Will not descend from / > > into /server > > 05-Aug 04:44 VU0EM005: User specified spool size reached. > > 05-Aug 04:44 VU0EM005: Writing spooled data to Volume. Despooling > > 450,971,582,471 bytes ... > > 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: block.c:569 > > Write error at 405:11595 on device "LTO3" (/dev/ULTRIUM-TD3). > > ERR=Eingabe-/Ausgabefehler. > > 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: block.c:317 > > Volume data error at 405:4294967295! > > Block checksum mismatch in block=5235094 len=64512: calc=2a90e8e7 > > blk=20b379e5 > > 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: Re-read last > > block at EOT failed. ERR=block.c:317 Volume data error at 405:4294967295! > > Block checksum mismatch in block=5235094 len=64512: calc=2a90e8e7 > > blk=20b379e5 > > 05-Aug 06:00 VU0EM005: End of medium on Volume "06D128L3" > > Bytes=404,723,386,368 Blocks=6,273,613 at 05-Aug-2007 06:00. > > Ok, this is unverified by me, but I *believe* that in these cases > Bacul does the right thing, i.e. it writes the block in question to > the next tape again. > > At least I hope this is the point of doing the re-read of the block > where an error occured... > > And it would explain why this is not an error from the job's point of > view. Hm, I still *hope* something was wrong with the backup, because if not, I don't know why the verify and restore jobs both failed. > > 05-Aug 06:02 VU0EM005: Using Volume "06D138L3" from 'Scratch' pool. > > 05-Aug 06:02 VU0EM005: 3307 Issuing autochanger "unload slot 9, drive 0" > > command. > > 05-Aug 06:03 VU0EM005: 3304 Issuing autochanger "load slot 19, drive 0" > > command. > > 05-Aug 06:03 VU0EM005: 3305 Autochanger "load slot 19, drive 0", status is > > OK. > > 05-Aug 06:03 VU0EM005: 3301 Issuing autochanger "loaded? drive 0" command. > > 05-Aug 06:03 VU0EM005: 3302 Autochanger "loaded? drive 0", result is Slot > > 19. > > 05-Aug 06:03 VU0EM005: Recycled volume "06D138L3" on device "LTO3" > > (/dev/ULTRIUM-TD3), all previous data lost. > > 05-Aug 06:03 VU0EM005: New volume "06D138L3" mounted on device "LTO3" > > (/dev/ULTRIUM-TD3) at 05-Aug-2007 06:03. > > 05-Aug 06:31 VU0EM005: Despooling elapsed time = 01:44:20, Transfer rate = > > 72.04 M bytes/second > > 05-Aug 06:42 VU0EM005: Spooling data again ... > > 05-Aug 08:12 VU0EM005: Job write elapsed time = 05:55:07, Transfer rate = > > 33.16 M bytes/second > > 05-Aug 08:12 VU0EM005: Committing spooled data to Volume "06D138L3". > > Despooling 256,418,397,693 bytes ... > > 05-Aug 09:13 VU0EM005: Despooling elapsed time = 01:01:30, Transfer rate = > > 69.49 M bytes/second > > 05-Aug 09:20 VU0EM005: Alert: smartctl version 5.36 > > [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen > > 05-Aug 09:20 VU0EM005: Alert: Home page is > > http://smartmontools.sourceforge.net/ > > 05-Aug 09:20 VU0EM005: Alert: > > 05-Aug 09:20 VU0EM005: Alert: TapeAlert: OK > > 05-Aug 09:20 VU0EM005: Alert: > > 05-Aug 09:20 VU0EM005: Alert: Error Counter logging not supported > > 05-Aug 09:20 VU0EM005: Alert: > > 05-Aug 09:20 VU0EM005: Alert: Non-medium error count: 0 > > 05-Aug 09:20 VU0EM005: Sending spooled attrs to the Director. Despooling > > 600,205,128 bytes ... > > 05-Aug 09:37 VU0EM005: Bacula 2.0.3 (06Mar07): 05-Aug-2007 09:37:43 > > 05-Aug 09:37 VU0EM005: Bacula 2.0.3 (06Mar07): 05-Aug-2007 09:37:43 > > JobId: 416 > > Job: VU0EM003.2007-08-05_00.05.01 > > Backup Level: Full > > Client: "VU0EM003" 2.0.2 (28Jan07) > > x86_64-unknown-linux-gnu,debian,4.0 > > FileSet: "VU0EM003" 2007-06-12 00:05:01 > > Pool: "Full" (From Run pool override) > > Storage: "NEC-T40A" (From Job resource) > > Scheduled time: 05-Aug-2007 00:05:00 > > Start time: 05-Aug-2007 02:13:49 > > End time: 05-Aug-2007 09:37:43 > > Elapsed time: 7 hours 23 mins 54 secs > > Priority: 10 > > FD Files Written: 1,631,258 > > SD Files Written: 1,631,258 > > FD Bytes Written: 706,308,678,026 (706.3 GB) > > SD Bytes Written: 706,666,406,084 (706.6 GB) > > Rate: 26519.1 KB/s > > Software Compression: None > > VSS: no > > Encryption: no > > Volume name(s): 06D128L3|06D138L3 > > Volume Session Id: 106 > > Volume Session Time: 1184053226 > > Last Volume Bytes: 369,532,025,856 (369.5 GB) > > Non-fatal FD errors: 0 > > SD Errors: 0 > > FD termination status: OK > > SD termination status: OK > > Termination: Backup OK > > > > > > > > I've rerun the btape test (test/fill). No problems. At the moment I'm > > running > > the full backup to the same tapes again. The end of the first tape was > > detected > > without Block checksum mismatch. > > You could have tried a restore of the job with the (possible) error... I did. First I restored only a few diretories, without errors (70 GB). I was still thinking the verify job had problems, not the real backup. There were many different errors with the verify jobs. But the last verify job Itried also had a block mismatch at the end of tape #1. 07-Aug 19:54 VU0EM005: End of file 405 on device "LTO3" (/dev/ULTRIUM-TD3), Volume "06D128L3" 07-Aug 19:54 VU0EM005: VerifyVU0EM003.2007-08-07_18.18.29 Error: block.c:317 Volume data error at 405:11594! Then I did a full restore. This time I got an completely different error at a different volfile. 07-Aug 21:56 VU0EM005: End of file 203 on device "LTO3" (/dev/ULTRIUM-TD3), Volume "06D128L3" 07-Aug 22:12 VU0EM005: RestoreFiles.2007-08-07_20.56.07 Fatal error: read.c:139 Error sending to File daemon. ERR=Die Wartezeit für die Verbindung ist abgelaufen 07-Aug 22:12 VU0EM005: RestoreFiles.2007-08-07_20.56.07 Error: bnet.c:439 Write error sending 65536 bytes to client:10.60.1.252:36643: ERR=Die Wartezeit für die Verbindung ist abgelaufen 07-Aug 22:35 VU0EM003: RestoreFiles.2007-08-07_20.56.07 Fatal error: restore.c:252 Data record error. ERR=Unterbrechung während des Betriebssystemaufrufs > Regarding the tape write/read error, I believe these happen from time > to time, especially when related to end of tape detection (400GB on > LTO3 seems like the tape was actually full, but for some reason the > drive didn't report the approaching EOT to the OS). I would worry if > this happened regularly, but if it's a one-time glitch, I think > there's no problem. That was the first backup with two tapes since I tested the drive 6 months ago... Ralf ------------------------------------------------------------------------- This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now >> http://get.splunk.com/ _______________________________________________ Bacula-users mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/bacula-users
